Revealing the nature of hidden charm pentaquarks with machine learning
Abstract
We study the nature of the hidden charm pentaquarks, i.e. the $P_c(4312)$, $P_c(4440)$ and $P_c(4457)$, with a neural network approach in pionless effective field theory. In this framework, the normal fitting approach cannot distinguish the quantum numbers of the $P_c(4440)$ and $P_c(4457)$. In contrast, the neural-network-based approach can discriminate between them, although this cannot yet be taken as a determination of the spins of the two states, since pion exchange is not included in the framework. In addition, we illustrate how each experimental data bin of the invariant mass distribution constrains the underlying physics, in both the neural network and the fitting method. The similarities and differences between the two methods demonstrate that neural networks can use the information contained in the data more effectively and more directly. This study provides further insight into how the neural-network-based approach predicts the nature of exotic states from the mass spectrum.
Keywords: Exotic hadrons, Machine learning, QCD mass spectrum
I Introduction
In the last two decades, we have witnessed the emergence of tens of candidates for so-called exotic hadrons, which go beyond the conventional meson and baryon configurations, such as the hidden charm pentaquarks [1, 2, 3], the fully charmed tetraquark [4, 5, 6], or the doubly charmed tetraquark [7, 8]. The first hidden charm pentaquarks were reported by the LHCb Collaboration in 2015 [1], observed in the $J/\psi p$ invariant mass distribution of the process $\Lambda_b^0\to J/\psi p K^-$. Four years later, the $P_c(4312)$, $P_c(4440)$ and $P_c(4457)$ were reported with an order of magnitude larger luminosity [3]. Since then, many interpretations have been proposed for these states, e.g. compact multi-quark states, hadronic molecules, hybrids, and triangle singularities [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]. Traditionally, one judges whether a model works by comparing it with the experimental data. However, such a top-down approach makes the conclusion model-dependent. Physicists are thus trying to find bottom-up approaches [21] to obtain a definitive conclusion. One direction is to use machine learning to explore hadron properties, which benefits from the growth of computing power over the last decades. For instance, genetic algorithms [22, 23, 24] and neural networks [25, 26, 27, 28] can be utilized to explore the nature of hadrons or the properties of nuclei. Machine learning has been widely used in nuclear physics [29, 30, 31, 32, 33, 34, 35], high energy nuclear physics [36, 37, 38], experimental data analysis [39, 40], and theoretical physics [41, 42], and even to discover the physical principles underlying experimental data [21, 43, 44]. However, its application in hadron physics is still at an early stage. A preliminary attempt was made in Refs. [45, 46, 21, 47] for the hadronic molecular picture [12], which is one of the natural explanations of near-threshold peaks. Refs. [48, 47] demonstrate that deep learning can be applied to classify poles and regress the model parameters in the single-channel case.
Here, extending and improving the works [45, 46], we consider the multi-channel case. We take the hidden charm pentaquarks $P_c(4312)$, $P_c(4440)$ and $P_c(4457)$ [3] as examples to achieve the following goals:
- (1) In the hadronic molecular picture [49, 50, 51], both the $P_c(4440)$ and the $P_c(4457)$ are related to the $\Sigma_c\bar{D}^*$ threshold. From the viewpoint of pionless effective field theory (EFT), their spin-parities $J^P$ can be assigned in two ways, denoted solution A and solution B [49, 50, 51]. In solution A, the lower-mass hidden charm pentaquark has the lower spin; in solution B the assignment is reversed. Although the fitting cannot distinguish the two solutions, we investigate which solution is preferred by the machine learning approach.
- (2) In the traditional fitting approach, the physics is embedded in the model parameters: one first extracts the model parameters from the experimental data and then derives the physical quantities of interest from them. We use a neural network (NN) based approach to extract the pertinent physical quantities directly and illustrate the role of each experimental data point (here the bins of the mass distribution) in determining the physics.
We have used one particular NN for the above goals and demonstrated that, in contrast to the fitting approach, the NN-based approach consistently favors solution A over solution B and clearly determines the properties of the poles in the multi-channel case.
II Model Description
The $P_c$ states are close to the $\Sigma_c^{(*)}\bar{D}^{(*)}$ thresholds, making them prime candidates for hadronic molecules. In the heavy quark limit, these hidden charm pentaquarks are related to the scattering between the $(\Sigma_c,\,\Sigma_c^*)$ and $(\bar{D},\,\bar{D}^*)$ heavy quark spin doublets. Note that the heavy quark symmetry breaking effect in the charm sector is of the order of $(m_{D^*}-m_{D})/m_{D}$, with $m_{D}$ and $m_{D^*}$ the masses of the $D$ and $D^*$ mesons. Since both the $P_c(4440)$ and the $P_c(4457)$ are related to the $\Sigma_c\bar{D}^*$ channel, their mass difference comes from the spin-spin interaction between the $\Sigma_c$ baryon and the $\bar{D}^*$ meson for a given total spin, which should vanish in the exact heavy quark limit. Thus, in the heavy quark limit the $P_c(4440)$ and $P_c(4457)$ should coincide with each other, and their hyperfine splitting stems from the heavy quark symmetry breaking effect. Consequently, we account for the heavy quark symmetry breaking effect through the mass differences within each doublet. By constructing the interactions between the two doublets, i.e. the $(\Sigma_c,\,\Sigma_c^*)$ and the $(\bar{D},\,\bar{D}^*)$ doublets, in an EFT respecting heavy quark spin symmetry (HQSS), Refs. [51, 50] extracted the pole positions of the seven hidden charm pentaquarks by fitting the $J/\psi p$ invariant mass distribution of the process $\Lambda_b^0\to J/\psi p K^-$. (Note that Ref. [51] includes an additional dynamical channel in comparison with Ref. [50]. However, the parameters of that channel are essentially unconstrained due to the absence of experimental data in that channel, and the properties of the poles are driven by the elastic channels. As a result, we only consider the elastic channels and the two inelastic channels specified below in our framework.) Following the same procedure, we consider the $\Sigma_c^{(*)}\bar{D}^{(*)}$ channels as the elastic channels, together with two inelastic channels. The potential quantum numbers of the hidden charm pentaquarks are $J^P=1/2^-$, $3/2^-$ and $5/2^-$, corresponding to the different $S$-wave combinations of the $\Sigma_c^{(*)}\bar{D}^{(*)}$ channels. In the heavy quark limit, their interaction only depends on the light degrees of freedom [52]. To extract these interactions, one can expand the two-hadron basis in terms of the heavy-light basis, in which the heavy and the light degrees of freedom are coupled separately,
(1)
(2)
(3)
Here, the subscripts , and give the total spin of the two-hadron system. In the HQSS, as the interaction only depends on the light degrees of freedom [52], the scattering amplitude is described by the two parameters [51]
(4)
(5)
with the subscripts corresponding to the spin of the light degrees of freedom. The coupling for the -wave and -wave inelastic channels, i.e. the and channels, are described by two parameters
(6)
(7)
respectively. The weak bare production amplitude can also be parameterized in terms of the basis
(8)
with the th state in the basis for spin . With this definition, the bare decay amplitudes read
(9)
(10)
(11)
with the upper index denoting the total spin of the two-hadron system. Putting the pieces together, the inelastic amplitude for the decay process can be expressed as [51, 50]
(12)
The corresponding diagrams are shown in Fig. 1a. Here, is the th inelastic channel amplitude with the quantum number , and is the transition vertex between the th elastic and the th inelastic channel, described by the coupling constants for the -wave and for the -wave.
(13)
is the two-body propagator of the th elastic channel [51, 50], with and the reduced mass and threshold of the th channel. The physical production amplitude of the th elastic channel for total spin is obtained from the Lippmann-Schwinger equation (LSE)
(14)
with , constructed from the seven parameters defined in Eq. (8), the production amplitude of the th elastic channel. The corresponding diagrams are shown in Fig. 1b.
Note that the above integral equations reduce to algebraic equations in the framework of a pionless EFT. For a general introduction to this type of EFT, see [53]. is the effective potential
(15)
which contains both the scattering between the elastic channels, described by the two parameters and , and that between the elastic and inelastic channels, described by the two parameters and . In total, the eleven parameters are combined into a vector
(16)
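To make the algebraic reduction of the LSE (14) concrete, the sketch below solves the coupled-channel production amplitude as a matrix equation, $U(E) = [1 - V\,G(E)]^{-1} P$, with $V$ the contact potential and $P$ the bare source. It is a minimal illustration only: the threshold values, reduced masses and the regularisation of the loop function are hypothetical placeholders, not the values or scheme used in the paper.

```python
import numpy as np

# Hypothetical elastic thresholds and reduced masses (GeV); illustrative only.
THRESHOLDS = np.array([4.32, 4.46, 4.53])
REDUCED_MASSES = np.array([1.00, 1.05, 1.10])

def loop_function(E, cutoff=1.0):
    """Diagonal two-body propagator G(E) of the elastic channels (schematic form,
    with a hard cutoff standing in for the actual regularisation)."""
    k = np.sqrt(2.0 * REDUCED_MASSES * (E - THRESHOLDS + 1e-12j))  # on-shell momenta
    g = -REDUCED_MASSES / (2.0 * np.pi**2) * (cutoff - 0.5j * np.pi * k)
    return np.diag(g)

def production_amplitude(E, V, P):
    """Algebraic solution of the LSE for the production amplitude,
    U(E) = [1 - V G(E)]^{-1} P."""
    G = loop_function(E)
    return np.linalg.solve(np.eye(len(P)) - V @ G, P)
```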
To describe the invariant mass distribution, a composite model is constructed via a probability distribution function (PDF) as:
(17)
with the amplitude (14) for total spin and the phase space of the corresponding process. A Gaussian function representing the experimental detector resolution is convolved with the physical invariant mass distribution. Here, we take the energy resolution as a constant value [3], without considering its energy dependence. Further, the sixth-order Chebyshev polynomial represents the background distribution, with the corresponding background fraction. The background fraction in the data is determined to be (96.0 ± 0.8)%, as discussed in the Supplementary materials. The coefficients of the background component are obtained by fitting the Chebyshev polynomial to the data [54], as listed in the Supplementary materials. Note that the area enclosed by the signal (background) distribution, i.e. the first (second) term in Eq. (17) after removing its fraction factor, is normalized to one.
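A minimal sketch of how the composite PDF of Eq. (17) can be evaluated numerically is given below, assuming the signal line shape, i.e. the squared amplitude times phase space, is supplied on an energy grid. The function and variable names are hypothetical, and the normalisation is done by simple numerical integration rather than by the paper's actual procedure.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from scipy.ndimage import gaussian_filter1d

def composite_pdf(energies, signal_shape, cheb_coeffs, bkg_fraction, resolution_gev):
    """Sketch of Eq. (17): (1 - f_bkg) * resolution-smeared signal + f_bkg * Chebyshev background."""
    de = energies[1] - energies[0]
    # convolve the physical line shape with the Gaussian detector resolution
    smeared = gaussian_filter1d(signal_shape, sigma=resolution_gev / de)
    smeared /= np.trapz(smeared, energies)                 # signal area normalised to one
    # map the mass window onto [-1, 1] and evaluate the sixth-order Chebyshev background
    x = 2.0 * (energies - energies[0]) / (energies[-1] - energies[0]) - 1.0
    bkg = cheb.chebval(x, cheb_coeffs)
    bkg /= np.trapz(bkg, energies)                         # background area normalised to one
    return (1.0 - bkg_fraction) * smeared + bkg_fraction * bkg
```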
III States and Labels
The hidden charm pentaquarks are described as being generated from the scattering of the $(\Sigma_c,\,\Sigma_c^*)$ and $(\bar{D},\,\bar{D}^*)$ doublets. As the thresholds of the inelastic channels are far away from those of the elastic channels, their effect on the classification of the poles relevant to the physical observables is marginal. Thus, we classify the poles considering only the elastic channels. In this case, the $J=1/2$ and $J=3/2$ states are given by a three-channel problem with Riemann sheets, see [20] for a pictorial representation. Here, the “+” and “−” signs in the th position denote the physical and unphysical sheet of the th channel, respectively. The physical observables, i.e. the invariant mass distribution in our case, are real and reflect the amplitude on the real axis. They are related to the poles in the complex plane. We consider the poles most relevant to the physical observables, i.e. the poles on the physical sheet and those nearby. As we aim at extracting the most important information about those poles, it is sufficient to work with leading order contact interactions. In this case, the poles accounting for the near-threshold structures can be classified as in Fig. 2. (Strictly speaking, bound states do not decay to any final particles, which means they are stable. However, in the standard model there are only 11 stable particles, i.e. the photon, the three generations of neutrinos and antineutrinos, the electron, positron, proton, and antiproton; other particles can decay into final states with lower masses. In our case, bound state, virtual state, and resonance are defined with respect to the most strongly coupled elastic channel.) For the classification of the states, we start from the one-channel case, i.e. a state related to a single channel. A “bound state”, “resonance”, and “virtual state” are defined as a pole on the physical sheet below threshold, a pole on the unphysical sheet above threshold, and a pole on the unphysical sheet below threshold, respectively (Fig. 2a). These poles are labeled as 0, 1, and 2, in that order. For the three-channel case (Fig. 2b), the poles are defined according to the most strongly coupled channel. The case with three poles on the respective sheets slightly below the first, second, and third thresholds is defined as a “bound state”, labeled 0. The case with three poles slightly above the first, second, and third thresholds is defined as a “resonance”, labeled 1. The case with three poles slightly below the first, second, and third thresholds on the respective unphysical sheets is defined as a “virtual state”, labeled 2. Note that these three types are defined according to their most strongly coupled channel, which is what the word “slightly” refers to. For instance, on one and the same sheet, both the “resonance” (blue circle) and the “bound state” (green cross) are located between the first and the second thresholds.
However, the former couples strongly to the first channel and is located close to the first threshold, so it is defined as a resonance of the first channel. The latter couples strongly to the second channel and is located close to the second threshold, so it is defined as a bound state of the second channel. The situation for the other states is similar.
We use a four-digit label to denote the situation of the poles. The last three digits give the nature of the poles in the three channels, in order. The first digit, which we put in front, encodes the mass ordering. This is needed because of the indeterminacy of the quantum numbers of the $P_c(4440)$ and $P_c(4457)$: as they are close to the $\Sigma_c\bar{D}^*$ threshold, the $S$-wave interaction can give both $J^P=1/2^-$ and $J^P=3/2^-$ states. In the limit of HQSS, there are two solutions, i.e. Solution A and Solution B as defined in Refs. [49, 50, 51] and discussed before. In Solution A, the higher-spin state has the larger mass; in Solution B, the situation is reversed. To distinguish these two scenarios, the additional first digit, 1 for Solution A and 0 for Solution B, is required. In total, we thus have a four-digit label to denote the situation of the poles. Taking the “1012” case as an example, the poles of the three channels are a “bound state”, a “resonance”, and a “virtual state”, respectively, and the first digit, 1, means Solution A. Note that there can also be cases with poles different from the three situations discussed above, e.g. poles far away from the thresholds; however, their probabilities are almost zero and they are thus neglected.
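For illustration, a minimal sketch of this four-digit label construction is given below. The helper name `make_label` is hypothetical, while the digit conventions follow the text (0 = bound state, 1 = resonance, 2 = virtual state; leading digit 1 = Solution A, 0 = Solution B).

```python
POLE_CODE = {"bound": 0, "resonance": 1, "virtual": 2}

def make_label(solution_a: bool, pole_types: list[str]) -> str:
    """Build the four-digit state label, e.g.
    make_label(True, ["bound", "resonance", "virtual"]) -> "1012"."""
    assert len(pole_types) == 3, "one pole type per elastic-channel quantum number"
    digits = [1 if solution_a else 0] + [POLE_CODE[p] for p in pole_types]
    return "".join(str(d) for d in digits)
```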
IV Monte Carlo simulation
Based on the PDF given in Eq. (17), 240,184 samples corresponding to physical states, i.e. with poles not too far away from the real axis in the considered energy range, are selected among 1.85 million samples uniformly generated in the parameter space, with the spectra distributed in the mass window from 4.25 to 4.55 GeV. These samples are represented as a set in which each entry consists of a mass spectrum and a label, where the index refers to a given sample. The mass spectrum is a histogram with 150 bins (see Fig. 3) of the invariant mass distribution (background included), and the label is the state label defined in Sec. III.
The values of are uniformly sampled in the following ranges,
(18)
The ranges are empirically chosen to produce mass spectra similar to the experimental data. Note that and as well as and are strongly correlated, so the second coefficient of each pair is related to the corresponding first one by a factor within a fixed range. These strong correlations were found in Refs. [50, 51] and are also found if one uses a much larger set of samples to train the NNs. The Monte Carlo (MC) production is performed with the open source software ROOT [54] and GSL [55]; the resulting samples are categorized in Tab. 1. Considering different background fractions, the set of MC samples produced at a background fraction of 90% is denoted as , and similarly for the other sets.
Mass relation label | State label | Number of samples
0 | 000 | 46,951
1 | 000 | 4,283
1 | 001 | 1,260
1 | 002 | 4,360
0 | 100 | 3,740
0 | 110 | 4,320
0 | 111 | 7,520
1 | 111 | 360
0 | 200 | 9,590
1 | 200 | 280
1 | 210 | 3,980
1 | 211 | 2,690
1 | 220 | 50,240
1 | 221 | 50,512
1 | 222 | 50,098
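The sample generation summarised above (cf. Tab. 1) can be sketched as follows: parameters are drawn uniformly in a box, non-physical configurations are rejected, and each accepted sample is turned into a normalised 150-bin spectrum. The parameter ranges, the toy line shape and the acceptance criterion below are placeholders standing in for Eq. (18), Eq. (17) and the actual pole search.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PARAMS, N_BINS = 11, 150
E_MIN, E_MAX = 4.25, 4.55                                        # GeV, mass window of the paper
LOW, HIGH = -5.0 * np.ones(N_PARAMS), 5.0 * np.ones(N_PARAMS)    # hypothetical ranges

edges = np.linspace(E_MIN, E_MAX, N_BINS + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

def toy_spectrum(e, params):
    """Stand-in for the PDF of Eq. (17): flat background plus one toy peak."""
    peak = np.exp(-0.5 * ((e - 4.44) / 0.005) ** 2)
    return 0.96 / (E_MAX - E_MIN) + 0.04 * abs(params[0]) * peak

def is_physical(params):
    """Stand-in for the pole search keeping only samples with poles
    not too far from the real axis in the considered window."""
    return abs(params[0]) > 0.1

dataset = []
while len(dataset) < 100:                     # the paper keeps ~240k out of 1.85M draws
    p = rng.uniform(LOW, HIGH)
    if is_physical(p):
        spectrum = toy_spectrum(centers, p)
        dataset.append(spectrum / spectrum.sum())   # normalised 150-bin NN input
```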
V Training and Verification
The NN is implemented with a ResNet [56] infrastructure, as discussed in the Supplementary materials, and trained with the Adam [57] optimizer using the cross-entropy loss function. In order to balance the impact of unequal sample sizes, the cross-entropy loss is weighted by the reciprocal of the number of samples in each class. To avoid the network falling into a local minimum during training, the weights of the neurons are randomly initialized from a normal distribution centered at 0 with a width of 0.01, and the biases of the neurons are set to zero. A reasonable solution is obtained after training for 500 epochs with an initial learning rate of 0.001.
Here the learning rate is halved after each interval, controlled by the StepLR scheduler; the interval is set to 100 epochs in our case. Note that 70% of the MC simulation samples are used for training and the remaining 30% as a benchmark. To avoid overfitting, dropout layers are introduced in the network during training, with the dropout probability chosen to be 0.3 to optimize performance. In the case of overfitting, the obtained NN would work well on the training dataset but probably not on the testing dataset, so the loss function calculated on the training dataset would differ noticeably from the one calculated on the testing dataset. However, Fig. 9 in the Supplementary materials indicates that there is no obvious difference. The training details are summarized in the Supplementary materials. In short, the NNs successfully retrieve the state label with an accuracy (standard deviation) of 75.91(1.18)%, 73.14(1.05)%, 65.25(1.80)% and 54.35(2.32)% for the MC simulation samples with the four background fractions. The prediction accuracy decreases as the background fraction increases, as expected.
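The training settings quoted above can be summarised in the following PyTorch sketch. The network is a simplified stand-in for the ResNet of the paper, the dummy tensors replace the actual MC spectra, and the assignment of the Tab. 1 sample numbers to label indices is an assumption made only for illustration.

```python
import torch
import torch.nn as nn

# Dummy data standing in for the 150-bin spectra and their 15 state labels.
x_train = torch.rand(512, 150)
y_train = torch.randint(0, 15, (512,))

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)   # N(0, 0.01) initialisation
        nn.init.zeros_(m.bias)                          # biases set to zero

model = nn.Sequential(                                  # simplified stand-in for the ResNet
    nn.Linear(150, 1000), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(1000, 15),
)
model.apply(init_weights)

# per-class sample numbers of Tab. 1; the loss is weighted by their reciprocals
class_counts = torch.tensor([46951., 4283., 1260., 4360., 3740., 4320., 7520., 360.,
                             9590., 280., 3980., 2690., 50240., 50512., 50098.])
criterion = nn.CrossEntropyLoss(weight=1.0 / class_counts)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)

for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()
    scheduler.step()                                    # halve the learning rate every 100 epochs
```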
Turning to the mass relation, we verified 100 samples corresponding to the label “1xxx” and another 100 samples corresponding to the label “0xxx”. Fig. 4a/b shows the predicted probability distribution for the “1xxx”/“0xxx” samples, in which all labels are successfully retrieved. In other words, the NN can accurately predict the mass relation. For comparison, we also perform an analysis with the $\chi^2$/ndf-fitting method. Fig. 4c/d shows the normalized $\chi^2$/ndf distributions corresponding to the labels “1xxx”/“0xxx”. A 3% misidentification rate is observed for the “1xxx” samples.
VI Application to Data
We now use the well-trained NNs to analyse the experimental data. The results are listed in Tab. 2. To reduce the systematic uncertainties arising from the NNs, we trained five and then ten NN models with an identical structure but different initializations for each group of samples with a given background fraction. The summed probability of all other labels is smaller than 1%. The labels with the top three probabilities are “1000”, “1001” and “1002”, with their sum close to 100%. The standard deviation decreases with an increasing number of NN models, as expected. This means that our NNs can determine the first three digits well and favor Solution A, i.e. the lower-mass $P_c(4440)$ carrying the lower spin, as the first digit is 1.
Models | 0000 | 1000 | 1001 | 1002 | 100X | Others
Prediction (%) of the NNs trained with samples at a 90% background fraction.
NN 1 | 0.69 | 89.13 | 1.42 | 8.75 | 99.30 | 0.01 |
NN 2 | 0.03 | 5.83 | 38.47 | 55.30 | 99.60 | 0.37 |
NN 3 | 0.03 | 5.39 | 15.79 | 78.41 | 99.59 | 0.11 |
NN 4 | 0.01 | 1.9 | 27.01 | 70.95 | 99.86 | 0.13 |
NN 5 | 2.40 | 94.45 | 0.15 | 2.99 | 97.59 | 0.01 |
5 NNs average | 0.63(1.03) | 39.34 | 16.57 | 43.28 | 99.19(0.91) | 0.13(0.15) |
10 NNs average | 0.36(0.74) | 21.16 | 20.69 | 57.62 | 99.47(0.68) | 0.12(0.13) |
Prediction (%) of the NNs trained with samples at a 92% background fraction.
NN 1 | 0.00 | 0.15 | 5.37 | 94.47 | 99.99 | 0.00 |
NN 2 | 0.00 | 0.07 | 4.11 | 95.81 | 99.99 | 0.00 |
NN 3 | 0.00 | 0.78 | 13.57 | 85.61 | 99.96 | 0.03 |
NN 4 | 0.00 | 0.81 | 19.02 | 80.16 | 99.99 | 0.00 |
NN 5 | 0.14 | 15.13 | 16.91 | 67.80 | 99.84 | 0.00 |
5 NNs average | 0.03(0.06) | 3.39 | 11.80 | 84.77 | 99.95(0.06) | 0.01(0.01) |
10 NNs average | 0.01(0.04) | 1.78 | 9.50 | 88.70 | 99.97(0.04) | 0.00(0.01) |
Prediction (%) of the NNs trained with samples at a 94% background fraction.
NN 1 | 0.00 | 1.20 | 2.26 | 96.53 | 99.99 | 0.00 |
NN 2 | 0.02 | 2.23 | 11.51 | 86.24 | 99.98 | 0.00 |
NN 3 | 0.00 | 0.26 | 10.26 | 89.47 | 99.99 | 0.00 |
NN 4 | 0.15 | 24.05 | 48.79 | 26.89 | 99.73 | 0.13 |
NN 5 | 0.02 | 4.12 | 72.61 | 23.23 | 99.96 | 0.02 |
5 NNs average | 0.04(0.06) | 6.37 | 29.09 | 64.47 | 99.93(0.11) | 0.03(0.06) |
10 NNs average | 0.02(0.05) | 4.25 | 26.00 | 69.70 | 99.95(0.08) | 0.02(0.04) |
Prediction (%) of the NNs trained with samples at a 96% background fraction.
NN 1 | 0.00 | 1.29 | 19.79 | 78.92 | 100.00 | 0.00 |
NN 2 | 0.00 | 3.38 | 25.64 | 70.97 | 99.99 | 0.01 |
NN 3 | 0.00 | 6.57 | 32.20 | 61.23 | 100.00 | 0.00 |
NN 4 | 0.60 | 55.77 | 21.65 | 21.96 | 99.38 | 0.00 |
NN 5 | 0.00 | 0.27 | 3.52 | 96.20 | 99.99 | 0.00 |
5 NNs average | 0.12(0.27) | 13.46 | 20.56 | 65.86 | 99.87(0.28) | 0.00(0.01) |
10 NNs average | 0.11(0.21) | 13.13 | 24.97 | 61.79 | 99.89(0.22) | 0.00(0.01) |
This shows one of the advantages of machine learning compared to the normal fitting approach. However, this conclusion is based solely on the pionless EFT; one cannot draw a solid conclusion about the hadronic molecules without including the one-pion-exchange (OPE) potential. Thus it makes little sense to compare our results with those of Refs. [51, 50]. Nevertheless, the hidden charm pentaquarks in pionless EFT are sufficient to illustrate the difference between machine learning and the normal fitting approach, as well as the potential merit of machine learning. The second and third digits mean that the poles of the $J=1/2$ and $J=3/2$ channels behave as “bound states”, which differs from the virtual-state conclusion for the $P_c(4312)$ obtained in Ref. [21], which also uses machine learning. The pole situation for the $J=5/2$ channel is undetermined, i.e. all three labels 0, 1, 2 appear for the fourth digit, as the structure around the corresponding threshold is not significant. To resolve this issue, precise data in this energy region would be needed. In the normal fitting approach, the $J=1/2$, $3/2$ and $5/2$ channels are described by the same set of parameters in the heavy quark limit. As a result, once the set of parameters is determined, all the poles of the three channels are determined as well. However, in our NN approach, we extract the types of poles directly from the experimental data without determining the intermediate parameters. This leaves the poles of the $J=1/2$ and $3/2$ channels well determined, but the $J=5/2$ channel undetermined. It is important to understand how the NN learns information from the input mass spectrum. The SHapley Additive exPlanation (SHAP) value is widely used as a feature importance metric for well-established models in machine learning. One of the prevailing methods for estimating NN model features is the DeepExplainer, which is based on the DeepLIFT algorithm. A positive (negative) SHAP value indicates that a given data point pushes the NN classification in favor of (against) a given class, and a large absolute SHAP value implies a large impact of the given mass bin on the classification. This method requires a reference dataset for evaluating the network features. We choose 3000 MC samples as a reference to evaluate the SHAP values of both 1000 MC samples and the experimental data with our network. The upper panel of Fig. 18 shows the SHAP values of the experimental data corresponding to 5 labels, i.e. “0000”, “1001”, “0100”, “1000” and “1002”. The lower panel of Fig. 18 shows the SHAP values of MC samples corresponding to the “1200” label, with the others presented in the Supplementary materials. Both panels of Fig. 18 indicate that the data points around the peaks in the mass spectrum have a greater impact.
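A minimal sketch of the SHAP evaluation described above is given below, using the `shap` package's DeepExplainer on a PyTorch classifier. The small stand-in network and the random tensors replace the trained ResNet and the actual MC/experimental spectra.

```python
import torch
import torch.nn as nn
import shap

# Stand-in classifier and dummy spectra; the trained ResNet and real spectra would be used instead.
model = nn.Sequential(nn.Linear(150, 64), nn.ReLU(), nn.Linear(64, 15))
reference = torch.rand(3000, 150)      # 3000 MC spectra as the SHAP reference set
to_explain = torch.rand(10, 150)       # spectra to be explained (MC or experimental data)

explainer = shap.DeepExplainer(model, reference)
shap_values = explainer.shap_values(to_explain)
# shap_values: one (10, 150) array per class label; a large |SHAP| value for a bin means
# that bin pushes the classification for (positive) or against (negative) that label.
```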
A comparison is performed with the traditional $\chi^2$/ndf-fitting approach, which minimizes the
(19)
between the theoretical prediction and the experimental data by summing over all bins, where denotes the model parameters to be determined and is the bin index. We check how the th bin impacts the parameters and the pole positions.
For this purpose, we define the uncertainty for the th parameter, which is determined from the central values obtained by switching the th bin on/off in the fit. A larger uncertainty observed when excluding a bin means a larger impact of that bin on the parameter. Fig. 6 illustrates the distributions for two of the eleven parameters; the distributions for the other parameters can be found in the Supplementary materials. The bins near the peaks in the mass spectrum do not constrain the uncertainties of these parameters more strongly than the others. That is because the model parameters are correlated with each other and do not reflect the underlying physics directly. In addition, to check the impact of the th experimental point on the pole positions, the uncertainty
(20)
for the real part of the th pole position is defined by switching the th bin on/off in the fit. A similar behavior is found for the imaginary parts of the poles, see the Supplementary materials. Fig. 7 illustrates the distributions for the different channels, without considering the correlations among the parameters. Although these values are small, one can still see that the experimental data around the relevant thresholds are more important than the others. The reason why the data around one particular threshold are not important is that the production amplitude of the corresponding channel is small, so the experimental data put little constraint on the corresponding poles [50, 51]. The same conclusion can be drawn from Fig. 7g, where the experimental data around the dynamical channel do not show any significance. Due to the coupled-channel effect, the experimental data around one coupled channel still put strong constraints on the physics, e.g. the seven poles, around the other coupled channels. Taking the first pole of one channel as an example, e.g. Fig. 7a, the data around its own threshold are as significant as those around the neighboring threshold. As the sample data of Fig. 7 correspond to Solution A, the data around one threshold are more important than those around the other in Fig. 7b (Fig. 7e), as expected.
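The bin-exclusion procedure used above can be sketched as follows: the fit is repeated with one bin switched off at a time and the resulting parameter shifts are recorded. The model function, data arrays and starting values are assumed to be supplied by the user; this is an illustration of the procedure, not the paper's fitting code.

```python
import numpy as np
from scipy.optimize import minimize

def chi2(theta, model, data, errors, mask):
    """Chi-square of Eq. (19), restricted to the bins selected by `mask`."""
    residuals = (model(theta)[mask] - data[mask]) / errors[mask]
    return np.sum(residuals ** 2)

def bin_impact(model, data, errors, theta0):
    """Refit with the i-th bin removed and record the shift of each parameter,
    i.e. the bin-wise uncertainty defined in the text (correlations neglected)."""
    full_mask = np.ones_like(data, dtype=bool)
    full = minimize(chi2, theta0, args=(model, data, errors, full_mask)).x
    shifts = np.zeros((len(data), len(theta0)))
    for i in range(len(data)):
        mask = full_mask.copy()
        mask[i] = False                               # switch off the i-th bin
        refit = minimize(chi2, theta0, args=(model, data, errors, mask)).x
        shifts[i] = np.abs(refit - full)
    return shifts
```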
VII Conclusion
We have investigated the nature of the famous hidden charm pentaquarks with an NN-based approach in a pionless EFT, which strongly favors solution A of Refs. [49, 50, 51, 58, 59], i.e. the lower-mass $P_c(4440)$ carrying the lower spin. From this NN, we find that solution A is systematically preferred over solution B. This conclusion is based on the pionless EFT, which is used to illustrate the difference between the machine learning and the normal fitting approach. Furthermore, we performed checks of both the NN-based approach and the $\chi^2$/ndf-fitting approach; both approaches work well on the MC simulation samples. In the NN-based approach, the role of each data bin for the underlying physics is well reflected by the SHAP values, whereas for the $\chi^2$/ndf-fitting approach such a direct relation does not exist. This further explains why the two solutions can be better distinguished in the NN-based approach than in the $\chi^2$/ndf-fitting approach. At the same time, the effect of the background fraction on the accuracy of the network is quantified, which provides a paradigm for studying exotic hadrons with NNs: first extract the background fraction from the data, and then analyze the physics of the possible states with a network trained on samples of the corresponding background fraction. This study provides further insight into how the NN-based approach predicts the nature of exotic states from the mass spectrum.
Acknowledgements: We are grateful to Meng-Lin Du, Feng Kun Guo and Christoph Hanhart for the helpful discussion. This work is partly supported by the National Natural Science Foundation of China with Grant No. 12035007, Guangdong Provincial funding with Grant No. 2019QN01X172, Guangdong Major Project of Basic and Applied Basic Research No. 2020B0301030008. Q.W. and U.G.M. are also supported by the NSFC and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through the funds provided to the Sino-German Collaborative Research Center TRR110 “Symmetries and the Emergence of Structure in QCD” (NSFC Grant No. 12070131001, DFG Project-ID 196253076-TRR 110). The work of U.G.M. is further supported by the Chinese Academy of Sciences (CAS) President’s International Fellowship Initiative (PIFI) (Grant No. 2018DM0034) and Volkswagen Stiftung (Grant No. 93562).
Author contributions: Zhenyu Zhang and Jiahao Liu did the calculations. Jifeng Hu, Qian Wang and Ulf-G. Meißner focused on how to use machine learning to solve the problems in hadron physics. All the authors made substantial contributions to the physical and technical discussions as well as the editing of the manuscript. All the authors have read and approved the final version of the manuscript.
References
- [1] LHCb Collaboration, Aaij R, et al. Observation of $J/\psi p$ resonances consistent with pentaquark states in $\Lambda_b^0\to J/\psi K^- p$ decays. Phys Rev Lett 2015;115:072001.
- [2] LHCb Collaboration, Aaij R, et al. Model-independent evidence for $J/\psi p$ contributions to $\Lambda_b^0\to J/\psi p K^-$ decays. Phys Rev Lett 2016;117:082002.
- [3] LHCb Collaboration, Aaij R, et al. Observation of a narrow pentaquark state, $P_c(4312)^+$, and of two-peak structure of the $P_c(4450)^+$. Phys Rev Lett 2019;122:222001.
- [4] LHCb Collaboration, Aaij R, et al. Observation of structure in the -pair mass spectrum. Sci Bull 2020;65:1983–1993.
- [5] CMS Collaboration. Observation of new structures in the mass spectrum in collisions at TeV. 2022, CMS-PAS-BPH-21-003.
- [6] ATLAS Collaboration. Observation of an excess of di-charmonium events in the four-muon final state with the ATLAS detector. 2022, ATLAS-CONF-2022-040.
- [7] LHCb Collaboration, Aaij R, et al. Observation of an exotic narrow doubly charmed tetraquark. Nat Phys 2022;18:751–754.
- [8] LHCb Collaboration, Aaij R, et al. Study of the doubly charmed tetraquark . Nat Commun 2022;13:3351.
- [9] Chen H-X, Chen W, Liu X, et al. A review of the open charm and open bottom systems. Rept Prog Phys 2017;80:076201.
- [10] Chen H-X, Chen W, Liu X, et al. The hidden-charm pentaquark and tetraquark states. Phys Rept 2016;639:1–121.
- [11] Lebed RF, Mitchell RE, Swanson ES. Heavy-quark QCD exotica. Prog Part Nucl Phys 2017;93:143–194.
- [12] Guo F-K, Hanhart C, Meißner U-G, et al. Hadronic molecules. Rev Mod Phys 2018;90:015004.
- [13] Dong Y, Faessler A, Lyubovitskij VE. Description of heavy exotic resonances as molecular states using phenomenological lagrangians. Prog Part Nucl Phys 2017;94:282–310.
- [14] Albuquerque RM, Dias JM, Khemchandani KP, et al. QCD sum rules approach to the and states. J Phys G 2019;46:093002.
- [15] Brambilla N, Eidelman S, Hanhart C, et al. The states: experimental and theoretical status and perspectives. Phys Rept 2020;873:1–154.
- [16] Liu Y-R, Chen H-X, Chen W, et al. Pentaquark and tetraquark states. Prog Part Nucl Phys 2019;107:237–320.
- [17] Guo F-K, Liu X-H, Sakai S. Threshold cusps and triangle singularities in hadronic reactions. Prog Part Nucl Phys 2020;112:103757.
- [18] JPAC Collaboration, Albaladejo M, et al. Novel approaches in hadron spectroscopy. Prog Part Nucl Phys 2022;127:103981.
- [19] Zou B-S. Building up the spectrum of pentaquark states as hadronic molecules. Sci Bull 2021;66:1258.
- [20] Mai M, Meißner U-G, Urbach C. Towards a theory of hadron resonances. Phys Rept 2023;1001:1–66.
- [21] Ng L, Bibrzycki L, Nys J, et al. Deep learning exotic hadrons. Phys Rev D 2022;105:L091501.
- [22] Ireland D G, Janssen S, Ryckebusch J. A genetic algorithm analysis of N∗ resonances in reactions. Nucl Phys A 2004;740:147–167.
- [23] Fernández-Ramírez C, de Guerra EM, Udías A, et al. Properties of nucleon resonances by means of a genetic algorithm. Phys Rev C 2008;77:065212.
- [24] The NNPDF collaboration, Ball RD, et al. Parton distributions for the LHC Run II. J High Energy Phys 2015;04:040.
- [25] Forte S, Garrido L, Latorre JI, et al. Neural network parametrization of deep inelastic structure functions. J High Energy Phys 2002;05:062.
- [26] Rojo J, Latorre JI. Neural network parametrization of spectral functions from hadronic tau decays and determination of QCD vacuum condensates. J High Energy Phys 2004;01:055.
- [27] Keeble JWT, Rios A. Machine learning the deuteron. Phys Lett B 2020;809:135743.
- [28] Adams C, Carleo G, Lovato A, et al. Variational Monte Carlo calculations of A ≤ 4 nuclei with an artificial neural-network correlator ansatz. Phys Rev Lett 2021;127:022502.
- [29] Niu Z-M, Liang H-Z. Nuclear mass predictions based on bayesian neural network approach with pairing and shell effects. Phys Lett B 2018;778:48–53.
- [30] Niu Z-M, Liang H-Z, Sun B-H, et al. Predictions of nuclear -decay half-lives with machine learning and their impact on -process nucleosynthesis. Phys Rev C 2019;99:064307.
- [31] Ma C-W, Peng D, Wei H-L, et al. Isotopic cross-sections in proton induced spallation reactions based on the bayesian neural network method. Chin Phys C 2020;44:014104.
- [32] Kaspschak B, Meißner U-G. How machine learning conquers the unitary limit. Commun Theor Phys 2021;73:035101.
- [33] Kaspschak B, Meißner U-G. Three-body renormalization group limit cycles based on unsupervised feature learning. Mach Learn Sci Tech 2022;3:025003.
- [34] Bedaque P, Boehnlein A, Cromaz M, et al. A.I. for nuclear physics. Eur Phys J A 2021;57:100.
- [35] Dong X-X, An R, Lu J-X, et al. Nuclear charge radii in bayesian neural networks revisited. arXiv:2206.13169, 2022.
- [36] Baldi P, Bauer K, Eng C, et al. Jet substructure classification in high-energy physics with deep neural networks. Phys Rev D 2016;93:094034.
- [37] Baldi P, Sadowski P, Whiteson D. Searching for exotic particles in high-energy physics with deep learning. Nat Commun 2014;5:4308.
- [38] Boehnlein A, Diefenthaler M, Sato N, et al. Colloquium: machine learning in nuclear physics. Rev Mod Phys 2022;94:031003.
- [39] Guest D, Cranmer K, Whiteson D. Deep learning and its application to LHC physics. Annu Rev Nucl Part Sci 2018;68:161–181.
- [40] Whiteson S, Whiteson D. Machine learning for event selection in high energy physics. Eng Appl Artif Intell 2009;22:1203–1217.
- [41] Carleo G, Troyer M. Solving the quantum many-body problem with artificial neural networks. Science 2017;355:602–606.
- [42] Huang H-Y, Kueng R, Torlai G, et al. Provably efficient machine learning for quantum many-body problems. Science 2022;377:eabk3333.
- [43] Lu Y, Wang Y-J, Chen Y, et al. Rediscovery of numerical lüscher’s formula from the neural network. arXiv:2210.02184, 2022.
- [44] Zhang Z, Ma R, Hu J, et al. Approach the gell-mann-okubo formula with machine learning. Chin Phys Lett 2022;39:111201.
- [45] Sombillo DLB, Ikeda Y, Sato T, et al. Model independent analysis of coupled-channel scattering: a deep learning approach. Phys Rev D 2021;104:036001.
- [46] Sombillo DLB, Ikeda Y, Sato T, et al. Unveiling the pole structure of S-matrix using deep learning. Rev Mex Fis Suppl 2022;3:0308067.
- [47] Liu J, Zhang Z, Hu J, et al. Study of exotic hadrons with machine learning. Phys Rev D 2022;105:076013.
- [48] Sombillo DLB, Ikeda Y, Sato T, et al. Classifying the pole of an amplitude using a deep neural network. Phys Rev D 2020;102:016024.
- [49] Liu M-Z, Pan Y-W, Peng F-Z, et al. Emergence of a complete heavy-quark spin symmetry multiplet: seven molecular pentaquarks in light of the latest LHCb analysis. Phys Rev Lett 2019;122:242001.
- [50] Du M-L, Baru V, Guo F-K, et al. Interpretation of the LHCb $P_c$ states as hadronic molecules and hints of a narrow $P_c(4380)$. Phys Rev Lett 2020;124:072001.
- [51] Du M-L, Baru V, Guo F-K, et al. Revisiting the nature of the $P_c$ pentaquarks. J High Energy Phys 2021;08:157.
- [52] Voloshin MB. Radiative transitions from to molecular bottomonium. Phys Rev D 2011;84:031502.
- [53] Meißner U-G, Rusetsky A. Effective field theories. Cambridge: Cambridge University Press; 2022.
- [54] Brun R, Rademakers F. ROOT: an object oriented data analysis framework. Nucl Instrum Meth A 1997;389:81–86.
- [55] Galassi M, Theiler J, Davies J, et al. GNU scientific library reference manual (3rd Ed.). ISBN 0954612078.
- [56] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. arXiv:1512.03385, 2015.
- [57] Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv:1412.6980, 2014.
- [58] Wang B, Meng L, Zhu S-L. Hidden-charm and hidden-bottom molecular pentaquarks in chiral effective field theory. J High Energy Phys 2019;11:108.
- [59] Yang Z-C, Sun Z-F, He J, et al. The possible hidden-charm molecular baryons composed of anti-charmed meson and charmed baryon. Chin Phys C 2012;36:6–13.
- [60] Murtagh F. Multilayer perceptrons for classification and regression. Neurocomputing 1991;2:183.
- [61] Paszke A, Gross S, Massa F, et al. Pytorch: an imperative style, high-performance deep learning library. arXiv:1912.01703, 2019.
- [62] Clavijo JM, Glaysher P, Jitsev J, et al. Adversarial domain adaptation to reduce sample bias of a high energy physics event classifier. Mach learn: sci technol 2021;3:015014.
Supplemental Material for “Revealing the nature of hidden charm pentaquarks with machine learning”
Appendix A Fitting to the Background Distributions
The background, see Fig. 8, is described by a sixth-order Chebyshev polynomial, Eq. (22), with the coefficients listed in Tab. 3 obtained by fitting to the experimental background distribution. The recursion (21) defining the Chebyshev polynomials is
(21)  $T_0(x)=1,\qquad T_1(x)=x,\qquad T_{n+1}(x)=2x\,T_n(x)-T_{n-1}(x).$
This allows for the implementation of the sixth-order background polynomial as
(22)  $\sum_{i=0}^{6} c_i\, T_i(x),$
where the $c_i$ are the coefficients, given in Tab. 3 for the problem at hand.
Coefficients | Value | Error |
---|---|---|
$c_0$ | 67.96 | 0.18
$c_1$ | 4.13 | 0.18
$c_2$ | 16.60 | 0.15
$c_3$ | 2.55 | 0.15
$c_4$ | 3.27 | 0.14
$c_5$ | 2.88 | 0.14
$c_6$ | 1.66 | 0.14
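For illustration, the background polynomial of Eqs. (21) and (22) can be implemented as sketched below. The coefficient labels $c_0,\ldots,c_6$, their signs and the mapping of the mass window onto $[-1,1]$ are assumptions made for this sketch; only the numerical values are taken from Tab. 3.

```python
import numpy as np

# Coefficient values from Tab. 3, in the assumed order c_0 ... c_6 (signs assumed positive).
COEFFS = [67.96, 4.13, 16.60, 2.55, 3.27, 2.88, 1.66]

def chebyshev(n, x):
    """Chebyshev polynomials of the first kind via the recursion of Eq. (21)."""
    if n == 0:
        return np.ones_like(x)
    if n == 1:
        return x
    return 2.0 * x * chebyshev(n - 1, x) - chebyshev(n - 2, x)

def background(m, m_min=4.25, m_max=4.55):
    """Sixth-order Chebyshev background of Eq. (22), with the mass window mapped onto [-1, 1]."""
    x = 2.0 * (m - m_min) / (m_max - m_min) - 1.0
    return sum(c * chebyshev(i, x) for i, c in enumerate(COEFFS))
```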
Appendix B Determination of the Background Fraction in Data
We produce groups of samples for background fractions from 0% to 90% in steps of 10%, as well as 92%, 94%, and 96%. We trained a ResNet-based NN which employs the MSELoss function to measure the Euclidean distance between the predicted background fractions and the truth values. The NN successfully retrieves the background fraction, as illustrated in Fig. 9 (left). The right plot shows the difference between the predicted values and the true values; the bias and uncertainty are -0.0583 and 0.71, respectively. We then use this NN to determine the background fraction of the experimental data. The background fraction of the data is determined separately for NNs trained with different sample sets. In short, the background fraction is taken as the average of these determinations, (96.0 ± 0.8)%.
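A minimal PyTorch sketch of such a regression network is given below; the layer widths and the dummy tensors are placeholders, while the MSELoss objective and the single output node follow the description above.

```python
import torch
import torch.nn as nn

# Regression network: 150-bin spectrum in, one background-fraction value out.
regressor = nn.Sequential(
    nn.Linear(150, 500), nn.ReLU(),
    nn.Linear(500, 1),
)
criterion = nn.MSELoss()                  # Euclidean distance between prediction and truth
optimizer = torch.optim.Adam(regressor.parameters(), lr=1e-3)

x = torch.rand(256, 150)                  # dummy spectra for the sketch
y = torch.rand(256, 1)                    # dummy true background fractions
for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(regressor(x), y)
    loss.backward()
    optimizer.step()
```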
Appendix C The impact of the energy bins on the model parameters
We define the uncertainty
(23)
for the th parameter, which is determined from the central values obtained by switching the th bin on/off in the fit. We use this quantity to measure the influence of each data point on a parameter. Figure 10 illustrates the distributions for the eleven parameters. The larger values correspond to data points that deviate strongly from the interpolation of their neighbouring data points.
Appendix D The impact of the energy bins on the imaginary part of the pole positions.
We define the uncertainty
(24)
for the imaginary part of the th pole position by switching on/off the th bin of in the fitting. Figs. 11, 12, 13 illustrates the distributions for the channels, respectively, without considering the correlation among the parameters.
Appendix E Detailed Predictions
To reduce the systematic uncertainties arising from the NNs, we trained several NN models with identical structure but different initializations. In particular, for the background fractions 90%, 92%, 94%, and 96%, we train ten NNs each. Their predictions are presented in Tab. 4, with the maximum probabilities in boldface. As shown in the table, the first digit of the label with the maximum probability is always “1”, which indicates that all the NNs favor solution A. Thus these 40 NNs can distinguish solution A from solution B.
Model (accuracy) | 0000 | 1000 | 1001 | 1002 | 0100 | 0110 | 0111 | 1111 | 0200 | 1200 | 1210 | 1211 | 1220 | 1221 | 1222
Prediction of NN trained with samples at a background level 90%. | |||||||||||||||
NN 1 (77.29%) | 0.69 | 89.13 | 1.42 | 8.75 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 |
NN 2 (74.64%) | 0.03 | 5.83 | 38.47 | 55.30 | 0.00 | 0.03 | 0.08 | 0.00 | 0.07 | 0.04 | 0.02 | 0.00 | 0.00 | 0.09 | 0.04 |
NN 3 (74.97%) | 0.03 | 5.39 | 15.79 | 78.41 | 0.00 | 0.02 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.03 |
NN 4 (75.94%) | 0.01 | 1.90 | 27.01 | 70.95 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 | 0.03 | 0.06 |
NN 5 (74.07%) | 2.40 | 94.45 | 0.15 | 2.99 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 |
NN 6 (77.56%) | 0.05 | 2.76 | 31.64 | 65.36 | 0.00 | 0.01 | 0.03 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.02 | 0.08 |
NN 7 (76.79%) | 0.11 | 4.16 | 35.17 | 60.23 | 0.00 | 0.04 | 0.06 | 0.00 | 0.01 | 0.04 | 0.00 | 0.00 | 0.00 | 0.04 | 0.13 |
NN 8 (76.78%) | 0.18 | 4.23 | 16.10 | 79.44 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.02 |
NN 9 (75.22%) | 0.11 | 2.54 | 18.26 | 79.02 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.01 |
NN 10 (75.87%) | 0.00 | 1.24 | 22.85 | 75.77 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.09 |
Prediction of NN trained with samples at a background level 92%. | |||||||||||||||
NN 1 (72.94%) | 0.00 | 0.15 | 5.37 | 94.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 2 (72.68%) | 0.00 | 0.07 | 4.11 | 95.81 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 3 (74.58%) | 0.00 | 0.78 | 13.57 | 85.61 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 |
NN 4 (71.78%) | 0.00 | 0.81 | 19.02 | 80.16 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 5 (72.75%) | 0.14 | 15.13 | 16.91 | 67.80 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 6 (71.72%) | 0.00 | 0.18 | 9.32 | 90.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 7 (72.75%) | 0.00 | 0.32 | 18.44 | 81.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 8 (74.57%) | 0.00 | 0.27 | 6.39 | 93.34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 9 (73.24%) | 0.00 | 0.00 | 1.05 | 98.95 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 10 (74.34%) | 0.00 | 0.02 | 0.85 | 99.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Prediction of NN trained with samples at a background level 94%. | |||||||||||||||
NN 1 (65.28%) | 0.00 | 1.20 | 2.26 | 96.53 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 2 (64.11%) | 0.02 | 2.23 | 11.51 | 86.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 3 (67.83%) | 0.00 | 0.26 | 10.26 | 89.47 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 4 (65.53%) | 0.15 | 24.05 | 48.79 | 26.89 | 0.00 | 0.04 | 0.02 | 0.00 | 0.01 | 0.03 | 0.00 | 0.00 | 0.00 | 0.02 | 0.01 |
NN 5 (64.17%) | 0.02 | 4.12 | 72.61 | 23.23 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 |
NN 6 (67.44%) | 0.00 | 1.22 | 13.92 | 84.86 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 7 (61.85%) | 0.00 | 5.07 | 15.90 | 79.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 8 (65.73%) | 0.03 | 1.31 | 58.42 | 40.21 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 9 (65.34%) | 0.00 | 2.94 | 21.80 | 75.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
NN 10 (65.74%) | 0.00 | 0.08 | 4.56 | 95.34 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Prediction of NN trained with samples at a background level 96%. | |||||||||||||||
NN 1 (55.64%) | 0.00 | 1.29 | 19.79 | 78.92 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 2 (54.75%) | 0.00 | 3.38 | 25.64 | 70.97 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 3 (53.44%) | 0.00 | 6.57 | 32.20 | 61.23 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 4 (51.20%) | 0.60 | 55.77 | 21.65 | 21.96 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 5 (56.41%) | 0.00 | 0.27 | 3.52 | 96.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 6 (50.72%) | 0.39 | 34.89 | 31.27 | 33.42 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 7 (54.34%) | 0.00 | 5.16 | 21.02 | 73.82 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 8 (53.04%) | 0.02 | 5.04 | 33.86 | 61.08 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 9 (58.18%) | 0.00 | 0.11 | 31.92 | 67.97 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
NN 10 (55.79%) | 0.05 | 18.82 | 28.80 | 52.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Appendix F Neural Network
We have compared an MLP-based [60] NN to a ResNet-based [56] NN, both implemented using PyTorch [61]. Both NN models are trained with the same samples. The Adam [57] optimizer is used to train these models; it combines the momentum algorithm with the RMSProp algorithm [62]. For the activation function, the ReLU function, in contrast to the sigmoid function, effectively avoids the vanishing-gradient problem and alleviates potential overfitting, in particular when training a deep neural network, whereas the sigmoid function is exponential and computationally more expensive. A reasonable solution is obtained after 500 training epochs using an initial learning rate of 0.001. A large learning rate makes the convergence of the loss function difficult, while a small learning rate makes the convergence too slow. Our strategy is therefore to reduce it dynamically: the initial learning rate of 0.001 is successfully tested, and in each subsequent interval it is reduced to half of the previous value, controlled by the StepLR method; the interval is set to 100 epochs in our case. To avoid the network falling into a local minimum during the training process, the weights of the neurons are randomly initialized from a normal distribution, and the biases of the neurons are set to zero.
To avoid overfitting in the training, we introduce dropout layers into the network. A large dropout probability tends to activate too few neurons so that the loss function does not converge, while a small dropout probability tends to reduce the generalizability and can even lead to overfitting. After training, the dropout probability is chosen to be 0.3 for the ResNet-based NN and 0.2 for the MLP-based NN to optimize performance. In the case of overfitting, the obtained NN would work well on the training dataset but probably not on the testing dataset, so the loss function calculated on the training dataset would differ visibly from the one calculated on the testing dataset. However, no such difference is seen in Fig. 16, which indicates that the NN works well on both the training and testing datasets. In summary, the CrossEntropyLoss function converges as the number of training epochs increases, and the prediction accuracy increases accordingly. The ResNet-based NN achieves the better solution, as shown in Fig. 14. A larger number of neurons and hidden layers requires more computing resources and leads to difficult training and overfitting, while a smaller number results in non-convergence of the loss function and underfitting. After several attempts, and without optimizing the architecture further, we found that the architecture ResNet(1000-1000-2000-2000-1000-1000_10) works well. The ResNet-based NN is structured as shown in Fig. 15.
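A schematic PyTorch implementation of a fully connected residual block and classifier of this type is shown below; the depth and widths are simplified placeholders and do not reproduce the quoted ResNet(1000-1000-2000-2000-1000-1000_10) architecture exactly.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One fully connected residual block with dropout."""
    def __init__(self, dim, p_drop=0.3):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.drop = nn.Dropout(p_drop)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        h = self.drop(torch.relu(self.fc2(h)))
        return torch.relu(x + h)               # skip connection characteristic of ResNet

class SpectrumClassifier(nn.Module):
    """150-bin spectrum in, probabilities over the 15 state labels out."""
    def __init__(self, n_bins=150, width=1000, n_labels=15):
        super().__init__()
        self.stem = nn.Linear(n_bins, width)
        self.blocks = nn.Sequential(ResidualBlock(width), ResidualBlock(width))
        self.head = nn.Linear(width, n_labels)

    def forward(self, x):
        return self.head(self.blocks(torch.relu(self.stem(x))))
```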
Appendix G What is the reason for the saturation of the NN's accuracy?
There are two reasons for the low accuracy:
(1) The background fraction. Higher background fractions imply lower prediction accuracy, as expected; clearly, in the extreme case of a 100% background fraction the accuracy would drop to zero. To make this point clear, we study how the prediction accuracy varies with the background fraction, using the samples produced at the thirteen background fractions, as summarized in Tab. 5 (first and second columns) and Fig. 17 (left).
(2) The fourth label. Since the peak corresponding to the fourth label is submerged in the background, the NN fails to predict this label. Thus, when we count correct predictions including this label (“OOOO”, second column), the accuracy drops; in turn, when we count correct predictions excluding this label (“OOOX”, third column), the accuracy goes up, see also Fig. 17. We also show further results (the accuracies of “OXXX”, “XOXX”, “XXOX”, “XXXO”) in Fig. 17 (right) and find that the accuracy of “XXXO” is the lowest.
Background fraction | Accuracy of “OOOO”(%) | Accuracy of “OOOX”(%) |
---|---|---|
0% | 95.70 | 99.14
10% | 95.03 | 98.80
20% | 94.73 | 98.86
30% | 94.56 | 99.10
40% | 93.52 | 98.74
50% | 92.49 | 98.60
60% | 91.70 | 98.70
70% | 89.52 | 97.91
80% | 85.99 | 96.45
90% | 74.70 | 91.93
92% | 73.11 | 90.90
94% | 65.30 | 86.62
96% | 56.02 | 77.30
Appendix H SHAP Value
The SHapley Additive exPlanation (SHAP) value is widely used as a feature importance metric for well-established models in machine learning. One of the prevailing methods for estimating neural network model features is the DeepExplainer, which is based on the DeepLIFT algorithm. A positive (negative) SHAP value indicates that a given data point pushes the NN classification in favor of (against) a given class, and a large absolute SHAP value implies a large impact of the given mass bin on the classification. The method requires a dataset as a reference (also denoted as background) for evaluating the network features; we evaluate the SHAP values of 1000 MC samples and of the experimental data with our network, using 3000 MC samples as the reference. Fig. 18 shows the SHAP values of the experimental data corresponding to the 15 labels, indicating that the data points around the peaks in the mass spectrum have a greater impact. Fig. 19 shows the SHAP values of the MC samples corresponding to the 15 labels.