Use of Excess Power Method and Convolutional Neural Network in All-Sky Search for Continuous Gravitational Waves

Takahiro S. Yamamoto Department of Physics, Kyoto University, Kyoto 606-8502, Japan Takahiro Tanaka Department of Physics, Kyoto University, Kyoto 606-8502, Japan Center for Gravitational Physics, Yukawa Institute for Theoretical Physics, Kyoto University, Kyoto 606-8502, Japan

Abstract

The signal of continuous gravitational waves has a longer duration than the observation period. Even if the waveform in the source frame is monochromatic, we will observe the waveform with modulated frequencies due to the motion of the detector. If the source location is unknown, a lot of templates having different sky positions are required to demodulate the frequency, and the required huge computational cost restricts the applicable parameter region of coherent search. In this work, we propose and examine a new method to select candidates, which reduces the cost of coherent search by following-up only the selected candidates. As a first step, we consider an idealized situation in which only a single-detector having 100% duty cycle is available and its detector noise is approximated by the stationary Gaussian noise. Also, we assume the signal has no spindown and the polarization angle, the inclination angle, and the initial phase are fixed to be $\psi=0$ , $\cos\iota=1$ , and $\phi_{0}=0$ , and they are treated as known parameters. We combine several methods: 1) the short-time Fourier transform with the re-sampled data such that the Earth motion for the source is canceled in some reference direction, 2) the excess power search in the Fourier transform of the time series obtained by picking up the amplitude in a particular frequency bin from the short-time Fourier transform data, and 3) the deep learning method to further constrain the source sky position. The computational cost and the detection probability are estimated. The injection test is carried out to check the validity of the detection probability. We find that our method is worthy of further study for analyzing $O(10^{7})$ sec strain data.

I Introduction

Advanced LIGO and Advanced Virgo detected the first event of gravitational waves from a binary black hole merger in 2015 Abbott et al. (2016). After the three observation runs, a lot of binary coalescence events are found Abbott et al. (2019a, 2020a). In addition to Advanced LIGO and Advanced Virgo, KAGRA Aso et al. (2013) and LIGO India Bala et al. (2011) are planning to join the gravitational wave detector network Abbott et al. (2020b). The gravitational wave astronomy is expected to get fruitful results for improving our understanding of the astronomical properties of compact objects Abbott et al. (2019b, 2020c, 2020d), the true nature of gravity Will (2014); Abbott et al. (2019c, 2020e), the origin of the Universe Maggiore (2000) and so on (see Sathyaprakash and Schutz (2009) for a review).

All gravitational wave signals which are detected so far have duration $O(10^{0-2})$ sec, which is much shorter than the observation period. By contrast, we also expect gravitational waves which last longer than the observation period. Such long-lived gravitational waves are called continuous gravitational waves (see Maggiore (2007); Creighton and Anderson (2011) as textbooks). Continuous gravitational waves are defined by the following three properties: 1) small change rate of the amplitude, 2) almost constant fundamental frequency, and 3) duration longer than the observation period. Rotating anisotropic neutron stars are typical candidate sources of continuous gravitational waves. In addition, there are several exotic objects proposed as possible candidates of the sources of continuous gravitational waves (Zhu et al. (2020); Arvanitaki et al. (2015); Brito et al. (2017)).

Continuous gravitational waves are modeled by simpler waveforms than those of coalescing binaries. The parameters characterizing a typical waveform are the amplitude, the initial frequency, and the frequency derivatives with time. Although the waveform generated by the source is analytically simple, the effect of the detector’s motion makes the data analysis for continuous gravitational waves challenging. The detector’s motion causes the modulation in the frequency, and resulting in the dispersion of the power in the frequency domain. If the source location is a priori known by electromagnetic observations, the modulation can be removed precisely enough. By contrast, for the unknown target search, we need to correlate the data with a tremendous amount of templates to cover the unknown source location on the sky. This severely restricts the applicability of the all-sky coherent search to strain data of long durations. Therefore, semi-coherent methods, in which the strain data is divided into a set of segments and statistics calculated for respective segments are summed up appropriately, are often used. Various semi-coherent methods (e.g. Time-Domain $\mathcal{F}$ -statistic Jaranowski et al. (1998), SkyHough Krishnan et al. (2004), FrequencyHough Astone et al. (2014)) were proposed so far, and they are actually used to analyze LIGO and Virgo’s data. Despite tremendous efforts, up to now no continuous gravitational wave event is detected Abbott et al. (2017, 2018, 2019d).

As another trend of the research, the deep learning method is introduced to the field of gravitational wave data analysis. After the pioneering work done by George & Huerta George and Huerta (2018), there are many proposals to use deep learning for wide purposes, e.g., parameter estimation for binary coalescence Gabbard et al. (2019); Chua and Vallisneri (2020); Nakano et al. (2019); Yamamoto and Tanaka (2020), noise classification George et al. (2018) and waveform modeling Chua et al. (2020). As for applications to the search for continuous gravitational waves, several groups already proposed deep learning methods. Dreissigacker et al. Dreissigacker et al. (2019); Dreissigacker and Prix (2020) applied neural networks to all-sky searches of signals with the duration $10^{5}$ sec and $10^{6}$ sec. They used Fourier transformed strain as inputs. Their methods can be applied to the signal located in broad frequency bands and to the case of multiple detectors and realistic noise. Also, it is shown that the synergies between the deep learning and standard methods or other machine learning techniques are also powerful Morawski et al. (2019); Beheshtipour and Papa (2020).

In this paper, we propose a new method designed for detecting monochromatic waves, combining several transformations and the deep learning method. In Sec. II, the waveform model and some assumptions are introduced. The coherent matched filtering and the time resampling technique are briefly reviewed in Sec. III. In Sec. IV, we explain our strategy that combines several traditional methods such as the resampling, the short-time Fourier transform, and the excess power search with the deep learning method. We show the results of the assessment of the performance of our new method in Sec. V. Sec. VI is devoted to the conclusion.

II Waveform model

We consider a monochromatic gravitational wave. We denote by $f_{\mathrm{gw}}$ its frequency constant in time. With the assumption that the source is at rest with respect to the solar system barycenter (SSB), a complex-valued gravitational waveform in the source frame ${h}^{\mathrm{source}}(\tau)$ will be simply written as

{h}^{\mathrm{source}}(\tau)=h_{0}e^{2\pi if_{\mathrm{gw}}\tau+i\phi_{0}}\,,

(1)

where $\tau$ is called SSB time, $h_{0}$ and $\phi_{0}$ are the amplitude and the initial phase, respectively. In this work, for simplicity, we assume that

\phi_{0}=0\,,

(2)

and regard it as a known parameter. The phase of a gravitational wave is modulated due to the detector motion and the modulation depends on the source location. The normal vector pointing from the Earth’s center to the sky position specified by a right ascension $\alpha$ and a declination angle $\delta$ is defined by

\bm{{n}}(\alpha,\delta)=\left(\begin{matrix}1&0&0\\ 0&\cos\epsilon&\sin\epsilon\\ 0&-\sin\epsilon&\cos\epsilon\end{matrix}\right)\left(\begin{matrix}\cos\alpha\cos\delta\\ \sin\alpha\cos\delta\\ \sin\delta\end{matrix}\right)\,,

(3)

with the tilt angle between the Earth’s rotation axis and the orbital angular momentum $\epsilon$ . Here, we work in the SSB frame, in which the $z$ -axis is along the Earth’s orbital angular momentum and the $x$ -axis points towards the vernal equinox. Defining the detector time $t$ so as to satisfy

\tau=t+\frac{\bm{{r}}(t)\cdot\bm{{n}}({\alpha}_{\mathrm{s}},{\delta}_{\mathrm{s}})}{c}\,,

(4)

we obtain the waveform in the detector frame

h(t):=h_{0}e^{i\Phi(t)}\,,

(5)

with

\Phi(t)=2\pi f_{\mathrm{gw}}t+2\pi f_{\mathrm{gw}}\frac{\bm{{r}}(t)\cdot\bm{{n}}(\alpha_{\mathrm{s}},\delta_{\mathrm{s}})}{c}\,.

(6)

A subscript “s” indicates the quantity related to the gravitational wave source. Namely, $(\alpha_{\mathrm{s}},\delta_{\mathrm{s}})$ means the sky position of the source. In the following, we use the notation $\bm{{n}}_{\mathrm{s}}:=\bm{{n}}(\alpha_{\mathrm{s}},\delta_{\mathrm{s}})$ .

For the modeling of the detector motion, we adopt a little simplification, which we believe will not affect our main result. We assume that the position vector of the detector can be written by a sum of the Earth’s rotation part $\bm{{r}}_{\oplus}(t)$ , and the Earth’s orbital motion part $\bm{{r}}_{\odot}(t)$ . The Earth is assumed to take a circular orbit on $xy$ -plane. Then, we can write $\bm{{r}}_{\odot}(t)$ as

\bm{{r}}_{\odot}(t)=R_{\mathrm{ES}}\left(\begin{matrix}\cos(\varphi_{\odot}+\Omega_{\odot}t)\\ \sin(\varphi_{\odot}+\Omega_{\odot}t)\\ 0\end{matrix}\right)\,,

(7)

where $R_{\mathrm{ES}}$ , $\Omega_{\odot}$ and $\varphi_{\odot}$ are the distance between the Earth and the Sun, the angular velocity of the orbital motion and the initial phase, respectively. The detector motion due to the Earth’s rotation can be described as

\bm{{r}}_{\oplus}(t)=R_{\mathrm{E}}\left(\begin{matrix}1&0&0\\ 0&\cos\epsilon&\sin\epsilon\\ 0&-\sin\epsilon&\cos\epsilon\end{matrix}\right)\left(\begin{matrix}\cos\lambda\cos(\varphi_{\oplus}+\Omega_{\oplus}t)\\ \cos\lambda\sin(\varphi_{\oplus}+\Omega_{\oplus}t)\\ \sin\lambda\end{matrix}\right)\,,

(8)

where $R_{\mathrm{E}}$ , $\lambda$ , $\Omega_{\oplus}$ and $\varphi_{\oplus}$ are the radius of the Earth, the latitude of the detector, the angular velocity of the Earth’s rotation and the initial phase, respectively. The modulated phase $\Phi(t)$ can be decomposed into

\Phi(t)=2\pi{f}_{\mathrm{gw}}t+\Phi_{\oplus}(t)+\Phi_{\odot}(t),

(9)

where

	$\displaystyle\Phi_{\oplus}(t)$	$\displaystyle=2\pi{f}_{\mathrm{gw}}\frac{\bm{{r}}_{\oplus}(t)\cdot{\bm{{n}}}_{\mathrm{s}}}{c},$		(10)
	$\displaystyle\Phi_{\odot}(t)$	$\displaystyle=2\pi{f}_{\mathrm{gw}}\frac{\bm{{r}}_{\odot}(t)\cdot{\bm{{n}}}_{\mathrm{s}}}{c}.$		(11)

Finally, we take into account the amplitude modulation due to the detector’s motion, which can be described by the antenna pattern function. In this work, the polarization angle and the inclination angle are, respectively, assumed to be

\psi=0\,,\qquad\cos\iota=1\,,

(12)

and, similarly to $\phi_{0}$ , they are treated as known parameters. Then, the gravitational wave to be observed by a detector can be written as

{h}_{\mathrm{obs}}(t)=G(t)h(t)+G^{\ast}(t)h^{\ast}\,,

(13)

with

G(t):=\frac{F_{+}(t)+iF_{\times}(t)}{2}\,.

(14)

The definitions of $F_{+}(t)$ and $F_{\times}(t)$ are the same as those used in Jaranowski et al., Jaranowski et al. (1998). In this work, the antenna pattern function of LIGO Hanford is employed. The strain data is written as

s(t)={h}_{\mathrm{obs}}(t)+n(t)\,,

(15)

where $n(t)$ is the detector noise. We assume that the strain data from the detector has no gaps in time and the detector noise is stationary and Gaussian.

III Coherent search method

Before explaining our method, we briefly review the coherent search method and the time resampling technique Jaranowski et al. (1998).

If the expected waveforms can be modeled precisely and the noise is Gaussian, the matched filtering is the optimal method for the detection and the parameter estimation, besides the computational cost. The noise weighted inner product is defined by

(a|b):=4\mathrm{Re}\left[\int^{\infty}_{0}df\ \frac{\tilde{a}(f)\tilde{b}^{\ast}(f)}{{S}_{\mathrm{n}}(f)}\right]\,,

(16)

where ${S}_{\mathrm{n}}(f)$ is the power spectral density of the detector noise. A signal-to-noise ratio (SNR) can be calculated with the inner product between a strain $s(t)$ and a template ${h}_{\mathrm{temp}}(t)$ as

{\rho}_{\mathrm{MF}}:=\frac{(s|{h}_{\mathrm{temp}})}{\sqrt{({h}_{\mathrm{temp}}|{h}_{\mathrm{temp}})}}\,.

(17)

Theoretically predicted waveforms ${h}_{\mathrm{temp}}(t)$ have various parameters characterizing the source properties and the geometrical information. A set of waveforms having different parameters is called a template bank. For each template in a template bank, we can assign the value of SNR calculated by Eq. (17). If the maximum value of SNR in the template set exceeds a threshold value, it is a sign that an actual signal may exist and the parameter inference is also obtained from the distribution of SNR.

Due to a long duration and a narrow frequency band of continuous gravitational waves, the inner product (16) can be recast into the time-domain expression as

(a|b)\simeq\frac{2}{{S}_{\mathrm{n}}({f}_{\mathrm{gw}})}\mathrm{Re}\left[\int^{{T}_{\mathrm{obs}}}_{0}dt\ a(t)b^{\ast}(t)\right]\,,

(18)

where ${T}_{\mathrm{obs}}$ is the observation time. For a monochromatic source, the waveform can be modeled by Eq. (1). The modulation due to the detector motion is only in the phase of the waveform. The time resampling technique nullifies the phase modulation by redefining the time coordinate. If the position of the source is a priori known, the exact relation (4) can be obtained. Therefore, the phase modulation can be completely removed and the monochromatic waveform is applicable to the time-domain matched filtering (18). Calculation of the inner product (18) between the resampled signal and a monochromatic waveform is equivalent to the Fourier transform. Thus, the fast algorithm (i.e., Fast Fourier transform) can be employed to rapidly search the gravitational wave frequency, ${f}_{\mathrm{gw}}$ .

When the source location is unknown, we need to search all-sky by placing a set of grid points $\{{\bm{{n}}}_{\mathrm{g}}^{(i)}\}_{i=1}^{{N}_{\mathrm{grid}}}$ to keep the maximum loss of SNR within the acceptable range. For each grid point, we carry out the Fourier transform after the transformation

\zeta:=t+\frac{\bm{{r}}(t)\cdot{\bm{{n}}}_{\mathrm{g}}}{c}\,.

(19)

Here, we omit the superscript $(i)$ , for brevity. The necessary number of grid points ${N}_{\mathrm{grid}}$ can be estimated by the angular resolution of gravitational wave sources. The angular resolution of gravitational wave sources can be roughly given by the ratio between the wavelength of the gravitational wave and the diameter of the Earth’s orbit, i.e.,

{(\delta\theta)}_{\mathrm{coh}}\sim\frac{\lambda_{\mathrm{gw}}}{2R_{\mathrm{ES}}}\sim 10^{-5}\text{ [rad]}\,.

(20)

Here, we adopt 100Hz as the fiducial value for $c{\lambda}_{\mathrm{gw}}^{-1}$ . Thus, the required number of grid points is, at least,

N_{\mathrm{grids}}\sim\frac{4\pi}{{(\delta\theta)}_{\mathrm{coh}}^{2}}\sim 1.3\times 10^{11}\,.

(21)

The time resampling and the Fourier transform are applied to each grid point. The number of floating point operations required for carrying out FFT is $\sim 1.7\times 10^{12}$ per grid point, with the signal of a duration $10^{7}$ sec and a sampling frequency 1024 Hz. Even if we have a 1PFlops machine, the computational time becomes $2.2\times 10^{8}$ sec, which is longer than the signal duration. For this reason, the coherent all-sky search is not realistic even for monochromatic sources yet.

IV Our Method

IV.1 Subtracting the effect due to the Earth’s rotation

As stated in Sec. III, the time resampling technique can demodulate the phase, but complete demodulation is not available because of the limitation of computational resources. In our work, the time resampling technique is employed to eliminate only the effect caused by the Earth’s diurnal rotation, $\Phi_{\oplus}(t)$ . Assuming a representative grid point $\bm{{n}}_{\mathrm{g}}$ , we can rewrite the phase (6) as

	$\displaystyle\Phi(t)$	$\displaystyle=2\pi{f}_{\mathrm{gw}}t+\Phi_{\oplus}(t)+\Phi_{\odot}(t)$
		$\displaystyle=2\pi{f}_{\mathrm{gw}}\zeta+\delta\Phi_{\oplus}(t)+\delta\Phi_{\odot}(t)\,,$		(22)

where

	$\displaystyle\delta\Phi_{\oplus}(t)$	$\displaystyle:=2\pi f_{\mathrm{gw}}\frac{\bm{{r}}_{\oplus}(t)\cdot\Delta\bm{{n}}}{c}\,,$		(23)
	$\displaystyle\delta\Phi_{\odot}(t)$	$\displaystyle:=2\pi f_{\mathrm{gw}}\frac{\bm{{r}}_{\odot}(t)\cdot\Delta\bm{{n}}}{c}\,,$		(24)

and $\Delta\bm{{n}}:=\bm{{n}}_{\mathrm{s}}-\bm{{n}}_{\mathrm{g}}$ . Since the residual phase varies with time, we will place grid points so that the amplitude of the residual phase in the worst case, i.e.,

\min_{\bm{{n}}_{\mathrm{g}}}\max_{t}|\delta\Phi_{\oplus}(t)|

becomes smaller than a threshold $\delta\Phi_{\epsilon}$ for any source direction $\bm{{n}}_{\mathrm{s}}$ within the area covered by the grid point ${\bm{{n}}}_{\mathrm{g}}$ . To optimize the grid placement, we employ the method proposed in Ref. Nakano et al. (2003). The residual phase $\delta\Phi_{\oplus}$ is expanded up to the first order of $\Delta\alpha:=\alpha_{\mathrm{s}}-\alpha_{\mathrm{g}}$ and $\Delta\delta:=\delta_{\mathrm{s}}-\delta_{\mathrm{g}}$ . Then, we get

	$\displaystyle\delta\Phi_{\oplus}\simeq$	$\displaystyle\frac{2\pi{f}_{\mathrm{gw}}}{c}R_{\text{E}}\cos\lambda\left\{-\Delta\delta\sin\delta_{\mathrm{g}}\cos(\alpha_{\mathrm{g}}-\varphi_{\oplus}-\Omega_{\oplus}t)\right.$
		$\displaystyle\left.-\Delta\alpha\cos\delta_{\mathrm{g}}\sin(\alpha_{\mathrm{g}}-\varphi_{\oplus}-\Omega_{\oplus}t)\right\}\,.$		(25)

Here, the constant term is neglected because it degenerates with the initial phase $\phi_{0}$ . The maximum value of the residual phase is

	$\displaystyle\max_{t}\|\delta\Phi_{\oplus}\|=$	$\displaystyle\frac{2\pi{f}_{\mathrm{gw}}}{c}R_{\text{E}}\|\cos\lambda\|$
		$\displaystyle\times\sqrt{(\Delta\delta)^{2}\sin^{2}{\delta}_{\mathrm{g}}+(\Delta\alpha)^{2}\cos^{2}{\delta}_{\mathrm{g}}}\,.$		(26)

The grid points are to be determined to satisfy $\max_{t}|\delta\Phi_{\oplus}|<\delta\Phi_{\epsilon}$ for any source direction.

Because the residual phase (26) is symmetric under the transformation ${\delta}_{\mathrm{g}}\to-{\delta}_{\mathrm{g}}$ , the placement of grids on the negative $\delta$ side can be generated by inverting the sign of the grids on the positive $\delta$ side. Therefore, we focus on the case with $0\leq\delta\leq\pi/2$ .

Since the residual phase depends only on $\delta$ at $\delta_{\mathrm{g}}=\pi/2$ , a single template can cover the neighbor of $\delta=\pi/2$ . In fact, at $\delta=\pi/2$ , Eq. (26) becomes

\max_{t}|\delta\Phi_{\oplus}|=\frac{2\pi{f}_{\mathrm{gw}}}{c}R_{\text{E}}|\Delta\delta|\cos\lambda\,.

(27)

Therefore, the condition $\max_{t}|\delta\Phi_{\oplus}|\leq\delta\Phi_{\epsilon}$ gives the lower bound of $\delta_{1}$ such that the region $\delta_{1}\leq\delta\leq\pi/2$ can be covered by a signle patch represented by $\{({\alpha}_{\mathrm{g}},{\delta}_{\mathrm{g}})=(0,\pi/2)\}$ , to find

\delta_{1}:=\frac{\pi}{2}-\delta\Phi_{\epsilon}\times\frac{c}{2\pi{f}_{\mathrm{gw}}}\frac{1}{R_{\text{E}}\cos\lambda}\,.

(28)

Plural patches are necessary to cover the strip of a constant $\delta$ in the other range. We introduce a 2-dimensional metric corresponding to the residual phase (26),

d\sigma^{2}=\cos^{2}\delta d\alpha^{2}+\sin^{2}\delta d\delta^{2}\,.

(29)

In general, a metric in a 2-dimensional manifold can be transformed into a conformally flat metric by an appropriate coordinate transformation. When the space is conformally flat, the curve of a small constant distance measured from an arbitrary chosen point can be approximated by a circle. Therefore, a template spacing in the 2-dimensional parameter space becomes relatively easy. By defining new variables $X:=\alpha$ and $Y:=-\log|\cos\delta|$ , the metric can be transformed into

d\sigma^{2}=e^{-2Y}(dX^{2}+dY^{2})\,.

(30)

Along with Nakano et al. (2003), we can construct the sky patches covering the half-sky region with $0\leq\delta\leq\delta_{1}$ . Figure 1 shows a part of grid points constructed under the condition

\delta\Phi_{\epsilon}=0.058\,,

(31)

which we adopt throughout this paper. The total number of grid points to cover the whole sky is

N_{\rm grid}=352,436\,,

(32)

for $f_{\mathrm{gw}}$ = 100Hz.

Refer to caption — Figure 1: Grid point placement on a fraction of $(\alpha,\cos\delta)$ -plane. Blue dots are grid points and orange contours show the $\max_{t}|\delta\Phi_{\oplus}(t)|=\delta\Phi_{\epsilon}$ contours for each grid point. The region $\{(\alpha,\delta)|\delta<\delta_{1}\}$ is covered by a single template $(\alpha_{\mathrm{g}},\delta_{\mathrm{g}})=(0,\pi/2)$ and the shape of the patch is square on this plane.

IV.2 Modeling the effect due to the Earth’s orbital motion

As we choose $\delta\Phi_{\epsilon}$ to be sufficiently small, we neglect $\delta\Phi_{\oplus}$ in the following discussion. Then, after subtracting the phase modulation due to the Earth’s rotation, the phase of the gravitational wave (22) becomes

\Phi(t)=2\pi{f}_{\mathrm{gw}}\zeta+\delta\Phi_{\odot}(t).

(33)

We apply the short-time Fourier transform (STFT) to the time-resampled strain,

s(\zeta)={h}_{\mathrm{obs}}(\zeta)+n(\zeta).

(34)

In the rest of the paper, we treat only the time-resampled data. Therefore, without confusion, the time-resampled data in Eq. (34) can be denoted by the same character as the original one. The strain is divided into ${N}_{\mathrm{seg}}$ segments having the duration ${T}_{\mathrm{seg}}$ and their start times are denoted by $\zeta_{j}:=jT_{\mathrm{slide}},\ (j=0,1,\cdots,{N}_{\mathrm{seg}}-1)$ . $T_{\mathrm{slide}}$ is not necessary to be equal to $T_{\mathrm{seg}}$ . The output of STFT with the window function $w(\zeta)$ is defined by

s^{\mathrm{STFT}}_{j,k}=h^{\mathrm{STFT}}_{j,k}+n^{\mathrm{STFT}}_{j,k}\,,

(35)

where

h^{\mathrm{STFT}}_{j,k}=\frac{1}{T_{\mathrm{seg}}}\int^{\zeta_{j}+T_{\mathrm{seg}}}_{\zeta_{j}}d\zeta^{\prime}\ w(\zeta^{\prime}-\zeta_{j}){h}_{\mathrm{obs}}(\zeta^{\prime})e^{-2\pi if_{k}\zeta^{\prime}}\,,

(36)

n^{\mathrm{STFT}}_{j,k}=\frac{1}{T_{\mathrm{seg}}}\int^{\zeta_{j}+T_{\mathrm{seg}}}_{\zeta_{j}}d\zeta^{\prime}\ w(\zeta^{\prime}-\zeta_{j})n(\zeta^{\prime})e^{-2\pi if_{k}\zeta^{\prime}}\,,

(37)

and $f_{k}:=k\Delta f=k/T_{\mathrm{seg}}$ is the frequency of the $k$ -th element of STFT. Let us focus on the positive frequency modes, i.e., $f_{k}>0$ . Then, the second term of Eq. (13) can be neglected and Eq. (36) can be approximated by

	$\displaystyle{h}^{\mathrm{STFT}}_{j,k}\simeq$	$\displaystyle\frac{1}{{T}_{\mathrm{seg}}}\int^{\zeta_{j}+{T}_{\mathrm{seg}}}_{\zeta_{j}}d\zeta^{\prime}\ \bigg{\{}w(\zeta^{\prime}-\zeta_{j})$
		$\displaystyle\times G(t(\zeta^{\prime}))e^{2\pi i\delta f_{k}\zeta^{\prime}}e^{i\delta\Phi_{\odot}(\zeta^{\prime})}\bigg{\}}.$		(38)

with $\delta f_{k}:={f}_{\mathrm{gw}}-f_{k}$ . In the expression of $G(t(\zeta))$ , the SSB time $\zeta$ appears only through the combination $\Omega_{\oplus}t(\zeta)$ . The difference between $\Omega_{\oplus}t(\zeta)$ and $\Omega_{\oplus}\zeta$ is negligiblly small. Therefore, in Eq. (38), $G(t(\zeta))$ can be replaced by $G(\zeta)$ . The duration ${T}_{\mathrm{seg}}$ is chosen so that $G(t(\zeta))$ can be approximated by a constant in each segment. With this choice of ${T}_{\mathrm{seg}}$ , the factor $e^{i\delta\Phi_{\odot}(t)}$ also can be seen as a constant in each segment because it varies slower than the antenna pattern function. Therefore, Eq. (38) can be approximated by

h^{\mathrm{STFT}}_{j,k}\simeq h_{0}e^{i\delta\Phi_{\odot}(\zeta_{j})}G(\zeta_{j})W_{k}(\zeta_{j})\,,

(39)

where

W_{k}(\zeta_{j}):=\frac{1}{T_{\mathrm{seg}}}\int^{\zeta_{j}+T_{\mathrm{seg}}}_{\zeta}d\zeta^{\prime}\ w(\zeta^{\prime}-\zeta_{j})e^{2\pi i\delta f_{k}\zeta^{\prime}}\,.

(40)

In this work, we use the tukey window,

	$\displaystyle w(\zeta)=$
	$\displaystyle\begin{cases}\frac{1}{2}-\frac{1}{2}\cos\left(\frac{2\pi\zeta}{\alpha T_{\mathrm{seg}}}\right)\,,&\left(0\leq\frac{\zeta}{T_{\mathrm{seg}}}<\frac{\alpha}{2}\right)\,,\\ 1\,,&\left(\frac{\alpha}{2}\leq\frac{\zeta}{T_{\mathrm{seg}}}\leq 1-\frac{\alpha}{2}\right)\,,\\ \frac{1}{2}-\frac{1}{2}\cos\left(\frac{2\pi(T_{\mathrm{seg}}-\zeta)}{\alpha T_{\mathrm{seg}}}\right)\,,&\left(1-\frac{\alpha}{2}<\frac{\zeta}{T_{\mathrm{seg}}}\leq 1\right)\,.\end{cases}$		(41)

We set the parameter $\alpha$ to 0.125. With $\beta_{k}:=T_{\mathrm{seg}}\delta f_{k}$ , Eq. (40) can be calculated as

	$\displaystyle W_{k}(\zeta_{j})=$	$\displaystyle\ e^{2\pi i\delta f_{k}\zeta_{j}}$
		$\displaystyle\times\frac{(1+e^{i\pi\alpha\beta_{k}})(1-e^{2\pi i\beta_{k}(1-\alpha/2)})}{4\pi i\beta_{k}(\alpha^{2}\beta_{k}^{2}-1)}\,.$		(42)

Using the Jacobi-Anger expansion, we can expand the factor $e^{i\delta\Phi_{\odot}(\zeta_{j})}$ that appears in Eq. (39) as

e^{i\delta\Phi_{\odot}(\zeta_{j})}=\sum_{\ell=-\infty}^{\infty}i^{\ell}J_{\ell}(X)e^{i\ell\Omega_{\odot}\zeta_{j}}e^{i\ell(\varphi_{\odot}-\phi_{X})}

(43)

where $J_{\ell}(z)$ is the Bessel function of the first kind and

	$\displaystyle X:=\frac{2\pi{f}_{\mathrm{gw}}{R}_{\mathrm{ES}}}{c}\sqrt{(\Delta n_{x})^{2}+(\Delta n_{y})^{2}},$		(44)
	$\displaystyle e^{i\phi_{X}}:=\Delta n_{x}+i\Delta n_{y}.$		(45)

Therefore, Eq. (39) can be expressed as

	$\displaystyle{h}^{\mathrm{STFT}}_{j,k}$	$\displaystyle\simeq h_{0}G(\zeta_{j})W_{k}(0)e^{2\pi i\delta f_{k}\zeta_{j}}$
		$\displaystyle\times\sum_{\ell=-\infty}^{\infty}i^{\ell}J_{\ell}(X)e^{i\ell(\varphi_{\odot}-\phi_{X})}e^{i\ell\Omega_{\odot}\zeta_{j}}\,.$		(46)

The Fourier transform of ${h}^{\mathrm{STFT}}_{j,k}$ with a fixed integer $k$ is defined by

\mathsf{H}_{\ell,k}:=\frac{1}{{N}_{\mathrm{seg}}}\sum_{j=0}^{{N}_{\mathrm{seg}}-1}{h}^{\mathrm{STFT}}_{j,k}e^{-2\pi ij\ell/{N}_{\mathrm{seg}}}.

(47)

We refer to $\mathsf{H}_{\ell,k}$ as the $\ell$ -domain signal. To understand the pattern hidden in $\mathsf{H}_{j,k}$ , we set aside the factor $G(\zeta_{j})$ for a while. Then, Eq. (47) can be estimated as

\mathsf{H}_{\ell,k}\sim h_{0}W_{k}(0)i^{\ell^{\prime}}J_{\ell^{\prime}}(X)e^{i\ell^{\prime}(\varphi_{\odot}-\phi_{X})},

(48)

with $\ell^{\prime}\sim\ell+2\pi\Omega_{\odot}^{-1}\delta f_{k}$ . Because of the fact that $J_{\ell}(z)\simeq 0$ for $|\ell|\gtrsim|z|$ and $X\lesssim O(10^{3})$ for ${f}_{\mathrm{gw}}=100$ Hz, the signal is localized within the region where only few thousand bins in the $\ell$ -domain. Putting back the antenna pattern $G(\zeta_{j})$ , we expect that $\ell$ -domain signals lose their amplitude and their localizations become worse than those for the idealized cases. Figure 2 shows an example of the $\ell$ -domain signal.

IV.3 Excess power method for finding candidates

By the method shown in the previous subsection, for every grid point $\bm{{n}}_{\mathrm{g}}$ and every frequency bin $f_{k}$ , we obtain an $\ell$ -domain strain defined by

\mathsf{S}_{\ell,k}:=\mathsf{H}_{\ell,k}+\mathsf{N}_{\ell,k},

(49)

with

\mathsf{N}_{\ell,k}:=\frac{1}{{N}_{\mathrm{seg}}}\sum_{j=0}^{{N}_{\mathrm{seg}}-1}{n}^{\mathrm{STFT}}_{j,k}e^{-2\pi ij\ell/{N}_{\mathrm{seg}}}.

(50)

There are ${T}_{\mathrm{obs}}/{T}_{\mathrm{seg}}\sim O(10^{6})$ data points in a single $\ell$ -domain strain and we know that the signal in $\ell$ -domain will be localized within a small region $\sim O(10^{3})$ . Thus, the excess power method Anderson et al. (2001) is useful for selecting the candidates with a minimal computational cost. We here divide an $\ell$ -domain signal into short chunks so that each chunk has the length $\delta\ell$ and neighbored segments have an overlap by $\delta\ell/2$ , which is one of the simplest choices but not the optimal one. Then, we obtain ${N}_{\mathrm{chunk/signal}}=2({N}_{\mathrm{seg}}-\delta\ell)/\delta\ell$ chunks from one $\ell$ -domain signal. The excess power statistic for the grid point ${\bm{{n}}}_{\mathrm{g}}$ , the frequency bin $f_{k}$ , and the $c$ -th chunk ( $c=0,1,\dots,{N}_{\mathrm{chunk/signal}}-1$ ) is defined by

\mathcal{E}({\bm{{n}}}_{\mathrm{g}},f_{k},c):=4\sum_{\ell=c\delta\ell/2}^{(c+2)\delta\ell/2-1}\frac{|\mathsf{S}_{\ell,k}|^{2}}{\tilde{\sigma}_{k}^{2}}\,,

(51)

where

\langle\mathsf{N}_{\ell,k}\mathsf{N}^{\ast}_{\ell^{\prime},k}\rangle=:\frac{1}{2}\tilde{\sigma}_{k}^{2}\delta_{\ell\ell^{\prime}}\,.

(52)

The variance of noise in $\ell$ -domain, $\tilde{\sigma}_{k}$ , is estimated as

\tilde{\sigma}_{k}^{2}=\frac{{S}_{\mathrm{n}}(f_{k})}{{N}_{\mathrm{seg}}{T}_{\mathrm{seg}}}\times\mathcal{W}\,,

(53)

where $\mathcal{W}$ is the factor coming from the window function and defined by

\mathcal{W}:=\int^{0.5}_{-0.5}dx[w(x)]^{2}\,.

(54)

The derivation of Eq. (53) is summarized in Appendix A.

We define the SNR of the excess power by

{\rho}_{\mathrm{EP}}({\bm{{n}}}_{\mathrm{g}},f_{k},c):=\frac{\mathcal{E}({\bm{{n}}}_{\mathrm{g}},f_{k},c)-\langle\mathcal{E}\rangle_{\mathrm{n}}}{\sigma_{\mathrm{n}}(\mathcal{E})}\,,

(55)

where

\langle\mathcal{E}\rangle_{\mathrm{n}}=2\delta\ell\,,

(56)

and

\sigma_{\mathrm{n}}(\mathcal{E}):=\sqrt{\langle(\mathcal{E}-\langle\mathcal{E}\rangle_{\mathrm{n}})^{2}\rangle_{\mathrm{n}}}=2\sqrt{\delta\ell}\,,

(57)

are, respectively, the expectation value and the standard deviation of $\mathcal{E}$ when only noise exists. We select the candidate set of parameter values $\{{\bm{{n}}}_{\mathrm{g}},f_{k},c\}$ , when

{\rho}_{\mathrm{EP}}({\bm{{n}}}_{\mathrm{g}},f_{k},c)>{\hat{\rho}}_{\mathrm{EP}}

is satisfied with a threshold value ${\hat{\rho}}_{\mathrm{EP}}$ . Strictly speaking, since the excess power statistic $\mathcal{E}$ is the sum of $2\delta\ell$ squared Gaussian random variables with the variance $1/2\sqrt{\delta\ell}$ , $\mathcal{E}$ follows a chi square distribution with the degree of freedom $2\delta\ell$ . However, since here we choose $\delta\ell$ to be large, the distribution of $\mathcal{E}$ can be approximated by a Gaussian distribution with the average $2\delta\ell$ and the standard deviation $2\sqrt{\delta\ell}$ . Therefore, in the absence of gravitational wave signal, the probability distribution of ${\rho}_{\mathrm{EP}}$ is a Gaussian distribution with zero mean and unit variance.

Also in the presence of some signal, the excess power statistics ${\rho}_{\mathrm{EP}}$ is given by a sum of many statistical variables. Thus, the statistical distribution of ${\rho}_{\mathrm{EP}}$ can be approximated by the Gaussian distribution whose mean and standard deviation are calculated as

{\mu}_{\mathrm{EP}}(\bm{{\xi}})=\frac{2P_{k}(\bm{{\xi}})}{\tilde{\sigma}^{2}_{k}\sqrt{\delta\ell}}\,,

(58)

and

{\sigma}_{\mathrm{EP}}(\bm{{\xi}})=\sqrt{1+\frac{4P_{k}(\bm{{\xi}})}{\tilde{\sigma}_{k}^{2}\delta\ell}}\,,

(59)

Here, we define

P_{k}(\bm{{\xi}}):=\sum_{\ell}|\mathsf{H}_{\ell,k}(\bm{{\xi}})|^{2}\,,

(60)

and we define $\bm{{\xi}}$ as a set of parameters $\bm{{\xi}}:=(h_{0},\vec{\xi})=(h_{0},{f}_{\mathrm{gw}},{\alpha}_{\mathrm{s}},{\delta}_{\mathrm{s}})$ . The false alarm rate and the detection efficiency will be assessed with this Gaussian approximation.

IV.4 Neural network for localizing

IV.4.1 fundamentals

Deep learning is one of the approaches for finding features being hidden in the data (see Goodfellow et al. (2016) as a textbook). Artificial neural networks (ANNs) are the architectures playing the central roll in deep learning. An ANN consists of consecutive layers and each layer is formed by a lot of units (neurons). Each layer takes inputs from the previous layer and processed data is passed to the next layer. As a simple example, the process occurring in each layer can be written as the combination of affine transformation and a non-linear transformation, i.e.,

x^{(\ell+1)}_{i}=g\left(\sum_{j=1}^{N^{(\ell)}}w^{(\ell)}_{ij}x^{(\ell)}_{j}+b^{(\ell)}\right)\ (i=1,2,\cdots,N^{(\ell+1)})\,,

(61)

where $x^{(\ell)}$ is a set of input data on the $\ell$ -th layer and $g$ is a nonlinear function, which is called an activation function. We use a ReLU function Jarrett et al. (2009), defined by

g(z)=\max[z,0]\,.

(62)

The parameters $w$ and $b$ are, respectively, called weights and biases. They are tunable parameters and optimized to capture the features of data. The process to optimize weights and biases is called training. Frequently, the affine transformation and the non-linear transformation are divided into two layers, called a linear layer and a non-linear transformation layer, respectively.

In addition to the layers as given by Eq. (61), many variants are proposed so far. In this work, we use also one-dimensional convolutional layers LeCun and Bengio (1998) and max-pooling layers Zhou and Chellappa (1988). The input of a convolutional layer, denoted by $x^{c}_{i}$ , is a set of vectors. For example, in the case of color images, each pixel has three channels corresponding to three primary colors of light. Therefore, the input data is a set of three two-dimensional arrays. The discrete convolution, which is represented as

o^{c^{\prime}}_{i}=\sum_{c=0}^{C-1}\sum_{k=0}^{K-1}f^{c,c^{\prime}}_{k}x^{c}_{i+k}+b^{c^{\prime}}\,,

(63)

is calculated in a convolutional layer. Here, $x$ is the input and $o$ is the output data of the layer. $C$ and $K$ are, respectively, the number of channels and the width of the kernel. Each pixel of the data is specified by an index $i$ . The parameters $f$ and $b$ are optimized during the training. A max pooling layer, whose operation can be written as

o^{c}_{i}=\max_{k=0,1,\cdots,K-1}[x^{c}_{si+k}]\,,

(64)

with the kernel size $K$ and the stride $s$ , reduces the length of the data and hence the computational cost.

In supervised learning, a given dataset consists of many pairs of input data and target values. An ANN learns the relation between input data and target values from the dataset and predicts values corresponding to newly given input data. In order to train an ANN, the deviation between the predicted values and the target value is quantified by a loss function. For a regression problem, the mean square loss,

L[\bm{{y}}(w),\bm{{t}}]=\frac{1}{2}\sum_{i=1}^{d}|y_{i}(w)-t_{i}|^{2}\,,

(65)

is often employed. Here, $\bm{{y}}$ and $\bm{{t}}$ are a set of predicted values and that of target values, respectively, and they are expressed as $d$ -dimensional vectors. The prediction depends on the weights of the neural network, which are denoted by a single symbol $w$ . An ANN is optimized so as to minimize the loss function for a given dataset, which is the sum of the loss functions for all data contained in the training dataset. Because the complete minimization using all dataset cannot be done, the iterative method is used. The weight $w$ is updated by the replacement algorithm given by

w\to w-\eta\nabla_{w}\sum_{n=1}^{{N}_{\mathrm{train}}}L[\bm{{y}}_{n}(w)-\bm{{t}}_{n}]\,,

(66)

where ${N}_{\mathrm{train}}$ is the number of data contained in the dataset and $\eta$ is called learning rate and characterizes the strength of each update. The algorithm shown in Eq. (66) is called gradient descent, which is the simplest procedure to update the weights, and many variants (e.g., momentum Polyak (1964), RMS prop Hinton (2012), Adam Kingma and Ba (2017)) are proposed so far. Regardless of the choice of the update algorithm, the gradients of a loss function is required and they can be quickly calculated by the backpropagation scheme Rumelhart et al. (1986). In Eq. (66), all data in the dataset are used for each iteration. In practice, the loss function for a subset of the dataset is calculated. The subset is called a batch and chosen randomly in every iteration. This procedure is called a mini-batch training.

In the training process, we optimize a neural network so that the loss function is minimized for a dataset. However, this strategy cannot be straightforwardly applied to practical situations. First, the trained neural network may fall in overfitting. Then, the neural network does not have an expected ability to correctly predict the label for a newly given input data which is not used for training. Second, we have to optimize the neural network model and the update procedure, too. For this purpose, we have to appropriately select the hyperparameters, such as the number of neurons of the $\ell$ -th layer ( $N^{(\ell)}$ ) and the learning rate ( $\eta$ ). They are not automatically tuned during the training process.

To solve these problems, we prepare a validation dataset which is independent from the training dataset. The weights of the neural network are optimized so that the loss function for the training dataset is minimized. The validation data is used for monitoring the training process and assessing which model is better for the problem that the user wants to solve. To prevent the overfitting, the training should be stopped when the loss for the validation dataset tend to deviate from that for the training dataset (early stopping). To optimize the hyperparameters, many neural network models having various structures are trained with different training schemes. Among them, we choose the one performing with the smallest loss for the validation dataset.

IV.4.2 setup in our analysis

The whole architecture of the neural network we used is shown in Table 1. The input data of the neural network is the complex valued numbers taken from a short chunk of the $\ell$ -domain signal, and the output is the predicted sky position. The $\ell$ -domain waveform $\mathsf{H}_{\ell,k}$ is determined mainly by the residual phase $\delta\Phi_{\odot}$ , which depends on the sky position $({\alpha}_{\mathrm{s}},{\delta}_{\mathrm{s}})$ through the vector $\Delta\bm{{n}}$ . Because $z_{\odot}=0$ , only $x$ and $y$ components of $\Delta\bm{{n}}$ affect $\delta\Phi_{\odot}$ . Therefore, we label each waveform with the values of $\Delta n_{x}$ and $\Delta n_{y}$ , which are the targets of the prediction of the neural network. The outputs of the neural network are inverted to the predicted values which are denoted by $({\alpha}_{\mathrm{p}},{\delta}_{\mathrm{p}})$ . We apply the neural network to each candidate, selected by the excess power method, in order to narrow down the possible area in which the source is likely to be located. For simplicity, the $(\alpha,\delta)$ -plane is regarded as a two-dimensional Euclidean space, and the shape of the predicted region is assumed to be a disk on the $(\alpha,\delta)$ -plane. For each candidate, the origin of the disk is set to the predicted point. The radius of the disk, denoted by ${r}_{\mathrm{NN}}$ , is fixed to a constant value. In the follow-up stage, the finer grids are placed to cover whole region of the disk.

Table 1: The architecture of the neural network used in this work. For convolution and max pooling layers, the input and the output are characterized by

(C,N)

where

C

is the number of channels and

N

is the length of the data. For convolutional layers, the lengths of kernels are 16, 16, 8, 8, 4 and 4 from the earlier to the later layer. The kernel size of the max pooling layers is 4.

Layer	Input	output
1-d convolution	(2, 2048)	(64, 2033)
ReLU	(64, 2033)	(64, 2033)
1-d convolution	(64, 2033)	(64, 2018)
ReLU	(64, 2018)	(64, 2018)
max pooling	(64, 2018)	(64, 504)
1-d convolution	(64, 504)	(128, 497)
ReLU	(128, 497)	(128, 497)
1-d convolution	(128, 497)	(128, 490)
ReLU	(128, 490)	(128, 490)
max pooling	(128, 490)	(128, 122)
1-d convolution	(128, 122)	(256, 119)
ReLU	(256, 119)	(256, 119)
1-d convolution	(256, 119)	(256, 116)
ReLU	(256, 116)	(256, 116)
max pooling	(256, 116)	(256, 29)
Dense	256 $\times$ 29	64
ReLU	64	64
Dense	64	64
ReLU	64	64
Dense	64	2

In order to train the neural network, we need to prepare the training dataset and the validation dataset. We use Eq. (47) as the model waveform and pick up only a short chunk containing the signal. The length of chunk is $\delta\ell=2048$ . We prepare 200,000 waveforms for the training and ten thousand waveforms for validation. At that time, we set $h_{0}=1$ . We assume that we use only a single detector and use the geometry information (e.g., the latitude of the detector) of LIGO Hanford in calculating the antenna pattern function as an example. In this work, we focus on one sky patch covered by a single grid point and a frequency bin fixed at $f_{k}=100$ Hz since the scaling to the search over the whole sky and the wider frequency band is straightforward. The sources are randomly distributed within the sky patch. The parameters $\beta_{k}$ are randomly sampled from a uniform distribution on $[-0.5,0.5]$ . The original strain has the duration $2^{24}$ sec and the sampling frequency $1024$ Hz. We introduce the normalized gravitational wave amplitude by

\hat{h}_{0}:=h_{0}\left(\frac{{S}_{\mathrm{n}}({f}_{\mathrm{ref}})}{\mathrm{1Hz^{-1}}}\right)^{-1/2}.

(67)

Here, we set ${f}_{\mathrm{ref}}=f_{k}$ . At each training step, the amplitude whose logarism is randomly chosen from the uniform distribution on $-2.1\leq{\log}_{\mathrm{10}}\hat{h}_{0}\leq-1.0$ is multiplied to the waveforms, and they are injected into the simulated noise. The different realizations of noise are sampled for every iterations. The real part and the imaginary part of the noise data mimicking $\mathsf{N}_{\ell,k}$ are generated from a Gaussian distribution with a zero mean and a variance

\frac{\mathcal{W}}{4{N}_{\mathrm{seg}}{T}_{\mathrm{seg}}}\,,

(68)

(see Eqs. (52) and (53)).

We employ the mini-batch training. We set the batch size to 256. The Adam Kingma and Ba (2017) is used for the update algorithm. We implement with the Python library PyTorch Paszke et al. (2019) and use a GPU GeForce 1080Ti. The parameter values we used are listed in Table 2.

Table 2: The values of the parameters we used in this work.

Symbol	Parameters	Value
${T}_{\mathrm{obs}}$	Observation period	$2^{24}$ sec
${f}_{\mathrm{s}}$	Sampling frequency	1024 Hz
${N}_{\mathrm{grid}}$	# of grids	352436
${N}_{\mathrm{bin}}$	# of frequency bins of STFT	3200
${T}_{\mathrm{seg}}$	Duration of a STFT segment	32 sec
${T}_{\mathrm{slide}}$	Dilation of STFT segment	32 sec
$\delta\ell$	Length of chunk	2048
${(\delta\ell)}_{\mathrm{slide}}$	Dilation of chunk	128
$f_{k}$	Fixed frequency bin	100 Hz
${\alpha}_{\mathrm{g}}$	Right ascension of grid	-0.158649 rad
${\delta}_{\mathrm{g}}$	Declination of grid	1.02631 rad

IV.5 Follow-up analysis by coherent matched filtering

After selecting candidates and narrowing down the possible area at which the source is likely to be located, we apply the coherent matched filtering for the follow-up analysis. The grid points with the resolution shown in Eq. (20) are placed to cover the selected area. Assuming a grid point, we can carry out the demodulation of the phase by using the time resampling technique. If the deviation between the directions of the grid point and the source is smaller than the resolution, the residual phase remaining after the time resampling is sufficiently small to avoid the loss of SNR.

In this operation, heterodyning and downsampling can significantly reduce the data length and hence the computational cost Dupuis and Woan (2005). Let us assume that we have a candidate labeled with $\{{\bm{{n}}}_{\mathrm{g}},f_{k},c\}$ . If the candidate is the true event, the gravitational wave frequency ${f}_{\mathrm{gw}}$ should take the value in the narrow frequency band indicated by

f_{k}-\frac{1}{2{T}_{\mathrm{seg}}}\leq{f}_{\mathrm{gw}}\leq f_{k}+\frac{1}{2{T}_{\mathrm{seg}}}\,.

(69)

By multiplying the factor $e^{-2\pi if_{k}\zeta}$ to the resampled strain, we can convert the gravitational wave signal frequency to near DC components (heterodyning). After that, the gravitational wave signal has a lower frequency than $1/2{T}_{\mathrm{seg}}$ Hz. Therefore, downsampling by appropriately averaging the resampled strain data with a sampling frequency $\sim 1/{T}_{\mathrm{seg}}$ reduces the number of data points without loss of the significance of the gravitational wave signal.

The coherent matched filtering follows the heterodyning and the downsampling processes. As stated in Eqs. (2) and (12), we fix $\psi=0$ , $\cos\iota=1$ , and $\phi_{0}=0$ and treat them as known parameters. Also, we assume that the signal waveform and the template completely match. The definition of a match is already given in Eq. (17). The gravitational waveform is written as

h(\bm{{\xi}})=h_{0}\cdot{h}_{\mathrm{temp}}(\vec{\xi})\,.

(70)

Among these parameters, the amplitude $h_{0}$ can be analytically marginalized to maximize the likelihood. Then, we obtain the signal-to-noise ratio in Eq. (17) and use it as the detection statistic. When only the detector noise dominates the strain data, ${\rho}_{\mathrm{MF}}$ follows the standard normal distribution. On the other hand, if the signal exists, the SNR follows a Gaussian distribution with a mean

{\mu}_{\mathrm{MF}}(\bm{{\xi}})=h_{0}\cdot\sqrt{({h}_{\mathrm{temp}}(\vec{\xi})|{h}_{\mathrm{temp}}(\vec{\xi}))}\,,

(71)

and a unit variance.

V Results

V.1 Computational cost

Our procedure is characterized by three parameters $\bm{{\rho}}:=(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}},{\hat{\rho}}_{\mathrm{MF}})$ , i.e.,

•

$\mathrm{FAP}_{\mathrm{EP}}$ , false alarm probability for each chunk,
•

${r}_{\mathrm{NN}}$ , the radius of the predicted region to which the follow-up analysis is applied,
•

${\hat{\rho}}_{\mathrm{MF}}$ , the threshold of the SNR of the coherent matched filtering.

${\mathcal{N}}_{\mathrm{EP}}$ denotes the computational cost of the excess power method. $2{N}_{\mathrm{seg}}$ multiplications and $2{N}_{\mathrm{seg}}$ additions of real numbers are required to calculate the excess powers for all chunks in one $\ell$ -domain signal. The computational cost for calculating the excess powers for all chunks can be estimated as

{\mathcal{N}}_{\mathrm{EP}}=4{N}_{\mathrm{seg}}\times{N}_{\mathrm{grid}}\times{N}_{\mathrm{bin}}\sim 4.7\times 10^{15}\,,

(72)

in the unit of the number of floating point operations. As we see in the following, this cost can be neglected. Next, we check the computational time of the neural network analysis. We estimate the computational time of the neural network by measuring the elapsed time for analyzing ten thousand data. Because the elapsed time is 1.4sec, the total computational time of the neural network is estimated as

{\mathcal{T}}_{\mathrm{NN}}\simeq\frac{1.4\mathrm{sec}}{10^{4}\mathrm{data}}\times{N}_{\mathrm{candidate}}\,,

(73)

where ${N}_{\mathrm{candidate}}$ is the number of candidates which are selected by the excess power method and is estimated as

$\displaystyle{N}_{\mathrm{candidate}}$	$\displaystyle={N}_{\mathrm{chunk}}\times{\mathrm{FAP}}_{\mathrm{EP}}$
	$\displaystyle={N}_{\mathrm{grid}}\cdot{N}_{\mathrm{bin}}\cdot\frac{{N}_{\mathrm{seg}}}{{(\delta\ell)}_{\mathrm{slide}}}\times{\mathrm{FAP}}_{\mathrm{EP}}$
	$\displaystyle\simeq 4.6\times 10^{10}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,.$	(74)

Substituting it, we obtain

{\mathcal{T}}_{\mathrm{NN}}\simeq 6.4\times 10^{6}\mathrm{sec}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,.

(75)

Therefore, we focus on the case $\mathrm{FAP}_{\mathrm{EP}}\leq 10^{-2}$ . The computational cost of our analysis is dominated mainly by the preprocessing of the observed strain data and the follow-up analysis. The computational cost of the entire analysis is denoted by ${\mathcal{N}}_{\mathrm{comp}}$ and is approximately calculated by

{\mathcal{N}}_{\mathrm{comp}}={\mathcal{N}}_{\mathrm{preprocess}}+{\mathcal{N}}_{\mathrm{follow-up}}\,,

(76)

where ${\mathcal{N}}_{\mathrm{preprocess}}$ and ${\mathcal{N}}_{\mathrm{follow-up}}$ are the computational cost of the preprocessing and the follow-up analysis, respectively. In this work, we fix the STFT segment duration and the length of the chunk. Thus, the computational cost of the preprocessing is a constant.

{\mathcal{N}}_{\mathrm{preprocess}}={N}_{\mathrm{grid}}\times({\mathcal{N}}_{\mathrm{STFT}}+{\mathcal{N}}_{\mathrm{FFT}}\times{N}_{\mathrm{bins}})\,.

(77)

The computational cost of the STFT is

{\mathcal{N}}_{\mathrm{STFT}}={N}_{\mathrm{seg}}\cdot 5{T}_{\mathrm{seg}}{f}_{\mathrm{s}}\log_{2}{T}_{\mathrm{seg}}{f}_{\mathrm{s}}\,,

(78)

and the computational cost of FFT is

{\mathcal{N}}_{\mathrm{FFT}}=5{N}_{\mathrm{seg}}\log_{2}{N}_{\mathrm{seg}}\,.

(79)

With ${T}_{\mathrm{obs}}=2^{24}$ sec, ${T}_{\mathrm{seg}}=2^{5}$ sec, ${f}_{\mathrm{s}}=2^{10}$ Hz, and ${N}_{\mathrm{grid}}\sim 3.5\times 10^{5}$ , the computational cost of the preprocess is estimated as

{\mathcal{N}}_{\mathrm{preprocess}}\simeq 2.3\times 10^{18}\,.

(80)

On the other hand, the computational cost of the follow-up analysis is determined by the combination of $\mathrm{FAP}_{\mathrm{EP}}$ and ${r}_{\mathrm{NN}}$ . The computational cost of the follow-up analysis is

{\mathcal{N}}_{\mathrm{follow-up}}={N}_{\mathrm{candidate}}\times\frac{\pi({r}_{\mathrm{NN}})^{2}}{{(\delta\theta)}_{\mathrm{coh}}^{2}}\times{\mathcal{N}}_{\mathrm{FFT,coh}}\,.

(81)

Here, ${(\delta\theta)}_{\mathrm{coh}}^{2}$ is the typical area of region where each grid point of the coherent analysis covers (see Eq. (20)). The computational cost of taking match is dominated by the Fourier transform and calculated as

{\mathcal{N}}_{\mathrm{FFT,coh}}=5({T}_{\mathrm{obs}}{f}_{\mathrm{s,coh}})\log_{2}({T}_{\mathrm{obs}}{f}_{\mathrm{s,coh}})\simeq 5.0\times 10^{7}\,,

(82)

where ${f}_{\mathrm{s,coh}}=1/{T}_{\mathrm{seg}}=2^{-5}$ Hz. Therefore, we estimate the computational cost of the follow-up analysis as

{\mathcal{N}}_{\mathrm{follow-up}}=9.0\times 10^{22}\left(\frac{{r}_{\mathrm{NN}}}{10^{-3}\mathrm{rad}}\right)^{2}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,.

(83)

Substituting Eqs. (80) and (83) into Eq. (76), we can assess the computational cost of the entire analysis as a function of ${r}_{\mathrm{NN}}$ and $\mathrm{FAP}_{\mathrm{EP}}$ . Figure. 3 shows the computational cost for various combinations of $\mathrm{FAP}_{\mathrm{EP}}$ and ${r}_{\mathrm{NN}}$ . One can read a feasible combination of $\mathrm{FAP}_{\mathrm{EP}}$ and ${r}_{\mathrm{NN}}$ depending on one’s available computational resources.

V.2 False alarm probability

The false alarm probability of the entire process (see Wette (2012); Dreissigacker et al. (2018)) is

	$\displaystyle{p}_{\mathrm{fa}}(\bm{{\rho}})=$	$\displaystyle\left\{1-\left(\mathrm{Prob}\left[{\rho}_{\mathrm{EP}}<{\hat{\rho}}_{\mathrm{EP}}\mid{\rho}_{\mathrm{EP}}\sim\mathcal{N}(0,1)\right]\right)^{{N}_{\mathrm{chunk}}}\right\}$
		$\displaystyle\times\left\{1-\left(\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}<{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}(0,1)\right]\right)^{{N}_{\mathrm{t}}}\right\}\,,$		(84)

where ${N}_{\mathrm{t}}$ is the number of required templates for the coherent search. It can be estimated by

{N}_{\mathrm{t}}={N}_{\mathrm{candidate}}\times\frac{\pi({r}_{\mathrm{NN}})^{2}}{(\delta\theta^{2})_{\mathrm{coh}}}\times{N}_{\mathrm{bin,coh}},

(85)

where ${N}_{\mathrm{bin,coh}}$ is the number of the frequency bins of the coherent search. Using the value listed in Table. 2, we obtain

{N}_{\mathrm{t}}\simeq 5.6\times 10^{21}\left(\frac{{r}_{\mathrm{NN}}}{10^{-3}\mathrm{rad}}\right)^{2}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,.

(86)

Because the false alarm probability of the follow-up stage determines that of the entire process, we can approximate it as

{p}_{\mathrm{fa}}(\bm{{\rho}})\simeq\left\{1-\left(\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}<{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}(0,1)\right]\right)^{{N}_{\mathrm{t}}}\right\}\,.

(87)

Furthermore, because ${N}_{\mathrm{t}}\gg 1$ , we can approximate

{p}_{\mathrm{fa}}(\bm{{\rho}})\simeq{N}_{\mathrm{t}}\cdot\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}>{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}(0,1)\right]\,.

(88)

In this work, the threshold ${\hat{\rho}}_{\mathrm{MF}}$ is chosen so that the false alarm probability of the entire process is 0.01. As shown in Eq. (86), the number of templates depends on $({r}_{\mathrm{NN}},\mathrm{FAP}_{\mathrm{EP}})$ , and the same is true for ${\hat{\rho}}_{\mathrm{MF}}$ .

The false alarm probability of the matched filtering has already been studied in the literature. Therefore, in this work, we check only the validity of the statistical properties of the excess power method. It is computationally difficult to treat a whole signal of a duration ${T}_{\mathrm{obs}}=2^{24}$ sec. Therefore, we generate ${N}_{\mathrm{seg}}$ short noise data of a duration ${T}_{\mathrm{seg}}$ assuming ${S}_{\mathrm{n}}(f)=1$ . After applying a window function, FFT is carried out to each short strain. We pick up a FFT of $k$ -th frequency bin from each FFT data and regard them as $\{{n}^{\mathrm{STFT}}_{j,k}\}_{j=1}^{{N}_{\mathrm{seg}}}$ . We obtain $\mathsf{N}_{\ell,k}$ by taking the Fourier transform of $\{{n}^{\mathrm{STFT}}_{j,k}\}_{j=1}^{{N}_{\mathrm{seg}}}$ and divide it into ${N}_{\mathrm{seg}}/\delta\ell=128$ chunks. After repeating above procedures for 80 times, 10,240 chunks are generated. For each chunk, the excess power statistics $\mathcal{E}$ and SNRs ${\rho}_{\mathrm{EP}}$ are calculated. The histogram of the simulated values of ${\rho}_{\mathrm{EP}}$ is shown in Fig. 4. It seems to match the standard normal distribution. Additionally, we carry out the Kolmogorov-Smirnov test and obtain a p-value of 0.753254. It is numerically confirmed that the SNR of noise data follows the standard normal distribution.

V.3 Detection probability

The detection probability of the signal with an amplitude $h_{0}$ can be estimated by

	$\displaystyle\ {p}_{\mathrm{det}}(h_{0};\bm{{\rho}})$
	$\displaystyle=\left\langle{p}_{\mathrm{det}}(\bm{{\xi}};\bm{{\rho}})\right\rangle_{\vec{\xi}}$
$\displaystyle=$	$\displaystyle\bigg{\langle}\mathrm{Prob}\left[{\rho}_{\mathrm{EP}}>{\hat{\rho}}_{\mathrm{EP}}\mid{\rho}_{\mathrm{EP}}\sim\mathcal{N}({\mu}_{\mathrm{EP}}(\bm{{\xi}}),{\sigma}_{\mathrm{EP}}(\bm{{\xi}}))\right]$
	$\displaystyle\times\mathrm{Prob}\left[\text{source is located in a predicted region}\mid\bm{{\xi}},{r}_{\mathrm{NN}}\right]$
	$\displaystyle\times\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}>{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}({\mu}_{\mathrm{MF}}(\bm{{\xi}}),1)\right]\bigg{\rangle}_{\vec{\xi}}\,,$	(89)

where

\langle\cdots\rangle_{\vec{\xi}}:=\int d\vec{\xi}\ (\cdots)\pi(\vec{\xi})\,,

(90)

is the average over the source parameters $\vec{\xi}$ with the probability density function $\pi(\vec{\xi})$ . As explained in Sec. IV.4, the neural network is trained with the waveforms sampled only from the vicinity of the reference grid point and the narrow frequency band. It is envisaged that the trained neural network does not work well for signals outside of the reference patch and the frequency band. Therefore, we only test for the limited parameter region. Correspondingly, the average operation is also taken over such narrow parameter space. To quantify the detection power, the amplitude parameter ${h}^{\mathrm{95\%}}_{0}$ is defined by

{p}_{\mathrm{det}}({h}^{\mathrm{95\%}}_{0};\bm{{\rho}})=0.95\,,

(91)

and correspondingly,

{\hat{h}}^{\mathrm{95\%}}_{0}:={h}^{\mathrm{95\%}}_{0}\left(\frac{{S}_{\mathrm{n}}({f}_{\mathrm{ref}})}{\mathrm{1Hz^{-1}}}\right)^{-1/2}.

(92)

The parameters, $\mathrm{FAP}_{\mathrm{EP}}$ and ${r}_{\mathrm{NN}}$ , are optimized so that ${h}^{\mathrm{95\%}}_{0}$ takes the smallest value under the condition of the computational power.

To explorer the parameter space of ( $\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}}$ ), we place the regular grid on $\log_{10}\mathrm{FAP}_{\mathrm{EP}}$ from $-8$ to $-2$ by a step of 1, and the regular grid on $\log_{10}{r}_{\mathrm{NN}}$ from -4.5 to -3.0 by a step of 0.05. For every pair of $\mathrm{FAP}_{\mathrm{EP}}$ and ${r}_{\mathrm{NN}}$ , we calculate ${\hat{h}}^{\mathrm{95\%}}_{0}$ by the following procedure. First, we place a regular grid on $\log_{10}\hat{h}_{0}$ from -2.3 to -1.0 by a step of 0.05. For one sample of the amplitude, the parameters $\vec{\xi}$ are randomly sampled. The sampled parameters are denoted by $\{\vec{\xi}^{(i)}\}_{i=1}^{M}$ . The waveforms are generated with the sampled parameters. Each waveform is injected into different noise data in the same manner as the method explained in Sec. IV.4. The fraction of the events detected is employed as the estimator of the detection probability of the signal with parameter $(h_{0},\vec{\xi}^{(i)})$ . Repeating these procedure for every sampled parameters $\{\vec{\xi}^{(i)}\}_{i=1}^{M}$ , we obtain the set of the estimated detection probabilities. Then, the detection probability ${p}_{\mathrm{det}}(h_{0};\bm{{\rho}})$ is estimated by

{p}_{\mathrm{det}}(h_{0};\bm{{\rho}})\simeq\frac{1}{M}\sum_{i=1}^{M}{p}_{\mathrm{det}}(h_{0},\vec{\xi}^{(i)};\bm{{\rho}})\,.

(93)

Changing the value of the amplitude $\hat{h}_{0}$ , we get the estimated detection probability as a function of $h_{0}$ for a certain values of $\mathrm{FAP}_{\mathrm{EP}}$ and ${r}_{\mathrm{NN}}$ . If the estimated detection probability exceeds 95% for one or more samples of the amplitude, the obtained detection probabilities are fitted by a sigmoid-like function,

\varsigma(\mathcal{D};a,b)=\frac{1}{1+e^{(\mathcal{D}-a)/b}}\,,

(94)

where

\mathcal{D}:=(\hat{h}_{0})^{-1}\,,

(95)

is called the sensitivity depth, and the parameters $(a,b)$ of a sigmoid function is to be optimized. Using the optimized parameters $(a^{\ast},b^{\ast})$ , the estimated value of ${\mathcal{D}}^{\mathrm{95\%}}:=({\hat{h}}^{\mathrm{95\%}}_{0})^{-1}$ can be obtained as

{\mathcal{D}}^{\mathrm{95\%}}(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}})=a^{\ast}-b^{\ast}\ln\frac{1-0.95}{0.95}\,,

(96)

which is the inverse of $\varsigma(\mathcal{D})$ . In this work, we set $M=1024$ and the number of noise realization for each parameter set to be 512. Figure. 5 shows an example of the fitting. The estimated values of ${\mathcal{D}}^{\mathrm{95\%}}$ is shown in Fig. 6.

To confirm that the signals with the amplitude ${h}^{\mathrm{95\%}}_{0}$ are detected with 95% detection probability, we perform the injection test. To save the computational cost, we skip the follow-up stage and assume that the detection probability is determined by the excess power selection and the neural network analysis. We only use a short chunk centered at the support of the signal as an injection waveform. Ten thousand chunks with various signal parameters $\vec{\xi}$ are prepared and injected into Gaussian noise data. The waveform model and the noise property are the same as those of the training dataset of the neural network. The excess power is calculated for each chunk, and the neural network analysis is carried out if a chunk is selected as a candidate. Counting the number of detected events, we obtain the recovered value of the detection probability. The procedure shown above is repeated for each combination of $(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}})$ . Figure. 7 shows the result of the injection test. For all combinations of $(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}})$ , the detection probabilities are close to 95%. Therefore, our estimation of the detectable amplitude ${\hat{h}}^{\mathrm{95\%}}_{0}$ is convincing.

VI Conclusion

We proposed a new method of an all-sky search for continuous gravitational waves, combining the excess power and the deep learning methods. The time resampling and the STFT are used for localizing the signal into a relatively small number of elements in the whole data. Then, the excess power method selects the candidates of the grid point in the sky and the frequency bins where the signal likely exists. The deep neural network narrows down the region to be explored by the follow-up search by two orders of magnitude than the original area of the sky patch. Before the follow-up coherent search, the heterodyning and the downsampling can reduce the computational cost. We calculated the computational cost of our method. Most of the computational costs are spent by preprocessing the strain data and the follow-up coherent matched filtering search. The computational costs of the excess power method are negligibly small, and the computational time of the neural network can be suppressed to an acceptable level by setting $\mathrm{FAP}_{\mathrm{EP}}\leq 10^{-2}$ . We estimated the detection abilities of our method with the limited setup where the polarization angle, the inclination angle, and the initial phase are fixed and assumed as known parameters. The dataset for training the neural network and testing is generated from a very narrow parameter space of $({f}_{\mathrm{gw}},{\alpha}_{\mathrm{s}},{\delta}_{\mathrm{s}})$ . With a reasonable computational power, the sensitivity depth can be achieved ${\mathcal{D}}^{\mathrm{95\%}}\gtrsim 80$ .

Our training data, which is used for training the neural network, span the restricted parameter region. Namely, the gravitational wave frequencies of the training data are distributed within the small frequency band centered at 100 Hz of width $\pm 1/(2{T}_{\mathrm{seg}})$ and the source locations are sampled from very narrow regions around the fixed grid point. Nevertheless, we can expect our method can be applied to the all-sky search and the frequency band below 100 Hz. If the gravitational wave frequency becomes lower than 100Hz, the strength of the phase modulation becomes weaker (see Eq. (6)). Therefore, even if ${f}_{\mathrm{gw}}<100$ Hz, the signal power in $\ell$ -domain would still be concentrated in a narrow region, and it can be expected that the efficiency of the excess power method is maintained. We can employ a similar discussion also for the dependency on the source location. The power concentration in $\ell$ -domain is still valid even if we take into account the dependency of the source location, while it causes the variation of the signal amplitude. From the above discussion, only slight modifications of the construction of the training data and our neural network structure are enough to apply our strategies to an all-sky search of monochromatic sources having a frequency lower than $\sim 100$ Hz.

In addition to the above points, there are several rooms for improving our method. We fixed various parameters such as the width of the STFT ${T}_{\mathrm{seg}}$ and the length of each chunk $\delta\ell$ in a little hand-waving manner. Surveying and optimizing these parameters may improve the detection efficiency of our method. Especially, the sampling frequency when downsampling might reduce the computational cost significantly. As can be seen from Eq. (48), the deviation $\delta f_{k}$ causes the translation of the signal in the $\ell$ -domain. It is expected that we can further constrain the gravitational wave frequency than $\sim{T}_{\mathrm{seg}}^{-1}$ . Considering this effect, we can set the sampling frequencies of downsampled strains to a lower value than our current choice. This optimization would result in the further reduction of the computational time of the follow-up coherent search.

In the present paper, we assumed that the stationary Gaussian detector noise and 100% duty cycle. We also simplified the waveform model, e.g., the frequency change $df/dt$ is not incorporated. In spite of these simplifications, the obtained results can be regarded as a proof-of-principle and are enough to convince that our method has the potential for improving the all-sky search for continuous gravitational waves with the duration of $O(10^{7})$ sec. Relaxing these assumptions is beyond the scope of this paper and left as future work.

Acknowledgements.

We thank Yousuke Itoh for fruitful discussions. This work was supported by JSPS KAKENHI Grant Number JP17H06358 (and also JP17H06357), A01: Testing gravity theories using gravitational waves, as a part of the innovative research area, “Gravitational wave physics and astronomy: Genesis”, and JP20K03928. Some part of calculation has been performed by using a GeForce 1080Ti GPU at Nagaoka University of Technology.

Appendix A Noise statistics in $\ell$ -domain

In general, the power spectral density of a stochastic process $n(t)$ is defined by

\langle\tilde{n}(f)\tilde{n}^{\ast}(f^{\prime})\rangle=:\frac{1}{2}{S}_{\mathrm{n}}(f)\delta(f-f^{\prime})\,,

(97)

where the Fourier transform of $n(t)$ is defined by

\tilde{n}(f)=\int^{\infty}_{-\infty}dt\ n(t)e^{-2\pi ift}\,,

(98)

while we define the STFT by Eq. (37). Ignoring the effect of the window function, the variance of ${n}^{\mathrm{STFT}}_{j,k}$ can be approximated by

\langle({n}^{\mathrm{STFT}}_{j,k})({n}^{\mathrm{STFT}}_{j^{\prime},k^{\prime}})^{\ast}\rangle=\frac{1}{2{T}_{\mathrm{seg}}}{S}_{\mathrm{n}}(f_{k})\delta_{kk^{\prime}}\delta_{jj^{\prime}}\,.

(99)

Here, we assume that different STFT bins are statistically independent. The variance of $\mathsf{N}_{\ell,k}$ is

	$\displaystyle\langle\mathsf{N}_{\ell,k}\mathsf{N}^{\ast}_{\ell^{\prime},k^{\prime}}\rangle$
	$\displaystyle=\frac{1}{{N}_{\mathrm{seg}}^{2}}\sum_{j=1}^{{N}_{\mathrm{seg}}}\sum_{j^{\prime}=1}^{{N}_{\mathrm{seg}}}\langle({n}^{\mathrm{STFT}}_{j,k})({n}^{\mathrm{STFT}}_{j^{\prime},k^{\prime}})^{\ast}\rangle e^{-2\pi i(j\ell-j^{\prime}\ell^{\prime})/{N}_{\mathrm{seg}}}$
	$\displaystyle=\frac{{S}_{\mathrm{n}}(f_{k})}{2{T}_{\mathrm{seg}}{N}_{\mathrm{seg}}}\delta_{\ell\ell^{\prime}}\delta_{kk^{\prime}}\,.$		(100)

Therefore, we get

\tilde{\sigma}_{k}^{2}=2\langle\mathsf{N}_{\ell,k}\mathsf{N}^{\ast}_{\ell,k}\rangle=\frac{{S}_{\mathrm{n}}(f_{k})}{{T}_{\mathrm{seg}}{N}_{\mathrm{seg}}}\,.

(101)

References

Abbott et al. (2016) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. Lett. 116, 131103 (2016), eprint 1602.03838.
Abbott et al. (2019a) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. X 9, 031040 (2019a), eprint 1811.12907.
Abbott et al. (2020a) R. Abbott et al. (LIGO Scientific, Virgo) (2020a), eprint 2010.14527.
Aso et al. (2013) Y. Aso, Y. Michimura, K. Somiya, M. Ando, O. Miyakawa, T. Sekiguchi, D. Tatsumi, and H. Yamamoto (KAGRA), Phys. Rev. D 88, 043007 (2013), eprint 1306.6747.
Bala et al. (2011) I. Bala, S. Tarun, U. CS, D. Sanjeev, R. Sendhil, and S. Anand (2011), URL https://dcc.ligo.org/LIGO-M1100296/public.
Abbott et al. (2020b) B. P. Abbott et al. (KAGRA, LIGO Scientific, Virgo), Living Rev. Rel. 23, 3 (2020b).
Abbott et al. (2019b) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. X 9, 011001 (2019b), eprint 1805.11579.
Abbott et al. (2020c) R. Abbott et al. (LIGO Scientific, Virgo), Astrophys. J. Lett. 900, L13 (2020c), eprint 2009.01190.
Abbott et al. (2020d) R. Abbott et al. (LIGO Scientific, Virgo), Population Properties of Compact Objects from the Second LIGO-Virgo Gravitational-Wave Transient Catalog (2020d), eprint 2010.14533.
Will (2014) C. M. Will, Living Rev. Rel. 17, 4 (2014), eprint 1403.7377.
Abbott et al. (2019c) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 100, 104036 (2019c), eprint 1903.04467.
Abbott et al. (2020e) R. Abbott et al. (LIGO Scientific, Virgo) (2020e), eprint 2010.14529.
Maggiore (2000) M. Maggiore, Phys. Rept. 331, 283 (2000), eprint gr-qc/9909001.
Sathyaprakash and Schutz (2009) B. S. Sathyaprakash and B. F. Schutz, Living Rev. Rel. 12, 2 (2009), eprint 0903.0338.
Maggiore (2007) M. Maggiore, Gravitational Waves. Vol. 1: Theory and Experiments, Oxford Master Series in Physics (Oxford University Press, 2007), ISBN 978-0-19-857074-5, 978-0-19-852074-0.
Creighton and Anderson (2011) J. D. E. Creighton and W. G. Anderson, Gravitational-wave physics and astronomy: An introduction to theory, experiment and data analysis (2011).
Zhu et al. (2020) S. J. Zhu, M. Baryakhtar, M. A. Papa, D. Tsuna, N. Kawanaka, and H.-B. Eggenstein, Phys. Rev. D 102, 063020 (2020), eprint 2003.03359.
Arvanitaki et al. (2015) A. Arvanitaki, M. Baryakhtar, and X. Huang, Phys. Rev. D 91, 084011 (2015), eprint 1411.2263.
Brito et al. (2017) R. Brito, S. Ghosh, E. Barausse, E. Berti, V. Cardoso, I. Dvorkin, A. Klein, and P. Pani, Phys. Rev. D 96, 064050 (2017), eprint 1706.06311.
Jaranowski et al. (1998) P. Jaranowski, A. Krolak, and B. F. Schutz, Phys. Rev. D 58, 063001 (1998), eprint gr-qc/9804014.
Krishnan et al. (2004) B. Krishnan, A. M. Sintes, M. A. Papa, B. F. Schutz, S. Frasca, and C. Palomba, Phys. Rev. D 70, 082001 (2004), eprint gr-qc/0407001.
Astone et al. (2014) P. Astone, A. Colla, S. D’Antonio, S. Frasca, and C. Palomba, Phys. Rev. D 90, 042002 (2014), eprint 1407.8333.
Abbott et al. (2017) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 96, 062002 (2017), eprint 1707.02667.
Abbott et al. (2018) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 97, 102003 (2018), eprint 1802.05241.
Abbott et al. (2019d) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 100, 024004 (2019d), eprint 1903.01901.
George and Huerta (2018) D. George and E. A. Huerta, Phys. Rev. D 97, 044039 (2018), eprint 1701.00008.
Gabbard et al. (2019) H. Gabbard, C. Messenger, I. S. Heng, F. Tonolini, and R. Murray-Smith (2019), eprint 1909.06296.
Chua and Vallisneri (2020) A. J. K. Chua and M. Vallisneri, Phys. Rev. Lett. 124, 041102 (2020), eprint 1909.05966.
Nakano et al. (2019) H. Nakano, T. Narikawa, K.-i. Oohara, K. Sakai, H.-a. Shinkai, H. Takahashi, T. Tanaka, N. Uchikata, S. Yamamoto, and T. S. Yamamoto, Phys. Rev. D 99, 124032 (2019), eprint 1811.06443.
Yamamoto and Tanaka (2020) T. S. Yamamoto and T. Tanaka (2020), eprint 2002.12095.
George et al. (2018) D. George, H. Shen, and E. A. Huerta, Phys. Rev. D 97, 101501(R) (2018).
Chua et al. (2020) A. J. K. Chua, M. L. Katz, N. Warburton, and S. A. Hughes (2020), eprint 2008.06071.
Dreissigacker et al. (2019) C. Dreissigacker, R. Sharma, C. Messenger, R. Zhao, and R. Prix, Phys. Rev. D 100, 044009 (2019), eprint 1904.13291.
Dreissigacker and Prix (2020) C. Dreissigacker and R. Prix, Phys. Rev. D 102, 022005 (2020), eprint 2005.04140.
Morawski et al. (2019) F. Morawski, M. Bejger, and P. Ciecieląg (2019), eprint 1907.06917.
Beheshtipour and Papa (2020) B. Beheshtipour and M. A. Papa, Phys. Rev. D 101, 064009 (2020), eprint 2001.03116.
Nakano et al. (2003) H. Nakano, H. Takahashi, H. Tagoshi, and M. Sasaki, Phys. Rev. D 68, 102003 (2003), eprint gr-qc/0306082.
Anderson et al. (2001) W. G. Anderson, P. R. Brady, J. D. E. Creighton, and E. E. Flanagan, Phys. Rev. D 63, 042003 (2001), eprint gr-qc/0008066.
Goodfellow et al. (2016) I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016), http://www.deeplearningbook.org.
Jarrett et al. (2009) K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, in 2009 IEEE 12th International Conference on Computer Vision (ICCV) (IEEE Computer Society, Los Alamitos, CA, USA, 2009), pp. 2146–2153, ISSN 1550-5499, URL https://doi.ieeecomputersociety.org/10.1109/ICCV.2009.5459469.
LeCun and Bengio (1998) Y. LeCun and Y. Bengio, in The Handbook of Brain Theory and Neural Networks (MIT Press, Cambridge, MA, USA, 1998), p. 255–258, ISBN 0262511029.
Zhou and Chellappa (1988) Y. Zhou and R. Chellappa, in IEEE 1988 International Conference on Neural Networks (1988), pp. 71–78.
Polyak (1964) B. T. Polyak, USSR Computational Mathematics and Mathematical Physics 4, 1 (1964), ISSN 0041-5553, URL http://www.sciencedirect.com/science/article/pii/0041555364901375.
Hinton (2012) G. Hinton, Neural networks for machine learning (2012).
Kingma and Ba (2017) D. P. Kingma and J. Ba (2017), eprint 1412.6980.
Rumelhart et al. (1986) D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Nature 323, 533 (1986), ISSN 1476-4687, URL https://doi.org/10.1038/323533a0.
Paszke et al. (2019) A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., in Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019), pp. 8024–8035.
Dupuis and Woan (2005) R. J. Dupuis and G. Woan, Phys. Rev. D 72, 102002 (2005), eprint gr-qc/0508096.
Wette (2012) K. Wette, Phys. Rev. D 85, 042003 (2012), eprint 1111.5650.
Dreissigacker et al. (2018) C. Dreissigacker, R. Prix, and K. Wette, Phys. Rev. D 98, 084058 (2018), eprint 1808.02459.

Use of Excess Power Method and Convolutional Neural Network in All-Sky Search for Continuous Gravitational Waves

Abstract

I Introduction

II Waveform model

III Coherent search method

IV Our Method

IV.1 Subtracting the effect due to the Earth’s rotation

IV.2 Modeling the effect due to the Earth’s orbital motion

IV.3 Excess power method for finding candidates

IV.4 Neural network for localizing

IV.4.1 fundamentals

IV.4.2 setup in our analysis

IV.5 Follow-up analysis by coherent matched filtering

V Results

V.1 Computational cost

V.2 False alarm probability

V.3 Detection probability

VI Conclusion

Acknowledgements.

Appendix A Noise statistics in ℓ\ell-domain

References

Appendix A Noise statistics in $\ell$ -domain