
Use of Excess Power Method and Convolutional Neural Network in All-Sky Search for Continuous Gravitational Waves

Takahiro S. Yamamoto Department of Physics, Kyoto University, Kyoto 606-8502, Japan    Takahiro Tanaka Department of Physics, Kyoto University, Kyoto 606-8502, Japan Center for Gravitational Physics, Yukawa Institute for Theoretical Physics, Kyoto University, Kyoto 606-8502, Japan
Abstract

The signal of continuous gravitational waves has a duration longer than the observation period. Even if the waveform in the source frame is monochromatic, we observe a waveform with modulated frequency due to the motion of the detector. If the source location is unknown, many templates with different sky positions are required to demodulate the frequency, and the resulting huge computational cost restricts the parameter region to which the coherent search is applicable. In this work, we propose and examine a new method to select candidates, which reduces the cost of the coherent search by following up only the selected candidates. As a first step, we consider an idealized situation in which only a single detector with a 100% duty cycle is available and its detector noise is approximated by stationary Gaussian noise. Also, we assume the signal has no spindown, and the polarization angle, the inclination angle, and the initial phase are fixed to $\psi=0$, $\cos\iota=1$, and $\phi_{0}=0$, respectively, and treated as known parameters. We combine several methods: 1) the short-time Fourier transform of data resampled such that the Earth's motion toward some reference direction is canceled, 2) the excess power search in the Fourier transform of the time series obtained by picking up the amplitude in a particular frequency bin of the short-time Fourier transform data, and 3) a deep learning method to further constrain the source sky position. The computational cost and the detection probability are estimated. An injection test is carried out to check the validity of the detection probability. We find that our method is worthy of further study for analyzing $O(10^{7})$ sec strain data.

I Introduction

Advanced LIGO and Advanced Virgo detected the first gravitational wave event, from a binary black hole merger, in 2015 Abbott et al. (2016). After three observation runs, many binary coalescence events have been found Abbott et al. (2019a, 2020a). In addition to Advanced LIGO and Advanced Virgo, KAGRA Aso et al. (2013) and LIGO India Bala et al. (2011) are planning to join the gravitational wave detector network Abbott et al. (2020b). Gravitational wave astronomy is expected to yield fruitful results, improving our understanding of the astronomical properties of compact objects Abbott et al. (2019b, 2020c, 2020d), the true nature of gravity Will (2014); Abbott et al. (2019c, 2020e), the origin of the Universe Maggiore (2000), and so on (see Sathyaprakash and Schutz (2009) for a review).

All gravitational wave signals detected so far have durations of $O(10^{0}\text{--}10^{2})$ sec, much shorter than the observation period. By contrast, we also expect gravitational waves that last longer than the observation period. Such long-lived gravitational waves are called continuous gravitational waves (see Maggiore (2007); Creighton and Anderson (2011) as textbooks). Continuous gravitational waves are characterized by the following three properties: 1) a slowly varying amplitude, 2) a nearly constant fundamental frequency, and 3) a duration longer than the observation period. Rotating anisotropic neutron stars are typical candidate sources of continuous gravitational waves. In addition, several exotic objects have been proposed as possible sources of continuous gravitational waves (Zhu et al. (2020); Arvanitaki et al. (2015); Brito et al. (2017)).

Continuous gravitational waves are modeled by simpler waveforms than those of coalescing binaries. The parameters characterizing a typical waveform are the amplitude, the initial frequency, and the time derivatives of the frequency. Although the waveform generated by the source is analytically simple, the effect of the detector's motion makes data analysis for continuous gravitational waves challenging. The detector's motion modulates the frequency, dispersing the power in the frequency domain. If the source location is known a priori from electromagnetic observations, the modulation can be removed precisely enough. By contrast, for an unknown target search, we need to correlate the data with a tremendous number of templates to cover the possible source locations on the sky. This severely restricts the applicability of the all-sky coherent search to strain data of long duration. Therefore, semi-coherent methods, in which the strain data is divided into a set of segments and statistics calculated for the respective segments are summed up appropriately, are often used. Various semi-coherent methods (e.g., Time-Domain $\mathcal{F}$-statistic Jaranowski et al. (1998), SkyHough Krishnan et al. (2004), FrequencyHough Astone et al. (2014)) have been proposed, and they are actually used to analyze LIGO and Virgo data. Despite tremendous efforts, no continuous gravitational wave event has been detected so far Abbott et al. (2017, 2018, 2019d).

As another research trend, deep learning has been introduced to the field of gravitational wave data analysis. After the pioneering work by George & Huerta George and Huerta (2018), many proposals have appeared to use deep learning for a wide range of purposes, e.g., parameter estimation for binary coalescence Gabbard et al. (2019); Chua and Vallisneri (2020); Nakano et al. (2019); Yamamoto and Tanaka (2020), noise classification George et al. (2018), and waveform modeling Chua et al. (2020). As for applications to the search for continuous gravitational waves, several groups have already proposed deep learning methods. Dreissigacker et al. Dreissigacker et al. (2019); Dreissigacker and Prix (2020) applied neural networks to all-sky searches for signals with durations of $10^{5}$ sec and $10^{6}$ sec, using the Fourier-transformed strain as input. Their methods can be applied to signals located in broad frequency bands and to the case of multiple detectors and realistic noise. It has also been shown that synergies between deep learning and standard methods or other machine learning techniques are powerful Morawski et al. (2019); Beheshtipour and Papa (2020).

In this paper, we propose a new method designed for detecting monochromatic waves, combining several transformations with a deep learning method. In Sec. II, the waveform model and some assumptions are introduced. The coherent matched filtering and the time resampling technique are briefly reviewed in Sec. III. In Sec. IV, we explain our strategy, which combines several traditional methods, namely the resampling, the short-time Fourier transform, and the excess power search, with the deep learning method. We assess the performance of our new method in Sec. V. Sec. VI is devoted to the conclusion.

II Waveform model

We consider a monochromatic gravitational wave and denote by $f_{\mathrm{gw}}$ its frequency, which is constant in time. With the assumption that the source is at rest with respect to the solar system barycenter (SSB), a complex-valued gravitational waveform in the source frame $h^{\mathrm{source}}(\tau)$ can be simply written as

h^{\mathrm{source}}(\tau)=h_{0}e^{2\pi if_{\mathrm{gw}}\tau+i\phi_{0}}\,, (1)

where $\tau$ is called the SSB time, and $h_{0}$ and $\phi_{0}$ are the amplitude and the initial phase, respectively. In this work, for simplicity, we assume that

\phi_{0}=0\,, (2)

and regard it as a known parameter. The phase of a gravitational wave is modulated due to the detector motion, and the modulation depends on the source location. The unit vector pointing from the Earth's center to the sky position specified by a right ascension $\alpha$ and a declination $\delta$ is defined by

\bm{n}(\alpha,\delta)=\begin{pmatrix}1&0&0\\ 0&\cos\epsilon&\sin\epsilon\\ 0&-\sin\epsilon&\cos\epsilon\end{pmatrix}\begin{pmatrix}\cos\alpha\cos\delta\\ \sin\alpha\cos\delta\\ \sin\delta\end{pmatrix}\,, (3)

where $\epsilon$ is the tilt angle between the Earth's rotation axis and the orbital angular momentum. Here, we work in the SSB frame, in which the $z$-axis is along the Earth's orbital angular momentum and the $x$-axis points towards the vernal equinox. Defining the detector time $t$ so as to satisfy

\tau=t+\frac{\bm{r}(t)\cdot\bm{n}(\alpha_{\mathrm{s}},\delta_{\mathrm{s}})}{c}\,, (4)

we obtain the waveform in the detector frame

h(t):=h_{0}e^{i\Phi(t)}\,, (5)

with

\Phi(t)=2\pi f_{\mathrm{gw}}t+2\pi f_{\mathrm{gw}}\frac{\bm{r}(t)\cdot\bm{n}(\alpha_{\mathrm{s}},\delta_{\mathrm{s}})}{c}\,. (6)

The subscript "s" indicates quantities related to the gravitational wave source; namely, $(\alpha_{\mathrm{s}},\delta_{\mathrm{s}})$ is the sky position of the source. In the following, we use the notation $\bm{n}_{\mathrm{s}}:=\bm{n}(\alpha_{\mathrm{s}},\delta_{\mathrm{s}})$.
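As an illustration, the direction vector of Eq. (3) can be evaluated numerically. This is a minimal sketch, assuming the obliquity $\epsilon\approx 23.44^{\circ}$; the function name `sky_direction` is ours, not the paper's:

```python
import numpy as np

EPS = np.deg2rad(23.44)  # tilt angle epsilon (obliquity of the ecliptic, assumed value)

def sky_direction(alpha, delta, eps=EPS):
    """Unit vector of Eq. (3): equatorial (alpha, delta) rotated about
    the x-axis into the SSB frame."""
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(eps), np.sin(eps)],
                      [0.0, -np.sin(eps), np.cos(eps)]])
    v = np.array([np.cos(alpha) * np.cos(delta),
                  np.sin(alpha) * np.cos(delta),
                  np.sin(delta)])
    return rot_x @ v
```

Since the transformation is a pure rotation, the result remains a unit vector for any $(\alpha,\delta)$.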

For the modeling of the detector motion, we adopt a slight simplification, which we believe does not affect our main result. We assume that the position vector of the detector can be written as the sum of the Earth's rotation part $\bm{r}_{\oplus}(t)$ and the Earth's orbital motion part $\bm{r}_{\odot}(t)$. The Earth is assumed to follow a circular orbit in the $xy$-plane. Then, we can write $\bm{r}_{\odot}(t)$ as

\bm{r}_{\odot}(t)=R_{\mathrm{ES}}\begin{pmatrix}\cos(\varphi_{\odot}+\Omega_{\odot}t)\\ \sin(\varphi_{\odot}+\Omega_{\odot}t)\\ 0\end{pmatrix}\,, (7)

where $R_{\mathrm{ES}}$, $\Omega_{\odot}$, and $\varphi_{\odot}$ are the distance between the Earth and the Sun, the angular velocity of the orbital motion, and the initial phase, respectively. The detector motion due to the Earth's rotation can be described as

\bm{r}_{\oplus}(t)=R_{\mathrm{E}}\begin{pmatrix}1&0&0\\ 0&\cos\epsilon&\sin\epsilon\\ 0&-\sin\epsilon&\cos\epsilon\end{pmatrix}\begin{pmatrix}\cos\lambda\cos(\varphi_{\oplus}+\Omega_{\oplus}t)\\ \cos\lambda\sin(\varphi_{\oplus}+\Omega_{\oplus}t)\\ \sin\lambda\end{pmatrix}\,, (8)

where $R_{\mathrm{E}}$, $\lambda$, $\Omega_{\oplus}$, and $\varphi_{\oplus}$ are the radius of the Earth, the latitude of the detector, the angular velocity of the Earth's rotation, and the initial phase, respectively. The modulated phase $\Phi(t)$ can be decomposed into

\Phi(t)=2\pi f_{\mathrm{gw}}t+\Phi_{\oplus}(t)+\Phi_{\odot}(t)\,, (9)

where

\Phi_{\oplus}(t)=2\pi f_{\mathrm{gw}}\frac{\bm{r}_{\oplus}(t)\cdot\bm{n}_{\mathrm{s}}}{c}\,, (10)
\Phi_{\odot}(t)=2\pi f_{\mathrm{gw}}\frac{\bm{r}_{\odot}(t)\cdot\bm{n}_{\mathrm{s}}}{c}\,. (11)

Finally, we take into account the amplitude modulation due to the detector’s motion, which can be described by the antenna pattern function. In this work, the polarization angle and the inclination angle are, respectively, assumed to be

\psi=0\,,\qquad\cos\iota=1\,, (12)

and, similarly to $\phi_{0}$, they are treated as known parameters. Then, the gravitational wave observed by a detector can be written as

h_{\mathrm{obs}}(t)=G(t)h(t)+G^{\ast}(t)h^{\ast}(t)\,, (13)

with

G(t):=\frac{F_{+}(t)+iF_{\times}(t)}{2}\,. (14)

The definitions of $F_{+}(t)$ and $F_{\times}(t)$ are the same as those used in Jaranowski et al. (1998). In this work, the antenna pattern function of LIGO Hanford is employed. The strain data is written as

s(t)=h_{\mathrm{obs}}(t)+n(t)\,, (15)

where $n(t)$ is the detector noise. We assume that the strain data from the detector has no gaps in time and that the detector noise is stationary and Gaussian.

III Coherent search method

Before explaining our method, we briefly review the coherent search method and the time resampling technique Jaranowski et al. (1998).

If the expected waveforms can be modeled precisely and the noise is Gaussian, matched filtering is the optimal method for detection and parameter estimation, apart from its computational cost. The noise-weighted inner product is defined by

(a|b):=4\,\mathrm{Re}\left[\int^{\infty}_{0}df\ \frac{\tilde{a}(f)\tilde{b}^{\ast}(f)}{S_{\mathrm{n}}(f)}\right]\,, (16)

where $S_{\mathrm{n}}(f)$ is the power spectral density of the detector noise. The signal-to-noise ratio (SNR) can be calculated from the inner product between a strain $s(t)$ and a template $h_{\mathrm{temp}}(t)$ as

\rho_{\mathrm{MF}}:=\frac{(s|h_{\mathrm{temp}})}{\sqrt{(h_{\mathrm{temp}}|h_{\mathrm{temp}})}}\,. (17)

Theoretically predicted waveforms $h_{\mathrm{temp}}(t)$ have various parameters characterizing the source properties and the geometrical information. A set of waveforms with different parameters is called a template bank. For each template in a template bank, we can assign the value of SNR calculated by Eq. (17). If the maximum value of SNR over the template bank exceeds a threshold value, it is a sign that an actual signal may be present, and parameter inference can also be obtained from the distribution of SNR.

Due to the long duration and the narrow frequency band of continuous gravitational waves, the inner product (16) can be recast into the time-domain expression

(a|b)\simeq\frac{2}{S_{\mathrm{n}}(f_{\mathrm{gw}})}\mathrm{Re}\left[\int^{T_{\mathrm{obs}}}_{0}dt\ a(t)b^{\ast}(t)\right]\,, (18)

where $T_{\mathrm{obs}}$ is the observation time. For a monochromatic source, the waveform can be modeled by Eq. (1). The modulation due to the detector motion appears only in the phase of the waveform. The time resampling technique nullifies the phase modulation by redefining the time coordinate. If the position of the source is known a priori, the exact relation (4) can be obtained. Therefore, the phase modulation can be completely removed, and the monochromatic waveform can be used in the time-domain matched filtering (18). Calculating the inner product (18) between the resampled signal and a monochromatic waveform is equivalent to a Fourier transform. Thus, a fast algorithm (i.e., the fast Fourier transform) can be employed to rapidly search for the gravitational wave frequency, $f_{\mathrm{gw}}$.
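The demodulation-by-resampling idea can be sketched with a toy example. The function `resample_to_ssb` and the sinusoidal delay below are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def resample_to_ssb(t, s, delay):
    """Resample the strain onto a uniform grid of zeta = t + r(t).n/c
    (Eq. (4)) by linear interpolation; `delay` is an assumed precomputed
    Doppler term r(t).n/c evaluated at the sample times."""
    zeta = t + delay
    zeta_uniform = np.linspace(zeta[0], zeta[-1], len(t))
    return zeta_uniform, np.interp(zeta_uniform, zeta, s)

# Toy demonstration: a line smeared by a sinusoidal delay collapses
# back into a single Fourier bin after resampling.
fs, f0 = 64.0, 10.0
t = np.arange(0, 64.0, 1.0 / fs)
delay = 0.02 * np.sin(2 * np.pi * t / 16.0)   # toy Doppler delay [s]
s = np.cos(2 * np.pi * f0 * (t + delay))      # frequency-modulated signal
_, s_res = resample_to_ssb(t, s, delay)
spec = np.abs(np.fft.rfft(s_res))
freqs = np.fft.rfftfreq(len(s_res), 1.0 / fs)
```

After resampling, `freqs[np.argmax(spec)]` recovers the source frequency `f0` to within one frequency bin.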

When the source location is unknown, we need to search the whole sky by placing a set of grid points $\{\bm{n}_{\mathrm{g}}^{(i)}\}_{i=1}^{N_{\mathrm{grid}}}$ that keeps the maximum loss of SNR within an acceptable range. For each grid point, we carry out the Fourier transform after the transformation

\zeta:=t+\frac{\bm{r}(t)\cdot\bm{n}_{\mathrm{g}}}{c}\,. (19)

Here, we omit the superscript $(i)$ for brevity. The necessary number of grid points $N_{\mathrm{grid}}$ can be estimated from the angular resolution of gravitational wave sources, which is roughly given by the ratio between the wavelength of the gravitational wave and the diameter of the Earth's orbit, i.e.,

(\delta\theta)_{\mathrm{coh}}\sim\frac{\lambda_{\mathrm{gw}}}{2R_{\mathrm{ES}}}\sim 10^{-5}\text{ [rad]}\,. (20)

Here, we adopt 100 Hz as the fiducial value for $c\lambda_{\mathrm{gw}}^{-1}$. Thus, the required number of grid points is, at least,

N_{\mathrm{grid}}\sim\frac{4\pi}{(\delta\theta)_{\mathrm{coh}}^{2}}\sim 1.3\times 10^{11}\,. (21)

The time resampling and the Fourier transform are applied at each grid point. The number of floating-point operations required to carry out the FFT is $\sim 1.7\times 10^{12}$ per grid point for a signal of duration $10^{7}$ sec and a sampling frequency of 1024 Hz. Even with a 1 PFlops machine, the computational time becomes $2.2\times 10^{8}$ sec, which is longer than the signal duration. For this reason, the coherent all-sky search is not yet realistic even for monochromatic sources.
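The estimates of Eqs. (20) and (21) and the operation count above can be reproduced in a few lines; the constants and the $5N\log_{2}N$ FFT cost model are the usual textbook values:

```python
import numpy as np

c = 3.0e8            # speed of light [m/s]
R_ES = 1.5e11        # Earth-Sun distance [m]
f_gw = 100.0         # fiducial frequency [Hz]

lam_gw = c / f_gw                            # wavelength ~ 3e6 m
dtheta = lam_gw / (2.0 * R_ES)               # Eq. (20): ~1e-5 rad
N_grid = 4.0 * np.pi / dtheta**2             # Eq. (21): ~1.3e11 grid points

N_samp = 1.0e7 * 1024.0                      # duration x sampling rate
flops_fft = 5.0 * N_samp * np.log2(N_samp)   # ~1.7e12 flops per grid point
t_total = N_grid * flops_fft / 1.0e15        # seconds at 1 PFlops: ~2.2e8
```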

IV Our Method

IV.1 Subtracting the effect due to the Earth’s rotation

As stated in Sec. III, the time resampling technique can demodulate the phase, but complete demodulation is not feasible because of the limitation of computational resources. In our work, the time resampling technique is employed to eliminate only the effect caused by the Earth's diurnal rotation, $\Phi_{\oplus}(t)$. Assuming a representative grid point $\bm{n}_{\mathrm{g}}$, we can rewrite the phase (6) as

\Phi(t)=2\pi f_{\mathrm{gw}}t+\Phi_{\oplus}(t)+\Phi_{\odot}(t)=2\pi f_{\mathrm{gw}}\zeta+\delta\Phi_{\oplus}(t)+\delta\Phi_{\odot}(t)\,, (22)

where

\delta\Phi_{\oplus}(t):=2\pi f_{\mathrm{gw}}\frac{\bm{r}_{\oplus}(t)\cdot\Delta\bm{n}}{c}\,, (23)
\delta\Phi_{\odot}(t):=2\pi f_{\mathrm{gw}}\frac{\bm{r}_{\odot}(t)\cdot\Delta\bm{n}}{c}\,, (24)

and $\Delta\bm{n}:=\bm{n}_{\mathrm{s}}-\bm{n}_{\mathrm{g}}$. Since the residual phase varies with time, we place the grid points so that the amplitude of the residual phase in the worst case, i.e.,

\min_{\bm{n}_{\mathrm{g}}}\max_{t}|\delta\Phi_{\oplus}(t)|

becomes smaller than a threshold $\delta\Phi_{\epsilon}$ for any source direction $\bm{n}_{\mathrm{s}}$ within the area covered by the grid point $\bm{n}_{\mathrm{g}}$. To optimize the grid placement, we employ the method proposed in Ref. Nakano et al. (2003). The residual phase $\delta\Phi_{\oplus}$ is expanded up to first order in $\Delta\alpha:=\alpha_{\mathrm{s}}-\alpha_{\mathrm{g}}$ and $\Delta\delta:=\delta_{\mathrm{s}}-\delta_{\mathrm{g}}$. Then, we get

\delta\Phi_{\oplus}\simeq\frac{2\pi f_{\mathrm{gw}}}{c}R_{\mathrm{E}}\cos\lambda\left\{-\Delta\delta\sin\delta_{\mathrm{g}}\cos(\alpha_{\mathrm{g}}-\varphi_{\oplus}-\Omega_{\oplus}t)-\Delta\alpha\cos\delta_{\mathrm{g}}\sin(\alpha_{\mathrm{g}}-\varphi_{\oplus}-\Omega_{\oplus}t)\right\}\,. (25)

Here, the constant term is neglected because it degenerates with the initial phase $\phi_{0}$. The maximum value of the residual phase is

\max_{t}|\delta\Phi_{\oplus}|=\frac{2\pi f_{\mathrm{gw}}}{c}R_{\mathrm{E}}|\cos\lambda|\sqrt{(\Delta\delta)^{2}\sin^{2}\delta_{\mathrm{g}}+(\Delta\alpha)^{2}\cos^{2}\delta_{\mathrm{g}}}\,. (26)

The grid points are determined so as to satisfy $\max_{t}|\delta\Phi_{\oplus}|<\delta\Phi_{\epsilon}$ for any source direction.

Because the residual phase (26) is symmetric under the transformation $\delta_{\mathrm{g}}\to-\delta_{\mathrm{g}}$, the placement of grids on the negative-$\delta$ side can be generated by inverting the sign of the grids on the positive-$\delta$ side. Therefore, we focus on the case $0\leq\delta\leq\pi/2$.

Since the residual phase at $\delta_{\mathrm{g}}=\pi/2$ depends only on $\Delta\delta$, a single template can cover the neighborhood of $\delta=\pi/2$. In fact, at $\delta_{\mathrm{g}}=\pi/2$, Eq. (26) becomes

\max_{t}|\delta\Phi_{\oplus}|=\frac{2\pi f_{\mathrm{gw}}}{c}R_{\mathrm{E}}|\Delta\delta|\cos\lambda\,. (27)

Therefore, the condition $\max_{t}|\delta\Phi_{\oplus}|\leq\delta\Phi_{\epsilon}$ gives the lower bound $\delta_{1}$ such that the region $\delta_{1}\leq\delta\leq\pi/2$ can be covered by a single patch represented by $(\alpha_{\mathrm{g}},\delta_{\mathrm{g}})=(0,\pi/2)$:

\delta_{1}:=\frac{\pi}{2}-\delta\Phi_{\epsilon}\times\frac{c}{2\pi f_{\mathrm{gw}}}\frac{1}{R_{\mathrm{E}}\cos\lambda}\,. (28)
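As a numerical sketch of Eqs. (26) and (28), assuming the LIGO Hanford latitude $\lambda\approx 46.45^{\circ}$ and $f_{\mathrm{gw}}=100$ Hz (`max_residual_phase` is our name for the left-hand side of Eq. (26)):

```python
import numpy as np

c = 3.0e8                      # speed of light [m/s]
R_E = 6.371e6                  # Earth radius [m]
lam_det = np.deg2rad(46.45)    # detector latitude (LIGO Hanford, assumed)
f_gw = 100.0                   # fiducial frequency [Hz]
dPhi_eps = 0.058               # threshold adopted in Eq. (31)

def max_residual_phase(dalpha, ddelta, delta_g):
    """Worst-case diurnal residual phase of Eq. (26)."""
    amp = 2.0 * np.pi * f_gw / c * R_E * np.abs(np.cos(lam_det))
    return amp * np.hypot(ddelta * np.sin(delta_g), dalpha * np.cos(delta_g))

# Lower edge of the polar cap covered by a single patch, Eq. (28):
delta_1 = np.pi / 2 - dPhi_eps * c / (2.0 * np.pi * f_gw * R_E * np.cos(lam_det))
```

With these values $\delta_{1}$ comes out close to $\pi/2$, i.e., the single polar patch is small, and the symmetry of Eq. (26) under $\delta_{\mathrm{g}}\to-\delta_{\mathrm{g}}$ is manifest in the code.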

Multiple patches are necessary to cover each strip of constant $\delta$ in the remaining range. We introduce a two-dimensional metric corresponding to the residual phase (26),

d\sigma^{2}=\cos^{2}\delta\,d\alpha^{2}+\sin^{2}\delta\,d\delta^{2}\,. (29)

In general, a metric on a two-dimensional manifold can be transformed into a conformally flat metric by an appropriate coordinate transformation. When the space is conformally flat, the curve at a small constant distance from an arbitrarily chosen point can be approximated by a circle. Therefore, template spacing in the two-dimensional parameter space becomes relatively easy. By defining new variables $X:=\alpha$ and $Y:=-\log|\cos\delta|$, the metric can be transformed into

d\sigma^{2}=e^{-2Y}(dX^{2}+dY^{2})\,. (30)
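The equivalence of Eqs. (29) and (30) under $(X,Y)=(\alpha,-\log|\cos\delta|)$ can be verified numerically; the helper functions below are illustrative:

```python
import numpy as np

def ds2_original(alpha, delta, dalpha, ddelta):
    """Line element of Eq. (29)."""
    return np.cos(delta)**2 * dalpha**2 + np.sin(delta)**2 * ddelta**2

def ds2_conformal(alpha, delta, dalpha, ddelta):
    """Line element of Eq. (30) in (X, Y) = (alpha, -log|cos delta|);
    the displacement maps as dX = dalpha, dY = tan(delta) * ddelta."""
    Y = -np.log(np.abs(np.cos(delta)))
    dX, dY = dalpha, np.tan(delta) * ddelta
    return np.exp(-2.0 * Y) * (dX**2 + dY**2)
```

Since $e^{-2Y}=\cos^{2}\delta$ and $dY=\tan\delta\,d\delta$, the two expressions agree term by term.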

Following Nakano et al. (2003), we can construct the sky patches covering the half-sky region with $0\leq\delta\leq\delta_{1}$. Figure 1 shows a part of the grid points constructed under the condition

\delta\Phi_{\epsilon}=0.058\,, (31)

which we adopt throughout this paper. The total number of grid points to cover the whole sky is

N_{\mathrm{grid}}=352{,}436\,, (32)

for $f_{\mathrm{gw}}=100$ Hz.

Figure 1: Grid point placement on a fraction of the $(\alpha,\cos\delta)$-plane. Blue dots are grid points, and orange contours show the $\max_{t}|\delta\Phi_{\oplus}(t)|=\delta\Phi_{\epsilon}$ contour for each grid point. The region $\{(\alpha,\delta)\,|\,\delta_{1}<\delta\}$ is covered by a single template $(\alpha_{\mathrm{g}},\delta_{\mathrm{g}})=(0,\pi/2)$, and the shape of that patch is square on this plane.

IV.2 Modeling the effect due to the Earth’s orbital motion

As we choose $\delta\Phi_{\epsilon}$ to be sufficiently small, we neglect $\delta\Phi_{\oplus}$ in the following discussion. Then, after subtracting the phase modulation due to the Earth's rotation, the phase of the gravitational wave (22) becomes

\Phi(t)=2\pi f_{\mathrm{gw}}\zeta+\delta\Phi_{\odot}(t)\,. (33)

We apply the short-time Fourier transform (STFT) to the time-resampled strain,

s(\zeta)=h_{\mathrm{obs}}(\zeta)+n(\zeta)\,. (34)

In the rest of the paper, we treat only the time-resampled data. Therefore, without risk of confusion, the time-resampled data in Eq. (34) is denoted by the same characters as the original. The strain is divided into $N_{\mathrm{seg}}$ segments of duration $T_{\mathrm{seg}}$, whose start times are denoted by $\zeta_{j}:=jT_{\mathrm{slide}}$ $(j=0,1,\cdots,N_{\mathrm{seg}}-1)$. $T_{\mathrm{slide}}$ need not be equal to $T_{\mathrm{seg}}$. The output of the STFT with the window function $w(\zeta)$ is defined by

s^{\mathrm{STFT}}_{j,k}=h^{\mathrm{STFT}}_{j,k}+n^{\mathrm{STFT}}_{j,k}\,, (35)

where

h^{\mathrm{STFT}}_{j,k}=\frac{1}{T_{\mathrm{seg}}}\int^{\zeta_{j}+T_{\mathrm{seg}}}_{\zeta_{j}}d\zeta^{\prime}\ w(\zeta^{\prime}-\zeta_{j})h_{\mathrm{obs}}(\zeta^{\prime})e^{-2\pi if_{k}\zeta^{\prime}}\,, (36)
n^{\mathrm{STFT}}_{j,k}=\frac{1}{T_{\mathrm{seg}}}\int^{\zeta_{j}+T_{\mathrm{seg}}}_{\zeta_{j}}d\zeta^{\prime}\ w(\zeta^{\prime}-\zeta_{j})n(\zeta^{\prime})e^{-2\pi if_{k}\zeta^{\prime}}\,, (37)

and $f_{k}:=k\Delta f=k/T_{\mathrm{seg}}$ is the frequency of the $k$-th element of the STFT. Let us focus on the positive frequency modes, i.e., $f_{k}>0$. Then, the second term of Eq. (13) can be neglected, and Eq. (36) can be approximated by

h^{\mathrm{STFT}}_{j,k}\simeq\frac{1}{T_{\mathrm{seg}}}\int^{\zeta_{j}+T_{\mathrm{seg}}}_{\zeta_{j}}d\zeta^{\prime}\ w(\zeta^{\prime}-\zeta_{j})\,G(t(\zeta^{\prime}))\,e^{2\pi i\delta f_{k}\zeta^{\prime}}e^{i\delta\Phi_{\odot}(\zeta^{\prime})}\,, (38)

with $\delta f_{k}:=f_{\mathrm{gw}}-f_{k}$. In the expression of $G(t(\zeta))$, the SSB time $\zeta$ appears only through the combination $\Omega_{\oplus}t(\zeta)$. The difference between $\Omega_{\oplus}t(\zeta)$ and $\Omega_{\oplus}\zeta$ is negligibly small. Therefore, in Eq. (38), $G(t(\zeta))$ can be replaced by $G(\zeta)$. The duration $T_{\mathrm{seg}}$ is chosen so that $G(\zeta)$ can be approximated by a constant within each segment. With this choice of $T_{\mathrm{seg}}$, the factor $e^{i\delta\Phi_{\odot}(\zeta)}$ can also be regarded as constant within each segment because it varies more slowly than the antenna pattern function. Therefore, Eq. (38) can be approximated by

h^{\mathrm{STFT}}_{j,k}\simeq h_{0}e^{i\delta\Phi_{\odot}(\zeta_{j})}G(\zeta_{j})W_{k}(\zeta_{j})\,, (39)

where

W_{k}(\zeta_{j}):=\frac{1}{T_{\mathrm{seg}}}\int^{\zeta_{j}+T_{\mathrm{seg}}}_{\zeta_{j}}d\zeta^{\prime}\ w(\zeta^{\prime}-\zeta_{j})e^{2\pi i\delta f_{k}\zeta^{\prime}}\,. (40)

In this work, we use the Tukey window,

w(\zeta)=\begin{cases}\frac{1}{2}-\frac{1}{2}\cos\left(\frac{2\pi\zeta}{\alpha T_{\mathrm{seg}}}\right)\,,&\left(0\leq\frac{\zeta}{T_{\mathrm{seg}}}<\frac{\alpha}{2}\right)\,,\\ 1\,,&\left(\frac{\alpha}{2}\leq\frac{\zeta}{T_{\mathrm{seg}}}\leq 1-\frac{\alpha}{2}\right)\,,\\ \frac{1}{2}-\frac{1}{2}\cos\left(\frac{2\pi(T_{\mathrm{seg}}-\zeta)}{\alpha T_{\mathrm{seg}}}\right)\,,&\left(1-\frac{\alpha}{2}<\frac{\zeta}{T_{\mathrm{seg}}}\leq 1\right)\,.\end{cases} (41)

We set the parameter $\alpha$ to 0.125. With $\beta_{k}:=T_{\mathrm{seg}}\delta f_{k}$, Eq. (40) can be calculated as

W_{k}(\zeta_{j})=e^{2\pi i\delta f_{k}\zeta_{j}}\times\frac{(1+e^{i\pi\alpha\beta_{k}})(1-e^{2\pi i\beta_{k}(1-\alpha/2)})}{4\pi i\beta_{k}(\alpha^{2}\beta_{k}^{2}-1)}\,. (42)

Using the Jacobi-Anger expansion, we can expand the factor $e^{i\delta\Phi_{\odot}(\zeta_{j})}$ appearing in Eq. (39) as

e^{i\delta\Phi_{\odot}(\zeta_{j})}=\sum_{\ell=-\infty}^{\infty}i^{\ell}J_{\ell}(X)e^{i\ell\Omega_{\odot}\zeta_{j}}e^{i\ell(\varphi_{\odot}-\phi_{X})}\,, (43)

where $J_{\ell}(z)$ is the Bessel function of the first kind, and

X:=\frac{2\pi f_{\mathrm{gw}}R_{\mathrm{ES}}}{c}\sqrt{(\Delta n_{x})^{2}+(\Delta n_{y})^{2}}\,, (44)
e^{i\phi_{X}}:=\frac{\Delta n_{x}+i\Delta n_{y}}{\sqrt{(\Delta n_{x})^{2}+(\Delta n_{y})^{2}}}\,. (45)

Therefore, Eq. (39) can be expressed as

h^{\mathrm{STFT}}_{j,k}\simeq h_{0}G(\zeta_{j})W_{k}(0)e^{2\pi i\delta f_{k}\zeta_{j}}\sum_{\ell=-\infty}^{\infty}i^{\ell}J_{\ell}(X)e^{i\ell(\varphi_{\odot}-\phi_{X})}e^{i\ell\Omega_{\odot}\zeta_{j}}\,. (46)
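The Jacobi-Anger identity underlying Eq. (43) can be checked numerically with SciPy's Bessel functions:

```python
import numpy as np
from scipy.special import jv

# Numerical check of the Jacobi-Anger identity behind Eq. (43):
#   exp(i X cos(theta)) = sum_l i^l J_l(X) exp(i l theta),
# truncated at |l| <= 40, which is ample for X = 5 since
# J_l(X) is negligible for |l| well above |X|.
X, theta = 5.0, 0.7
ells = np.arange(-40, 41)
series = np.sum(1j**ells * jv(ells, X) * np.exp(1j * ells * theta))
exact = np.exp(1j * X * np.cos(theta))
```

The rapid decay of $J_{\ell}(X)$ for $|\ell|\gtrsim|X|$ is exactly the property that localizes the signal in the $\ell$-domain below.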

The Fourier transform of $h^{\mathrm{STFT}}_{j,k}$ with fixed $k$ is defined by

\mathsf{H}_{\ell,k}:=\frac{1}{N_{\mathrm{seg}}}\sum_{j=0}^{N_{\mathrm{seg}}-1}h^{\mathrm{STFT}}_{j,k}e^{-2\pi ij\ell/N_{\mathrm{seg}}}\,. (47)

We refer to $\mathsf{H}_{\ell,k}$ as the $\ell$-domain signal. To understand the pattern hidden in $\mathsf{H}_{\ell,k}$, we set aside the factor $G(\zeta_{j})$ for a while. Then, Eq. (47) can be estimated as

\mathsf{H}_{\ell,k}\sim h_{0}W_{k}(0)i^{\ell^{\prime}}J_{\ell^{\prime}}(X)e^{i\ell^{\prime}(\varphi_{\odot}-\phi_{X})}\,, (48)

with $\ell^{\prime}\sim\ell+2\pi\Omega_{\odot}^{-1}\delta f_{k}$. Because $J_{\ell}(z)\simeq 0$ for $|\ell|\gtrsim|z|$ and $X\lesssim O(10^{3})$ for $f_{\mathrm{gw}}=100$ Hz, the signal is localized within a region of only a few thousand bins in the $\ell$-domain. Putting back the antenna pattern $G(\zeta_{j})$, we expect that $\ell$-domain signals lose some amplitude and that their localization becomes worse than in the idealized case. Figure 2 shows an example of the $\ell$-domain signal.
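Equation (47) is an ordinary DFT over the segment index, so `np.fft.fft` implements it directly (up to the $1/N_{\mathrm{seg}}$ normalization); a toy sketch:

```python
import numpy as np

def ell_domain(h_stft_k):
    """Normalized DFT over the segment index j, Eq. (47)."""
    return np.fft.fft(h_stft_k) / len(h_stft_k)

# A pure segment-to-segment phase rotation exp(2*pi*i*5*j/N) maps to a
# single spike at ell = 5, illustrating the localization in ell:
N = 1024
j = np.arange(N)
H = ell_domain(np.exp(2j * np.pi * 5 * j / N))
```

In the realistic case, the sum of many such rotations weighted by $J_{\ell}(X)$ in Eq. (46) produces the band of a few thousand bins described above.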

Figure 2: An example of the $\ell$-domain waveform. The length of the $\ell$-domain waveform is $2^{19}$. This figure is a zoom-in on the region where the signal is localized. The amplitude is $h_{0}=1.0$.

IV.3 Excess power method for finding candidates

By the method shown in the previous subsection, for every grid point $\bm{n}_{\mathrm{g}}$ and every frequency bin $f_{k}$, we obtain an $\ell$-domain strain defined by

\mathsf{S}_{\ell,k}:=\mathsf{H}_{\ell,k}+\mathsf{N}_{\ell,k}\,, (49)

with

\mathsf{N}_{\ell,k}:=\frac{1}{N_{\mathrm{seg}}}\sum_{j=0}^{N_{\mathrm{seg}}-1}n^{\mathrm{STFT}}_{j,k}e^{-2\pi ij\ell/N_{\mathrm{seg}}}\,. (50)

There are $T_{\mathrm{obs}}/T_{\mathrm{seg}}\sim O(10^{6})$ data points in a single $\ell$-domain strain, and we know that the signal in the $\ell$-domain is localized within a small region of $\sim O(10^{3})$ bins. Thus, the excess power method Anderson et al. (2001) is useful for selecting candidates with a minimal computational cost. We divide an $\ell$-domain strain into short chunks of length $\delta\ell$ such that neighboring chunks overlap by $\delta\ell/2$, which is one of the simplest choices but not the optimal one. Then, we obtain $N_{\mathrm{chunk/signal}}=2(N_{\mathrm{seg}}-\delta\ell)/\delta\ell$ chunks from one $\ell$-domain strain. The excess power statistic for the grid point $\bm{n}_{\mathrm{g}}$, the frequency bin $f_{k}$, and the $c$-th chunk ($c=0,1,\dots,N_{\mathrm{chunk/signal}}-1$) is defined by

\mathcal{E}(\bm{n}_{\mathrm{g}},f_{k},c):=4\sum_{\ell=c\delta\ell/2}^{(c+2)\delta\ell/2-1}\frac{|\mathsf{S}_{\ell,k}|^{2}}{\tilde{\sigma}_{k}^{2}}\,, (51)

where

\langle\mathsf{N}_{\ell,k}\mathsf{N}^{\ast}_{\ell^{\prime},k}\rangle=:\frac{1}{2}\tilde{\sigma}_{k}^{2}\delta_{\ell\ell^{\prime}}\,. (52)

The variance of the noise in the $\ell$-domain, $\tilde{\sigma}_{k}^{2}$, is estimated as

\tilde{\sigma}_{k}^{2}=\frac{S_{\mathrm{n}}(f_{k})}{N_{\mathrm{seg}}T_{\mathrm{seg}}}\times\mathcal{W}\,, (53)

where $\mathcal{W}$ is a factor coming from the window function, defined by

\mathcal{W}:=\int^{0.5}_{-0.5}dx\,[w(x)]^{2}\,. (54)

The derivation of Eq. (53) is summarized in Appendix A.
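A minimal sketch of the chunked statistic of Eq. (51), with the chunk layout described above (`excess_power` is our name):

```python
import numpy as np

def excess_power(S_k, sigma2_k, dl):
    """Excess power statistic of Eq. (51) over chunks of length dl
    overlapping by dl/2; S_k is a complex ell-domain strain and
    sigma2_k the noise variance of Eq. (53)."""
    n_chunk = 2 * (len(S_k) - dl) // dl
    power = 4.0 * np.abs(S_k)**2 / sigma2_k
    return np.array([power[c * dl // 2:(c + 2) * dl // 2].sum()
                     for c in range(n_chunk)])

# For pure noise with <|N|^2> = sigma^2/2 as in Eq. (52), the chunk
# values scatter around <E>_n = 2*dl of Eq. (56):
rng = np.random.default_rng(1)
noise = 0.5 * (rng.standard_normal(65536) + 1j * rng.standard_normal(65536))
E = excess_power(noise, 1.0, 256)
```

With $N_{\mathrm{seg}}=65536$ and $\delta\ell=256$, this yields $2(N_{\mathrm{seg}}-\delta\ell)/\delta\ell=510$ chunks.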

We define the SNR of the excess power by

\rho_{\mathrm{EP}}(\bm{n}_{\mathrm{g}},f_{k},c):=\frac{\mathcal{E}(\bm{n}_{\mathrm{g}},f_{k},c)-\langle\mathcal{E}\rangle_{\mathrm{n}}}{\sigma_{\mathrm{n}}(\mathcal{E})}\,, (55)

where

\langle\mathcal{E}\rangle_{\mathrm{n}}=2\delta\ell\,, (56)

and

\sigma_{\mathrm{n}}(\mathcal{E}):=\sqrt{\langle(\mathcal{E}-\langle\mathcal{E}\rangle_{\mathrm{n}})^{2}\rangle_{\mathrm{n}}}=2\sqrt{\delta\ell}\,, (57)

are, respectively, the expectation value and the standard deviation of $\mathcal{E}$ when only noise is present. We select the candidate set of parameter values $\{\bm{n}_{\mathrm{g}},f_{k},c\}$ when

ρEP(𝒏g,fk,c)>ρ^EP{\rho}_{\mathrm{EP}}({\bm{{n}}}_{\mathrm{g}},f_{k},c)>{\hat{\rho}}_{\mathrm{EP}}

is satisfied with a threshold value ρ^EP{\hat{\rho}}_{\mathrm{EP}}. Strictly speaking, since the excess power statistic \mathcal{E} is the sum of the squares of 2δ2\delta\ell independent Gaussian random variables with unit variance, \mathcal{E} follows a chi-square distribution with 2δ2\delta\ell degrees of freedom. However, since here we choose δ\delta\ell to be large, the distribution of \mathcal{E} can be approximated by a Gaussian distribution with the average 2δ2\delta\ell and the standard deviation 2δ2\sqrt{\delta\ell}. Therefore, in the absence of a gravitational wave signal, the probability distribution of ρEP{\rho}_{\mathrm{EP}} is a Gaussian distribution with zero mean and unit variance.
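As a sanity check of this Gaussian approximation, the chunked statistic of Eq. (51) and the SNR of Eq. (55) can be simulated directly on Gaussian noise. The NumPy sketch below is illustrative only (array sizes and variable names are not those of our pipeline); in pure noise, the resulting ρEP values should be approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_power_snr(S, sigma2, dl):
    """Excess power SNR, Eqs. (51) and (55), for chunks of length dl
    overlapping by dl/2."""
    power = 4.0 * np.abs(S) ** 2 / sigma2
    n_chunk = 2 * (len(S) - dl) // dl          # chunks per l-domain signal
    E = np.array([power[c * dl // 2 : c * dl // 2 + dl].sum()
                  for c in range(n_chunk)])
    return (E - 2 * dl) / (2 * np.sqrt(dl))    # Gaussian approximation of chi^2

# Simulated l-domain noise obeying Eq. (52): <N N*> = sigma2 / 2,
# i.e. variance sigma2 / 4 in each quadrature.
sigma2 = 1.0
N_seg, dl = 2 ** 17, 2048
noise_re, noise_im = rng.normal(0.0, np.sqrt(sigma2 / 4), (2, N_seg))
rho = excess_power_snr(noise_re + 1j * noise_im, sigma2, dl)
print(rho.mean(), rho.std())   # close to 0 and 1, respectively
```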

Also in the presence of a signal, the excess power statistic ρEP{\rho}_{\mathrm{EP}} is given by a sum of many statistical variables. Thus, the distribution of ρEP{\rho}_{\mathrm{EP}} can be approximated by a Gaussian distribution whose mean and standard deviation are calculated as

μEP(𝝃)=2Pk(𝝃)σ~k2δ,{\mu}_{\mathrm{EP}}(\bm{{\xi}})=\frac{2P_{k}(\bm{{\xi}})}{\tilde{\sigma}^{2}_{k}\sqrt{\delta\ell}}\,, (58)

and

σEP(𝝃)=1+4Pk(𝝃)σ~k2δ,{\sigma}_{\mathrm{EP}}(\bm{{\xi}})=\sqrt{1+\frac{4P_{k}(\bm{{\xi}})}{\tilde{\sigma}_{k}^{2}\delta\ell}}\,, (59)

Here, we define

Pk(𝝃):=|𝖧,k(𝝃)|2,P_{k}(\bm{{\xi}}):=\sum_{\ell}|\mathsf{H}_{\ell,k}(\bm{{\xi}})|^{2}\,, (60)

and we define 𝝃\bm{{\xi}} as a set of parameters 𝝃:=(h0,ξ)=(h0,fgw,αs,δs)\bm{{\xi}}:=(h_{0},\vec{\xi})=(h_{0},{f}_{\mathrm{gw}},{\alpha}_{\mathrm{s}},{\delta}_{\mathrm{s}}). The false alarm rate and the detection efficiency will be assessed with this Gaussian approximation.

IV.4 Neural network for localizing

IV.4.1 fundamentals

Deep learning is an approach for finding features hidden in data (see Goodfellow et al. (2016) for a textbook). Artificial neural networks (ANNs) are the architectures playing the central role in deep learning. An ANN consists of consecutive layers, and each layer is formed by many units (neurons). Each layer takes the output of the previous layer as its input, and the processed data are passed to the next layer. As a simple example, the process occurring in each layer can be written as the combination of an affine transformation and a non-linear transformation, i.e.,

xi(+1)=g(j=1N()wij()xj()+b())(i=1,2,,N(+1)),x^{(\ell+1)}_{i}=g\left(\sum_{j=1}^{N^{(\ell)}}w^{(\ell)}_{ij}x^{(\ell)}_{j}+b^{(\ell)}\right)\ (i=1,2,\cdots,N^{(\ell+1)})\,, (61)

where x()x^{(\ell)} is a set of input data on the \ell-th layer and gg is a nonlinear function, which is called an activation function. We use a ReLU function Jarrett et al. (2009), defined by

g(z)=max[z,0].g(z)=\max[z,0]\,. (62)

The parameters ww and bb are, respectively, called weights and biases. They are tunable parameters and optimized to capture the features of data. The process to optimize weights and biases is called training. Frequently, the affine transformation and the non-linear transformation are divided into two layers, called a linear layer and a non-linear transformation layer, respectively.
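As an illustration, the layer of Eqs. (61) and (62) is a few lines of NumPy. Here the bias is promoted to one value per output unit, a common generalization of the single b^{(\ell)} in Eq. (61); all shapes and numbers are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)        # activation function, Eq. (62)

def dense_layer(x, w, b):
    """One layer of Eq. (61): affine transformation followed by ReLU."""
    return relu(w @ x + b)

rng = np.random.default_rng(1)
x = rng.normal(size=8)               # N^(l) = 8 units on the current layer
w = rng.normal(size=(4, 8))          # weights w_ij, N^(l+1) = 4
b = rng.normal(size=4)               # one bias per output unit
y = dense_layer(x, w, b)
assert y.shape == (4,) and np.all(y >= 0.0)
```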

In addition to the layer given by Eq. (61), many variants have been proposed. In this work, we also use one-dimensional convolutional layers LeCun and Bengio (1998) and max-pooling layers Zhou and Chellappa (1988). The input of a convolutional layer, denoted by xicx^{c}_{i}, is a set of vectors indexed by the channel cc. For example, in the case of color images, each pixel has three channels corresponding to the three primary colors of light, so the input data is a set of three two-dimensional arrays. The discrete convolution, which is represented as

oic=c=0C1k=0K1fkc,cxi+kc+bc,o^{c^{\prime}}_{i}=\sum_{c=0}^{C-1}\sum_{k=0}^{K-1}f^{c,c^{\prime}}_{k}x^{c}_{i+k}+b^{c^{\prime}}\,, (63)

is calculated in a convolutional layer. Here, xx is the input and oo is the output data of the layer. CC and KK are, respectively, the number of channels and the width of the kernel. Each pixel of the data is specified by an index ii. The parameters ff and bb are optimized during the training. A max pooling layer, whose operation can be written as

oic=maxk=0,1,,K1[xsi+kc],o^{c}_{i}=\max_{k=0,1,\cdots,K-1}[x^{c}_{si+k}]\,, (64)

with the kernel size KK and the stride ss, reduces the length of the data and hence the computational cost.
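The operations of Eqs. (63) and (64) can be sketched in NumPy as follows (strictly, Eq. (63) is a cross-correlation, the usual deep-learning convention). The two input channels and the kernel width 16 mirror the first rows of Table 1, but the random kernels are purely illustrative.

```python
import numpy as np

def conv1d(x, f, b):
    """Discrete convolution of Eq. (63): x has shape (C, N), the kernel f has
    shape (C, C_out, K), and b has shape (C_out,); the output has shape
    (C_out, N - K + 1)."""
    C, N = x.shape
    _, C_out, K = f.shape
    o = np.empty((C_out, N - K + 1))
    for i in range(N - K + 1):
        o[:, i] = np.einsum("ck,cok->o", x[:, i : i + K], f) + b
    return o

def max_pool1d(x, K, s):
    """Max pooling of Eq. (64) with kernel size K and stride s."""
    n_out = (x.shape[1] - K) // s + 1
    return np.stack([x[:, s * i : s * i + K].max(axis=1) for i in range(n_out)],
                    axis=1)

rng = np.random.default_rng(2)
x = rng.normal(size=(2, 2048))           # real and imaginary parts as 2 channels
f = 0.01 * rng.normal(size=(2, 64, 16))  # kernel width 16, 64 output channels
o = conv1d(x, f, np.zeros(64))
assert o.shape == (64, 2033)             # matches the first row of Table 1
p = max_pool1d(o, K=4, s=4)
assert p.shape == (64, 508)
```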

In supervised learning, a given dataset consists of many pairs of input data and target values. An ANN learns the relation between input data and target values from the dataset and predicts values corresponding to newly given input data. In order to train an ANN, the deviation between the predicted values and the target value is quantified by a loss function. For a regression problem, the mean square loss,

L[𝒚(w),𝒕]=12i=1d|yi(w)ti|2,L[\bm{{y}}(w),\bm{{t}}]=\frac{1}{2}\sum_{i=1}^{d}|y_{i}(w)-t_{i}|^{2}\,, (65)

is often employed. Here, 𝒚\bm{{y}} and 𝒕\bm{{t}} are the set of predicted values and that of target values, respectively, and they are expressed as dd-dimensional vectors. The prediction depends on the weights of the neural network, which are denoted by a single symbol ww. An ANN is optimized so as to minimize the loss function for a given dataset, which is the sum of the loss functions for all data contained in the training dataset. Because a complete minimization over the whole dataset cannot be carried out at once, an iterative method is used. The weights ww are updated by the rule

wwηwn=1NtrainL[𝒚n(w),𝒕n],w\to w-\eta\nabla_{w}\sum_{n=1}^{{N}_{\mathrm{train}}}L[\bm{{y}}_{n}(w),\bm{{t}}_{n}]\,, (66)

where Ntrain{N}_{\mathrm{train}} is the number of data contained in the dataset and η\eta, called the learning rate, characterizes the strength of each update. The algorithm shown in Eq. (66) is called gradient descent, which is the simplest procedure to update the weights, and many variants (e.g., momentum Polyak (1964), RMSprop Hinton (2012), Adam Kingma and Ba (2017)) have been proposed. Regardless of the choice of the update algorithm, the gradients of the loss function are required, and they can be quickly calculated by the backpropagation scheme Rumelhart et al. (1986). In Eq. (66), all data in the dataset are used for each iteration. In practice, the loss function is calculated only for a subset of the dataset. The subset is called a batch and is chosen randomly in every iteration. This procedure is called mini-batch training.
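The update rule of Eq. (66) with mini-batches can be demonstrated on a toy one-parameter regression problem; the dataset (targets t = 2x), learning rate, and batch size below are arbitrary illustrative choices, not the settings of our network.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy dataset: targets t = 2 x, so the optimal weight is w = 2.
x_all = rng.normal(size=1024)
t_all = 2.0 * x_all

w, eta, batch = 0.0, 0.1, 64
for step in range(200):
    idx = rng.choice(x_all.size, size=batch, replace=False)  # random mini-batch
    x, t = x_all[idx], t_all[idx]
    grad = np.mean((w * x - t) * x)   # gradient of the square loss, Eq. (65)
    w -= eta * grad                   # gradient descent update, Eq. (66)

assert abs(w - 2.0) < 1e-3
```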

In the training process, we optimize a neural network so that the loss function is minimized for a dataset. However, this strategy cannot be straightforwardly applied to practical situations. First, the trained neural network may suffer from overfitting; then, it loses the ability to correctly predict the target for newly given input data that were not used for training. Second, we have to optimize the neural network model and the update procedure, too. For this purpose, we have to appropriately select the hyperparameters, such as the number of neurons of the \ell-th layer (N()N^{(\ell)}) and the learning rate (η\eta), which are not automatically tuned during the training process.

To solve these problems, we prepare a validation dataset which is independent of the training dataset. The weights of the neural network are optimized so that the loss function for the training dataset is minimized. The validation dataset is used for monitoring the training process and assessing which model is better suited to the problem that the user wants to solve. To prevent overfitting, the training should be stopped when the loss for the validation dataset starts to deviate from that for the training dataset (early stopping). To optimize the hyperparameters, many neural network models having various structures are trained with different training schemes. Among them, we choose the one with the smallest loss for the validation dataset.

IV.4.2 setup in our analysis

The whole architecture of the neural network we used is shown in Table 1. The input data of the neural network is the complex-valued numbers taken from a short chunk of the \ell-domain signal, and the output is the predicted sky position. The \ell-domain waveform 𝖧,k\mathsf{H}_{\ell,k} is determined mainly by the residual phase δΦ\delta\Phi_{\odot}, which depends on the sky position (αs,δs)({\alpha}_{\mathrm{s}},{\delta}_{\mathrm{s}}) through the vector Δ𝒏\Delta\bm{{n}}. Because z=0z_{\odot}=0, only xx and yy components of Δ𝒏\Delta\bm{{n}} affect δΦ\delta\Phi_{\odot}. Therefore, we label each waveform with the values of Δnx\Delta n_{x} and Δny\Delta n_{y}, which are the targets of the prediction of the neural network. The outputs of the neural network are converted into the predicted sky position, denoted by (αp,δp)({\alpha}_{\mathrm{p}},{\delta}_{\mathrm{p}}). We apply the neural network to each candidate, selected by the excess power method, in order to narrow down the possible area in which the source is likely to be located. For simplicity, the (α,δ)(\alpha,\delta)-plane is regarded as a two-dimensional Euclidean space, and the shape of the predicted region is assumed to be a disk on the (α,δ)(\alpha,\delta)-plane. For each candidate, the center of the disk is set to the predicted point. The radius of the disk, denoted by rNN{r}_{\mathrm{NN}}, is fixed to a constant value. In the follow-up stage, finer grids are placed to cover the whole region of the disk.

Table 1: The architecture of the neural network used in this work. For convolution and max pooling layers, the input and the output are characterized by (C,N)(C,N) where CC is the number of channels and NN is the length of the data. For convolutional layers, the lengths of kernels are 16, 16, 8, 8, 4 and 4 from the earlier to the later layer. The kernel size of the max pooling layers is 4.
Layer Input output
1-d convolution (2, 2048) (64, 2033)
ReLU (64, 2033) (64, 2033)
1-d convolution (64, 2033) (64, 2018)
ReLU (64, 2018) (64, 2018)
max pooling (64, 2018) (64, 504)
1-d convolution (64, 504) (128, 497)
ReLU (128, 497) (128, 497)
1-d convolution (128, 497) (128, 490)
ReLU (128, 490) (128, 490)
max pooling (128, 490) (128, 122)
1-d convolution (128, 122) (256, 119)
ReLU (256, 119) (256, 119)
1-d convolution (256, 119) (256, 116)
ReLU (256, 116) (256, 116)
max pooling (256, 116) (256, 29)
Dense 256×\times29 64
ReLU 64 64
Dense 64 64
ReLU 64 64
Dense 64 2
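The layer shapes listed in Table 1 follow from simple bookkeeping: an unpadded one-dimensional convolution with kernel width K maps a length-N input to length N − K + 1, and max pooling with kernel size and stride 4 divides the length by 4. A few lines of Python reproduce the lengths in the table.

```python
def conv_out(n, k):
    """Output length of an unpadded 1-d convolution with kernel width k."""
    return n - k + 1

def pool_out(n, k):
    """Output length of max pooling with kernel size and stride k."""
    return n // k

n = 2048                      # input chunk length, delta_l
for k in (16, 16):
    n = conv_out(n, k)        # 2048 -> 2033 -> 2018
n = pool_out(n, 4)
assert n == 504
for k in (8, 8):
    n = conv_out(n, k)        # 504 -> 497 -> 490
n = pool_out(n, 4)
assert n == 122
for k in (4, 4):
    n = conv_out(n, k)        # 122 -> 119 -> 116
n = pool_out(n, 4)
assert n == 29                # flattened to 256 x 29 before the dense layers
```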

In order to train the neural network, we need to prepare the training dataset and the validation dataset. We use Eq. (47) as the model waveform and pick up only a short chunk containing the signal. The length of each chunk is δ=2048\delta\ell=2048. We prepare 200,000 waveforms for the training and 10,000 waveforms for validation. At this stage, we set h0=1h_{0}=1. We assume that only a single detector is available and, as an example, use the geometry information (e.g., the latitude of the detector) of LIGO Hanford in calculating the antenna pattern function. In this work, we focus on one sky patch covered by a single grid point and a frequency bin fixed at fk=100f_{k}=100 Hz since the scaling to the search over the whole sky and the wider frequency band is straightforward. The sources are randomly distributed within the sky patch. The parameters βk\beta_{k} are randomly sampled from a uniform distribution on [0.5,0.5][-0.5,0.5]. The original strain has the duration 2242^{24} sec and the sampling frequency 10241024 Hz. We introduce the normalized gravitational wave amplitude by

h^0:=h0(Sn(fref)1Hz1)1/2.\hat{h}_{0}:=h_{0}\left(\frac{{S}_{\mathrm{n}}({f}_{\mathrm{ref}})}{\mathrm{1Hz^{-1}}}\right)^{-1/2}. (67)

Here, we set fref=fk{f}_{\mathrm{ref}}=f_{k}. At each training step, an amplitude whose logarithm is randomly chosen from the uniform distribution on 2.1log10h^01.0-2.1\leq{\log}_{\mathrm{10}}\hat{h}_{0}\leq-1.0 is multiplied to the waveforms, and they are injected into the simulated noise. Different noise realizations are sampled for every iteration. The real part and the imaginary part of the noise data mimicking 𝖭,k\mathsf{N}_{\ell,k} are generated from a Gaussian distribution with zero mean and a variance

𝒲4NsegTseg,\frac{\mathcal{W}}{4{N}_{\mathrm{seg}}{T}_{\mathrm{seg}}}\,, (68)

(see Eqs. (52) and (53)).

We employ mini-batch training with a batch size of 256. Adam Kingma and Ba (2017) is used as the update algorithm. We implement the network with the Python library PyTorch Paszke et al. (2019) and run it on a GeForce 1080Ti GPU. The parameter values we used are listed in Table 2.

Table 2: The values of the parameters we used in this work.
Symbol Parameters Value
Tobs{T}_{\mathrm{obs}} Observation period 2242^{24} sec
fs{f}_{\mathrm{s}} Sampling frequency 1024 Hz
Ngrid{N}_{\mathrm{grid}} # of grids 352436
Nbin{N}_{\mathrm{bin}} # of frequency bins of STFT 3200
Tseg{T}_{\mathrm{seg}} Duration of a STFT segment 32 sec
Tslide{T}_{\mathrm{slide}} Stride of STFT segments 32 sec
δ\delta\ell Length of chunk 2048
(δ)slide{(\delta\ell)}_{\mathrm{slide}} Stride of chunks 128
fkf_{k} Fixed frequency bin 100 Hz
αg{\alpha}_{\mathrm{g}} Right ascension of grid -0.158649 rad
δg{\delta}_{\mathrm{g}} Declination of grid 1.02631 rad

IV.5 Follow-up analysis by coherent matched filtering

After selecting candidates and narrowing down the area in which the source is likely to be located, we apply coherent matched filtering as the follow-up analysis. Grid points with the resolution shown in Eq. (20) are placed to cover the selected area. For each grid point, we carry out the demodulation of the phase by using the time resampling technique. If the deviation between the directions of the grid point and the source is smaller than the resolution, the residual phase remaining after the time resampling is sufficiently small to avoid a loss of SNR.

In this operation, heterodyning and downsampling can significantly reduce the data length and hence the computational cost Dupuis and Woan (2005). Let us assume that we have a candidate labeled with {𝒏g,fk,c}\{{\bm{{n}}}_{\mathrm{g}},f_{k},c\}. If the candidate corresponds to a true event, the gravitational wave frequency fgw{f}_{\mathrm{gw}} should lie in the narrow frequency band indicated by

fk12Tsegfgwfk+12Tseg.f_{k}-\frac{1}{2{T}_{\mathrm{seg}}}\leq{f}_{\mathrm{gw}}\leq f_{k}+\frac{1}{2{T}_{\mathrm{seg}}}\,. (69)

By multiplying the resampled strain by the factor e2πifkζe^{-2\pi if_{k}\zeta}, we can shift the gravitational wave signal frequency to near DC (heterodyning). After that, the gravitational wave signal has a frequency lower than 1/2Tseg1/2{T}_{\mathrm{seg}} Hz. Therefore, downsampling by appropriately averaging the resampled strain data down to a sampling frequency 1/Tseg\sim 1/{T}_{\mathrm{seg}} reduces the number of data points without losing the significance of the gravitational wave signal.
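The heterodyning and downsampling steps can be sketched with NumPy on toy parameters (much shorter data and lower frequencies than in Table 2; here t stands in for the resampled time ζ). After multiplying by e^{-2πi f_k t} and block-averaging over Tseg, the signal reappears at its small frequency offset from the bin center.

```python
import numpy as np

# Toy parameters for illustration only (cf. Table 2 for the actual values).
fs, Tseg, Tobs = 64.0, 32.0, 2048.0
fk = 10.0                # candidate frequency bin
df = 1.0 / 128.0         # true offset; |df| <= 1/(2 Tseg) as in Eq. (69)

t = np.arange(int(Tobs * fs)) / fs
strain = np.cos(2.0 * np.pi * (fk + df) * t)         # toy resampled strain

z = strain * np.exp(-2j * np.pi * fk * t)            # heterodyning
blocks = z.reshape(-1, int(Tseg * fs)).mean(axis=1)  # downsample to 1/Tseg Hz

freqs = np.fft.fftfreq(blocks.size, d=Tseg)
peak = freqs[np.argmax(np.abs(np.fft.fft(blocks)))]
assert abs(peak - df) < 1e-12   # the signal now sits at the low frequency df
```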

The coherent matched filtering follows the heterodyning and the downsampling processes. As stated in Eqs. (2) and (12), we fix ψ=0\psi=0, cosι=1\cos\iota=1, and ϕ0=0\phi_{0}=0 and treat them as known parameters. Also, we assume that the signal waveform and the template completely match. The definition of a match is already given in Eq. (17). The gravitational waveform is written as

h(𝝃)=h0htemp(ξ).h(\bm{{\xi}})=h_{0}\cdot{h}_{\mathrm{temp}}(\vec{\xi})\,. (70)

Among these parameters, the likelihood can be analytically maximized over the amplitude h0h_{0}. Then, we obtain the signal-to-noise ratio in Eq. (17) and use it as the detection statistic. When the strain data contain only detector noise, ρMF{\rho}_{\mathrm{MF}} follows the standard normal distribution. On the other hand, if a signal exists, the SNR follows a Gaussian distribution with a mean

μMF(𝝃)=h0(htemp(ξ)|htemp(ξ)),{\mu}_{\mathrm{MF}}(\bm{{\xi}})=h_{0}\cdot\sqrt{({h}_{\mathrm{temp}}(\vec{\xi})|{h}_{\mathrm{temp}}(\vec{\xi}))}\,, (71)

and a unit variance.

V Results

V.1 Computational cost

Our procedure is characterized by three parameters 𝝆:=(FAPEP,rNN,ρ^MF)\bm{{\rho}}:=(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}},{\hat{\rho}}_{\mathrm{MF}}), i.e.,

  • FAPEP\mathrm{FAP}_{\mathrm{EP}}, false alarm probability for each chunk,

  • rNN{r}_{\mathrm{NN}}, the radius of the predicted region to which the follow-up analysis is applied,

  • ρ^MF{\hat{\rho}}_{\mathrm{MF}}, the threshold of the SNR of the coherent matched filtering.

Let 𝒩EP{\mathcal{N}}_{\mathrm{EP}} denote the computational cost of the excess power method. 2Nseg2{N}_{\mathrm{seg}} multiplications and 2Nseg2{N}_{\mathrm{seg}} additions of real numbers are required to calculate the excess powers for all chunks in one \ell-domain signal. The computational cost of calculating the excess powers for all chunks can thus be estimated as

𝒩EP=4Nseg×Ngrid×Nbin4.7×1015,{\mathcal{N}}_{\mathrm{EP}}=4{N}_{\mathrm{seg}}\times{N}_{\mathrm{grid}}\times{N}_{\mathrm{bin}}\sim 4.7\times 10^{15}\,, (72)

in units of the number of floating point operations. As we see in the following, this cost can be neglected. Next, we check the computational time of the neural network analysis. We estimate it by measuring the elapsed time for analyzing ten thousand data. Because the measured elapsed time is 1.4 sec, the total computational time of the neural network is estimated as

𝒯NN1.4sec104data×Ncandidate,{\mathcal{T}}_{\mathrm{NN}}\simeq\frac{1.4\mathrm{sec}}{10^{4}\mathrm{data}}\times{N}_{\mathrm{candidate}}\,, (73)

where Ncandidate{N}_{\mathrm{candidate}} is the number of candidates which are selected by the excess power method and is estimated as

Ncandidate\displaystyle{N}_{\mathrm{candidate}} =Nchunk×FAPEP\displaystyle={N}_{\mathrm{chunk}}\times{\mathrm{FAP}}_{\mathrm{EP}}
=NgridNbinNseg(δ)slide×FAPEP\displaystyle={N}_{\mathrm{grid}}\cdot{N}_{\mathrm{bin}}\cdot\frac{{N}_{\mathrm{seg}}}{{(\delta\ell)}_{\mathrm{slide}}}\times{\mathrm{FAP}}_{\mathrm{EP}}
4.6×1010(FAPEP102).\displaystyle\simeq 4.6\times 10^{10}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,. (74)

Substituting it, we obtain

𝒯NN6.4×106sec(FAPEP102).{\mathcal{T}}_{\mathrm{NN}}\simeq 6.4\times 10^{6}\mathrm{sec}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,. (75)
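The counting in Eqs. (74) and (75) is plain arithmetic with the values of Table 2; the short script below reproduces the quoted numbers, assuming Nseg = Tobs/Tseg = 2^19.

```python
# Values from Table 2
T_obs, T_seg = 2 ** 24, 32
N_grid, N_bin = 352436, 3200
dl_slide = 128
FAP_EP = 1e-2

N_seg = T_obs // T_seg                        # 2^19 STFT segments (assumed)
N_chunk = N_grid * N_bin * (N_seg // dl_slide)
N_candidate = N_chunk * FAP_EP                # Eq. (74)
assert abs(N_candidate / 4.6e10 - 1) < 0.01   # ~ 4.6 x 10^10

# Neural-network time, Eq. (75): 1.4 sec per 10^4 data
T_NN = 1.4 / 1e4 * N_candidate
assert abs(T_NN / 6.4e6 - 1) < 0.02           # ~ 6.4 x 10^6 sec
```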

Therefore, we focus on the case FAPEP102\mathrm{FAP}_{\mathrm{EP}}\leq 10^{-2}. The computational cost of our analysis is dominated by the preprocessing of the observed strain data and by the follow-up analysis. The computational cost of the entire analysis is denoted by 𝒩comp{\mathcal{N}}_{\mathrm{comp}} and is approximately given by

𝒩comp=𝒩preprocess+𝒩followup,{\mathcal{N}}_{\mathrm{comp}}={\mathcal{N}}_{\mathrm{preprocess}}+{\mathcal{N}}_{\mathrm{follow-up}}\,, (76)

where 𝒩preprocess{\mathcal{N}}_{\mathrm{preprocess}} and 𝒩followup{\mathcal{N}}_{\mathrm{follow-up}} are the computational cost of the preprocessing and the follow-up analysis, respectively. In this work, we fix the STFT segment duration and the length of the chunk. Thus, the computational cost of the preprocessing is a constant.

𝒩preprocess=Ngrid×(𝒩STFT+𝒩FFT×Nbins).{\mathcal{N}}_{\mathrm{preprocess}}={N}_{\mathrm{grid}}\times({\mathcal{N}}_{\mathrm{STFT}}+{\mathcal{N}}_{\mathrm{FFT}}\times{N}_{\mathrm{bins}})\,. (77)

The computational cost of the STFT is

𝒩STFT=Nseg5Tsegfslog2Tsegfs,{\mathcal{N}}_{\mathrm{STFT}}={N}_{\mathrm{seg}}\cdot 5{T}_{\mathrm{seg}}{f}_{\mathrm{s}}\log_{2}{T}_{\mathrm{seg}}{f}_{\mathrm{s}}\,, (78)

and the computational cost of FFT is

𝒩FFT=5Nseglog2Nseg.{\mathcal{N}}_{\mathrm{FFT}}=5{N}_{\mathrm{seg}}\log_{2}{N}_{\mathrm{seg}}\,. (79)

With Tobs=224{T}_{\mathrm{obs}}=2^{24}sec, Tseg=25{T}_{\mathrm{seg}}=2^{5}sec, fs=210{f}_{\mathrm{s}}=2^{10}Hz, and Ngrid3.5×105{N}_{\mathrm{grid}}\sim 3.5\times 10^{5}, the computational cost of the preprocess is estimated as

𝒩preprocess2.3×1018.{\mathcal{N}}_{\mathrm{preprocess}}\simeq 2.3\times 10^{18}\,. (80)

On the other hand, the computational cost of the follow-up analysis is determined by the combination of FAPEP\mathrm{FAP}_{\mathrm{EP}} and rNN{r}_{\mathrm{NN}}. The computational cost of the follow-up analysis is

𝒩followup=Ncandidate×π(rNN)2(δθ)coh2×𝒩FFT,coh.{\mathcal{N}}_{\mathrm{follow-up}}={N}_{\mathrm{candidate}}\times\frac{\pi({r}_{\mathrm{NN}})^{2}}{{(\delta\theta)}_{\mathrm{coh}}^{2}}\times{\mathcal{N}}_{\mathrm{FFT,coh}}\,. (81)

Here, (δθ)coh2{(\delta\theta)}_{\mathrm{coh}}^{2} is the typical area of the region that each grid point of the coherent analysis covers (see Eq. (20)). The computational cost of taking the match is dominated by the Fourier transform and is calculated as

𝒩FFT,coh=5(Tobsfs,coh)log2(Tobsfs,coh)5.0×107,{\mathcal{N}}_{\mathrm{FFT,coh}}=5({T}_{\mathrm{obs}}{f}_{\mathrm{s,coh}})\log_{2}({T}_{\mathrm{obs}}{f}_{\mathrm{s,coh}})\simeq 5.0\times 10^{7}\,, (82)

where fs,coh=1/Tseg=25{f}_{\mathrm{s,coh}}=1/{T}_{\mathrm{seg}}=2^{-5}Hz. Therefore, we estimate the computational cost of the follow-up analysis as

𝒩followup=9.0×1022(rNN103rad)2(FAPEP102).{\mathcal{N}}_{\mathrm{follow-up}}=9.0\times 10^{22}\left(\frac{{r}_{\mathrm{NN}}}{10^{-3}\mathrm{rad}}\right)^{2}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,. (83)
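Equations (82) and (83) can be checked in the same way. The coherent-search sky resolution (δθ)coh of Eq. (20) is not restated here; the value of about 9×10^-6 rad used below is an assumption chosen to reproduce Eq. (83), not a number quoted in the text.

```python
import math

T_obs = 2 ** 24
fs_coh = 2.0 ** -5                            # 1 / T_seg after downsampling
N_pts = T_obs * fs_coh                        # 2^19 data points
N_FFT_coh = 5 * N_pts * math.log2(N_pts)      # Eq. (82)
assert abs(N_FFT_coh / 5.0e7 - 1) < 0.01

# Follow-up cost, Eq. (81), at the fiducial values of Eq. (83).
N_candidate = 4.6194e10                       # Eq. (74) at FAP_EP = 1e-2
r_NN = 1e-3                                   # rad
delta_theta = 9.0e-6                          # rad, assumed (cf. Eq. (20))
N_followup = N_candidate * math.pi * r_NN ** 2 / delta_theta ** 2 * N_FFT_coh
assert abs(N_followup / 9.0e22 - 1) < 0.05
```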

Substituting Eqs. (80) and (83) into Eq. (76), we can assess the computational cost of the entire analysis as a function of rNN{r}_{\mathrm{NN}} and FAPEP\mathrm{FAP}_{\mathrm{EP}}. Figure 3 shows the computational cost for various combinations of FAPEP\mathrm{FAP}_{\mathrm{EP}} and rNN{r}_{\mathrm{NN}}. One can read off a feasible combination of FAPEP\mathrm{FAP}_{\mathrm{EP}} and rNN{r}_{\mathrm{NN}} for one’s available computational resources.

Refer to caption
Figure 3: The logarithm of the evaluated computational cost in units of the number of floating point operations. In the white hatched region, the computational cost is dominated by that of the preprocessing, i.e., 𝒩followup𝒩preprocess{\mathcal{N}}_{\mathrm{follow-up}}\leq{\mathcal{N}}_{\mathrm{preprocess}}. As the false alarm probability of the excess power method is set smaller, the computational cost is reduced because the number of candidates decreases. Also, the computational cost becomes smaller as the parameter rNN{r}_{\mathrm{NN}} is reduced.

V.2 False alarm probability

The false alarm probability of the entire process (see Wette (2012); Dreissigacker et al. (2018)) is

pfa(𝝆)=\displaystyle{p}_{\mathrm{fa}}(\bm{{\rho}})= {1(Prob[ρEP<ρ^EPρEP𝒩(0,1)])Nchunk}\displaystyle\left\{1-\left(\mathrm{Prob}\left[{\rho}_{\mathrm{EP}}<{\hat{\rho}}_{\mathrm{EP}}\mid{\rho}_{\mathrm{EP}}\sim\mathcal{N}(0,1)\right]\right)^{{N}_{\mathrm{chunk}}}\right\}
×{1(Prob[ρMF<ρ^MFρMF𝒩(0,1)])Nt},\displaystyle\times\left\{1-\left(\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}<{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}(0,1)\right]\right)^{{N}_{\mathrm{t}}}\right\}\,, (84)

where Nt{N}_{\mathrm{t}} is the number of required templates for the coherent search. It can be estimated by

Nt=Ncandidate×π(rNN)2(δθ2)coh×Nbin,coh,{N}_{\mathrm{t}}={N}_{\mathrm{candidate}}\times\frac{\pi({r}_{\mathrm{NN}})^{2}}{(\delta\theta^{2})_{\mathrm{coh}}}\times{N}_{\mathrm{bin,coh}}, (85)

where Nbin,coh{N}_{\mathrm{bin,coh}} is the number of frequency bins of the coherent search. Using the values listed in Table 2, we obtain

Nt5.6×1021(rNN103rad)2(FAPEP102).{N}_{\mathrm{t}}\simeq 5.6\times 10^{21}\left(\frac{{r}_{\mathrm{NN}}}{10^{-3}\mathrm{rad}}\right)^{2}\left(\frac{\mathrm{FAP}_{\mathrm{EP}}}{10^{-2}}\right)\,. (86)

Because the false alarm probability of the follow-up stage determines that of the entire process, we can approximate it as

pfa(𝝆){1(Prob[ρMF<ρ^MFρMF𝒩(0,1)])Nt}.{p}_{\mathrm{fa}}(\bm{{\rho}})\simeq\left\{1-\left(\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}<{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}(0,1)\right]\right)^{{N}_{\mathrm{t}}}\right\}\,. (87)

Furthermore, because Nt1{N}_{\mathrm{t}}\gg 1, we can approximate

pfa(𝝆)NtProb[ρMF>ρ^MFρMF𝒩(0,1)].{p}_{\mathrm{fa}}(\bm{{\rho}})\simeq{N}_{\mathrm{t}}\cdot\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}>{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}(0,1)\right]\,. (88)

In this work, the threshold ρ^MF{\hat{\rho}}_{\mathrm{MF}} is chosen so that the false alarm probability of the entire process is 0.01. As shown in Eq. (86), the number of templates depends on (rNN,FAPEP)({r}_{\mathrm{NN}},\mathrm{FAP}_{\mathrm{EP}}), and the same is true for ρ^MF{\hat{\rho}}_{\mathrm{MF}}.
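Given Nt, the threshold ρ̂MF that makes the total false alarm probability of Eq. (88) equal to 0.01 can be found by bisection on the standard normal survival function. This is a sketch under the stated Gaussian assumptions, not the code used in the analysis.

```python
import math

def survival(x):
    """P[X > x] for a standard normal random variable."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def threshold(p_fa_total, N_t):
    """Solve N_t * P[rho > rho_hat] = p_fa_total for rho_hat, Eq. (88)."""
    target = p_fa_total / N_t
    lo, hi = 0.0, 40.0                 # survival() is monotonically decreasing
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

N_t = 5.6e21                           # Eq. (86) at r_NN = 1e-3 rad, FAP_EP = 1e-2
rho_hat = threshold(0.01, N_t)
assert 9.5 < rho_hat < 10.5            # the required SNR threshold is ~10
assert abs(N_t * survival(rho_hat) - 0.01) < 1e-4
```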

The false alarm probability of the matched filtering has already been studied in the literature. Therefore, in this work, we check only the validity of the statistical properties of the excess power method. It is computationally difficult to treat a whole signal of a duration Tobs=224{T}_{\mathrm{obs}}=2^{24}sec. Therefore, we generate Nseg{N}_{\mathrm{seg}} short noise data of a duration Tseg{T}_{\mathrm{seg}} assuming Sn(f)=1{S}_{\mathrm{n}}(f)=1. After applying a window function, an FFT is carried out on each short strain. We pick up the kk-th frequency bin from each FFT and regard the resulting series as {nj,kSTFT}j=1Nseg\{{n}^{\mathrm{STFT}}_{j,k}\}_{j=1}^{{N}_{\mathrm{seg}}}. We obtain 𝖭,k\mathsf{N}_{\ell,k} by taking the Fourier transform of {nj,kSTFT}j=1Nseg\{{n}^{\mathrm{STFT}}_{j,k}\}_{j=1}^{{N}_{\mathrm{seg}}} and divide it into Nseg/δ=128{N}_{\mathrm{seg}}/\delta\ell=128 chunks. After repeating the above procedure 80 times, 10,240 chunks are generated. For each chunk, the excess power statistic \mathcal{E} and the SNR ρEP{\rho}_{\mathrm{EP}} are calculated. The histogram of the simulated values of ρEP{\rho}_{\mathrm{EP}} is shown in Fig. 4. It matches the standard normal distribution well. Additionally, we carry out the Kolmogorov-Smirnov test and obtain a p-value of 0.753254. It is thus numerically confirmed that the SNR of the noise data follows the standard normal distribution.

Refer to caption
Figure 4: The histogram of the simulated ρEP{\rho}_{\mathrm{EP}}. The blue line is the histogram, and the orange line indicates the standard normal distribution. They match well. The p-value of the Kolmogorov-Smirnov test is 0.753254.

V.3 Detection probability

The detection probability of the signal with an amplitude h0h_{0} can be estimated by

pdet(h0;𝝆)\displaystyle\ {p}_{\mathrm{det}}(h_{0};\bm{{\rho}})
=pdet(𝝃;𝝆)ξ\displaystyle=\left\langle{p}_{\mathrm{det}}(\bm{{\xi}};\bm{{\rho}})\right\rangle_{\vec{\xi}}
=\displaystyle= Prob[ρEP>ρ^EPρEP𝒩(μEP(𝝃),σEP(𝝃))]\displaystyle\bigg{\langle}\mathrm{Prob}\left[{\rho}_{\mathrm{EP}}>{\hat{\rho}}_{\mathrm{EP}}\mid{\rho}_{\mathrm{EP}}\sim\mathcal{N}({\mu}_{\mathrm{EP}}(\bm{{\xi}}),{\sigma}_{\mathrm{EP}}(\bm{{\xi}}))\right]
×Prob[source is located in a predicted region𝝃,rNN]\displaystyle\times\mathrm{Prob}\left[\text{source is located in a predicted region}\mid\bm{{\xi}},{r}_{\mathrm{NN}}\right]
×Prob[ρMF>ρ^MFρMF𝒩(μMF(𝝃),1)]ξ,\displaystyle\times\mathrm{Prob}\left[{\rho}_{\mathrm{MF}}>{\hat{\rho}}_{\mathrm{MF}}\mid{\rho}_{\mathrm{MF}}\sim\mathcal{N}({\mu}_{\mathrm{MF}}(\bm{{\xi}}),1)\right]\bigg{\rangle}_{\vec{\xi}}\,, (89)

where

ξ:=𝑑ξ()π(ξ),\langle\cdots\rangle_{\vec{\xi}}:=\int d\vec{\xi}\ (\cdots)\pi(\vec{\xi})\,, (90)

is the average over the source parameters ξ\vec{\xi} with the probability density function π(ξ)\pi(\vec{\xi}). As explained in Sec. IV.4, the neural network is trained with waveforms sampled only from the vicinity of the reference grid point and from the narrow frequency band. It is expected that the trained neural network does not work well for signals outside of the reference patch and the frequency band. Therefore, we test only within this limited parameter region. Correspondingly, the average is also taken over such a narrow parameter space. To quantify the detection power, the amplitude parameter h095%{h}^{\mathrm{95\%}}_{0} is defined by

pdet(h095%;𝝆)=0.95,{p}_{\mathrm{det}}({h}^{\mathrm{95\%}}_{0};\bm{{\rho}})=0.95\,, (91)

and correspondingly,

h^095%:=h095%(Sn(fref)1Hz1)1/2.{\hat{h}}^{\mathrm{95\%}}_{0}:={h}^{\mathrm{95\%}}_{0}\left(\frac{{S}_{\mathrm{n}}({f}_{\mathrm{ref}})}{\mathrm{1Hz^{-1}}}\right)^{-1/2}. (92)

The parameters FAPEP\mathrm{FAP}_{\mathrm{EP}} and rNN{r}_{\mathrm{NN}} are optimized so that h095%{h}^{\mathrm{95\%}}_{0} takes the smallest value under the constraint of the available computational power.

To explore the parameter space of (FAPEP,rNN\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}}), we place a regular grid on log10FAPEP\log_{10}\mathrm{FAP}_{\mathrm{EP}} from 8-8 to 2-2 by a step of 1, and a regular grid on log10rNN\log_{10}{r}_{\mathrm{NN}} from -4.5 to -3.0 by a step of 0.05. For every pair of FAPEP\mathrm{FAP}_{\mathrm{EP}} and rNN{r}_{\mathrm{NN}}, we calculate h^095%{\hat{h}}^{\mathrm{95\%}}_{0} by the following procedure. First, we place a regular grid on log10h^0\log_{10}\hat{h}_{0} from -2.3 to -1.0 by a step of 0.05. For each sample of the amplitude, the parameters ξ\vec{\xi} are randomly sampled. The sampled parameters are denoted by {ξ(i)}i=1M\{\vec{\xi}^{(i)}\}_{i=1}^{M}. The waveforms are generated with the sampled parameters. Each waveform is injected into different noise data in the same manner as explained in Sec. IV.4. The fraction of the events detected is employed as the estimator of the detection probability of the signal with parameters (h0,ξ(i))(h_{0},\vec{\xi}^{(i)}). Repeating this procedure for all sampled parameters {ξ(i)}i=1M\{\vec{\xi}^{(i)}\}_{i=1}^{M}, we obtain the set of the estimated detection probabilities. Then, the detection probability pdet(h0;𝝆){p}_{\mathrm{det}}(h_{0};\bm{{\rho}}) is estimated by

pdet(h0;𝝆)1Mi=1Mpdet(h0,ξ(i);𝝆).{p}_{\mathrm{det}}(h_{0};\bm{{\rho}})\simeq\frac{1}{M}\sum_{i=1}^{M}{p}_{\mathrm{det}}(h_{0},\vec{\xi}^{(i)};\bm{{\rho}})\,. (93)

Changing the value of the amplitude h^0\hat{h}_{0}, we get the estimated detection probability as a function of h0h_{0} for given values of FAPEP\mathrm{FAP}_{\mathrm{EP}} and rNN{r}_{\mathrm{NN}}. If the estimated detection probability exceeds 95% for one or more samples of the amplitude, the obtained detection probabilities are fitted by a sigmoid-like function,

ς(𝒟;a,b)=11+e(𝒟a)/b,\varsigma(\mathcal{D};a,b)=\frac{1}{1+e^{(\mathcal{D}-a)/b}}\,, (94)

where

𝒟:=(h^0)1,\mathcal{D}:=(\hat{h}_{0})^{-1}\,, (95)

is called the sensitivity depth, and the parameters (a,b)(a,b) of the sigmoid function are to be optimized. Using the optimized parameters (a,b)(a^{\ast},b^{\ast}), the estimated value of 𝒟95%:=(h^095%)1{\mathcal{D}}^{\mathrm{95\%}}:=({\hat{h}}^{\mathrm{95\%}}_{0})^{-1} can be obtained as

𝒟95%(FAPEP,rNN)=a+bln10.950.95,{\mathcal{D}}^{\mathrm{95\%}}(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}})=a^{\ast}+b^{\ast}\ln\frac{1-0.95}{0.95}\,, (96)

which is the inverse of ς(𝒟)\varsigma(\mathcal{D}). In this work, we set M=1024M=1024 and the number of noise realizations for each parameter set to 512. Figure 5 shows an example of the fitting. The estimated values of 𝒟95%{\mathcal{D}}^{\mathrm{95\%}} are shown in Fig. 6.
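Inverting the fitting function of Eq. (94) at p = 0.95 gives the estimate of 𝒟95%; the short check below uses illustrative fit parameters (a, b), not values obtained from our fits.

```python
import math

def varsigma(D, a, b):
    """Sigmoid-like fitting function of Eq. (94)."""
    return 1.0 / (1.0 + math.exp((D - a) / b))

def depth_at(p, a, b):
    """Depth at which the fitted detection probability equals p,
    obtained by inverting Eq. (94)."""
    return a + b * math.log((1.0 - p) / p)

a_fit, b_fit = 100.0, 5.0              # illustrative fit parameters
D95 = depth_at(0.95, a_fit, b_fit)
assert abs(varsigma(D95, a_fit, b_fit) - 0.95) < 1e-12
assert D95 < a_fit                     # 95% detection occurs at a shallower
                                       # depth than the 50% midpoint a
```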

Refer to caption
Figure 5: Example of fitting the detection probability. We set FAPEP=103\mathrm{FAP}_{\mathrm{EP}}=10^{-3} and rNN=103.8{r}_{\mathrm{NN}}=10^{-3.8}rad. The blue dots are the estimated values of pdet{p}_{\mathrm{det}}, and the orange solid line is the fitted sigmoid curve.
Refer to caption
Figure 6: Estimated sensitivity depths 𝒟95%{\mathcal{D}}^{\mathrm{95\%}}. For parameters within a blank region, the detection probabilities do not reach 95% for a surveyed range of amplitude h^0\hat{h}_{0}. For rNN104rad{r}_{\mathrm{NN}}\lesssim 10^{-4}\mathrm{rad}, the neural network controls the detection probability. On the other hand, for rNN103.6rad{r}_{\mathrm{NN}}\gtrsim 10^{-3.6}\mathrm{rad}, the excess power method controls the detection probability because the predicted region is large enough to contain the true location of the source.

To confirm that signals with the amplitude h095%{h}^{\mathrm{95\%}}_{0} are detected with 95% detection probability, we perform an injection test. To save computational cost, we skip the follow-up stage and assume that the detection probability is determined by the excess power selection and the neural network analysis. We use only a short chunk centered at the support of the signal as an injection waveform. Ten thousand chunks with various signal parameters ξ\vec{\xi} are prepared and injected into Gaussian noise data. The waveform model and the noise properties are the same as those of the training dataset of the neural network. The excess power is calculated for each chunk, and the neural network analysis is carried out if a chunk is selected as a candidate. Counting the number of detected events, we obtain the recovered value of the detection probability. The procedure shown above is repeated for each combination of (FAPEP,rNN)(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}}). Figure 7 shows the result of the injection test. For all combinations of (FAPEP,rNN)(\mathrm{FAP}_{\mathrm{EP}},{r}_{\mathrm{NN}}), the detection probabilities are close to 95%. Therefore, our estimation of the detectable amplitude h^095%{\hat{h}}^{\mathrm{95\%}}_{0} is validated.

Refer to caption
Figure 7: Recovered values of the detection probability. For all parameter combinations, the detection probability is recovered to 95% with an error of only a few percent.

VI Conclusion

We proposed a new method for the all-sky search for continuous gravitational waves, combining the excess power and deep learning methods. The time resampling and the STFT are used to localize the signal into a relatively small number of elements in the whole data. The excess power method then selects candidate grid points in the sky and frequency bins where a signal is likely to exist. The deep neural network narrows down the region to be explored by the follow-up search by two orders of magnitude compared with the original area of the sky patch. Before the follow-up coherent search, heterodyning and downsampling can reduce the computational cost. We calculated the computational cost of our method. Most of the computational cost is spent on preprocessing the strain data and on the follow-up coherent matched filtering search. The computational cost of the excess power method is negligibly small, and the computational time of the neural network can be suppressed to an acceptable level by setting FAPEP102\mathrm{FAP}_{\mathrm{EP}}\leq 10^{-2}. We estimated the detection ability of our method in a limited setup where the polarization angle, the inclination angle, and the initial phase are fixed and treated as known parameters. The datasets for training and testing the neural network are generated from a very narrow parameter space of (fgw,αs,δs)({f}_{\mathrm{gw}},{\alpha}_{\mathrm{s}},{\delta}_{\mathrm{s}}). With reasonable computational power, a sensitivity depth of 𝒟95%80{\mathcal{D}}^{\mathrm{95\%}}\gtrsim 80 can be achieved.
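The heterodyning and downsampling step mentioned above can be illustrated with a short sketch. All numerical choices here (sampling rate, frequencies, boxcar decimation) are illustrative assumptions, not the paper's actual pipeline: mixing the strain with a complex exponential at a reference frequency shifts the band of interest to near zero frequency, after which the data can be decimated to a much lower sampling rate.

```python
# Sketch of heterodyning + downsampling: a 100.2 Hz tone is mixed down
# with a 100 Hz heterodyne and decimated, leaving a 0.2 Hz residual tone
# in a time series 1024 times shorter than the original.
import numpy as np

fs = 4096.0                        # original sampling frequency [Hz]
t = np.arange(0.0, 64.0, 1.0 / fs) # 64 s of data
f_gw, f_het = 100.2, 100.0         # signal and heterodyne frequencies [Hz]

strain = np.sin(2 * np.pi * f_gw * t)             # noise-free toy signal
mixed = strain * np.exp(-2j * np.pi * f_het * t)  # shift band to ~0.2 Hz

# Crude decimation: block averaging acts as a boxcar low-pass filter;
# a real pipeline would use a proper anti-aliasing filter.
dec = 1024
n = (mixed.size // dec) * dec
down = mixed[:n].reshape(-1, dec).mean(axis=1)
fs_new = fs / dec                                 # 4 Hz after decimation

# The dominant frequency of the downsampled series is f_gw - f_het.
freqs = np.fft.fftfreq(down.size, d=1.0 / fs_new)
f_peak = abs(freqs[np.argmax(np.abs(np.fft.fft(down)))])
```

Because only the residual frequency f_gw - f_het must be resolved, the follow-up matched filtering can run on the short, downsampled series instead of the full-rate strain.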

Our training data, used for training the neural network, span a restricted parameter region. Namely, the gravitational wave frequencies of the training data are distributed within a small frequency band of width ±1/(2Tseg)\pm 1/(2{T}_{\mathrm{seg}}) centered at 100 Hz, and the source locations are sampled from very narrow regions around a fixed grid point. Nevertheless, we expect that our method can be applied to the all-sky search and to frequency bands below 100 Hz. If the gravitational wave frequency is lower than 100 Hz, the phase modulation becomes weaker (see Eq. (6)). Therefore, even if fgw<100{f}_{\mathrm{gw}}<100 Hz, the signal power in the \ell-domain would still be concentrated in a narrow region, and the efficiency of the excess power method can be expected to be maintained. A similar argument applies to the dependence on the source location: the power concentration in the \ell-domain remains valid when this dependence is taken into account, although it causes a variation of the signal amplitude. From the above discussion, only slight modifications of the construction of the training data and of our neural network structure are needed to apply our strategy to an all-sky search for monochromatic sources with frequencies below 100\sim 100 Hz.

In addition to the above points, there is room for further improvement of our method. We fixed various parameters, such as the width of the STFT Tseg{T}_{\mathrm{seg}} and the length of each chunk δ\delta\ell, in a somewhat ad hoc manner. Surveying and optimizing these parameters may improve the detection efficiency of our method. In particular, optimizing the sampling frequency used in downsampling might reduce the computational cost significantly. As can be seen from Eq. (48), the deviation δfk\delta f_{k} causes a translation of the signal in the \ell-domain. Taking this effect into account, the gravitational wave frequency can be expected to be constrained more tightly than Tseg1\sim{T}_{\mathrm{seg}}^{-1}, so that the sampling frequencies of the downsampled strains can be set to a lower value than our current choice. This optimization would further reduce the computational time of the follow-up coherent search.

In the present paper, we assumed stationary Gaussian detector noise and a 100% duty cycle. We also simplified the waveform model; e.g., the frequency change df/dtdf/dt is not incorporated. In spite of these simplifications, the obtained results can be regarded as a proof of principle and are sufficient to demonstrate that our method has the potential to improve the all-sky search for continuous gravitational waves with a duration of O(107)O(10^{7}) sec. Relaxing these assumptions is beyond the scope of this paper and is left for future work.

Acknowledgements.
We thank Yousuke Itoh for fruitful discussions. This work was supported by JSPS KAKENHI Grant Number JP17H06358 (and also JP17H06357), A01: Testing gravity theories using gravitational waves, as a part of the innovative research area, “Gravitational wave physics and astronomy: Genesis”, and JP20K03928. Part of the calculation was performed using a GeForce 1080Ti GPU at Nagaoka University of Technology.

Appendix A Noise statistics in \ell-domain

In general, the power spectral density of a stochastic process n(t)n(t) is defined by

n~(f)n~(f)=:12Sn(f)δ(ff),\langle\tilde{n}(f)\tilde{n}^{\ast}(f^{\prime})\rangle=:\frac{1}{2}{S}_{\mathrm{n}}(f)\delta(f-f^{\prime})\,, (97)

where the Fourier transform of n(t)n(t) is defined by

n~(f)=𝑑tn(t)e2πift,\tilde{n}(f)=\int^{\infty}_{-\infty}dt\ n(t)e^{-2\pi ift}\,, (98)

while we define the STFT by Eq. (37). Ignoring the effect of the window function, the variance of nj,kSTFT{n}^{\mathrm{STFT}}_{j,k} can be approximated by

(nj,kSTFT)(nj,kSTFT)=12TsegSn(fk)δkkδjj.\langle({n}^{\mathrm{STFT}}_{j,k})({n}^{\mathrm{STFT}}_{j^{\prime},k^{\prime}})^{\ast}\rangle=\frac{1}{2{T}_{\mathrm{seg}}}{S}_{\mathrm{n}}(f_{k})\delta_{kk^{\prime}}\delta_{jj^{\prime}}\,. (99)

Here, we assume that different STFT bins are statistically independent. The variance of 𝖭,k\mathsf{N}_{\ell,k} is

𝖭,k𝖭,k\displaystyle\langle\mathsf{N}_{\ell,k}\mathsf{N}^{\ast}_{\ell^{\prime},k^{\prime}}\rangle
=1Nseg2j=1Nsegj=1Nseg(nj,kSTFT)(nj,kSTFT)e2πi(jj)/Nseg\displaystyle=\frac{1}{{N}_{\mathrm{seg}}^{2}}\sum_{j=1}^{{N}_{\mathrm{seg}}}\sum_{j^{\prime}=1}^{{N}_{\mathrm{seg}}}\langle({n}^{\mathrm{STFT}}_{j,k})({n}^{\mathrm{STFT}}_{j^{\prime},k^{\prime}})^{\ast}\rangle e^{-2\pi i(j\ell-j^{\prime}\ell^{\prime})/{N}_{\mathrm{seg}}}
=Sn(fk)2TsegNsegδδkk.\displaystyle=\frac{{S}_{\mathrm{n}}(f_{k})}{2{T}_{\mathrm{seg}}{N}_{\mathrm{seg}}}\delta_{\ell\ell^{\prime}}\delta_{kk^{\prime}}\,. (100)

Therefore, we get

σ~k2=2𝖭,k𝖭,k=Sn(fk)TsegNseg.\tilde{\sigma}_{k}^{2}=2\langle\mathsf{N}_{\ell,k}\mathsf{N}^{\ast}_{\ell,k}\rangle=\frac{{S}_{\mathrm{n}}(f_{k})}{{T}_{\mathrm{seg}}{N}_{\mathrm{seg}}}\,. (101)
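The relation in Eq. (101) can be checked numerically with simulated white Gaussian noise. The STFT normalization below (FFT of each segment divided by the number of samples per segment) is an assumption chosen so that the single-segment variance reproduces Eq. (99); for white noise with one-sided PSD S_n sampled at frequency fs, the per-sample variance is S_n*fs/2.

```python
# Numerical check of Eq. (101): for white Gaussian noise, the variance
# sigma_k^2 = 2 <|N_{l,k}|^2> should equal S_n / (T_seg * N_seg).
import numpy as np

rng = np.random.default_rng(2)
S_n = 1.0                      # flat one-sided PSD
fs = 256.0                     # sampling frequency [Hz]
T_seg = 4.0                    # segment length [s]
N_seg = 32                     # number of segments
n_per_seg = int(fs * T_seg)
n_trials = 2000
k, l = 5, 3                    # arbitrary frequency and l-domain bins

acc = 0.0
for _ in range(n_trials):
    # White noise with per-sample variance S_n * fs / 2.
    noise = rng.normal(0.0, np.sqrt(S_n * fs / 2), N_seg * n_per_seg)
    segs = noise.reshape(N_seg, n_per_seg)
    # Assumed STFT normalization; gives <|n_STFT|^2> = S_n/(2 T_seg), Eq. (99).
    stft = np.fft.fft(segs, axis=1) / n_per_seg
    # Discrete Fourier transform over the segment index j, as in Eq. (100).
    phase = np.exp(-2j * np.pi * np.arange(N_seg) * l / N_seg)
    N_lk = np.mean(stft[:, k] * phase)
    acc += abs(N_lk) ** 2

sigma2 = 2.0 * acc / n_trials          # Monte Carlo estimate of Eq. (101)
expected = S_n / (T_seg * N_seg)       # analytic value
```

The Monte Carlo estimate agrees with the analytic value S_n/(T_seg*N_seg) to within the statistical error of a few percent for this number of trials.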

References

  • Abbott et al. (2016) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. Lett. 116, 131103 (2016), eprint 1602.03838.
  • Abbott et al. (2019a) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. X 9, 031040 (2019a), eprint 1811.12907.
  • Abbott et al. (2020a) R. Abbott et al. (LIGO Scientific, Virgo) (2020a), eprint 2010.14527.
  • Aso et al. (2013) Y. Aso, Y. Michimura, K. Somiya, M. Ando, O. Miyakawa, T. Sekiguchi, D. Tatsumi, and H. Yamamoto (KAGRA), Phys. Rev. D 88, 043007 (2013), eprint 1306.6747.
  • Bala et al. (2011) I. Bala, S. Tarun, U. CS, D. Sanjeev, R. Sendhil, and S. Anand (2011), URL https://dcc.ligo.org/LIGO-M1100296/public.
  • Abbott et al. (2020b) B. P. Abbott et al. (KAGRA, LIGO Scientific, Virgo), Living Rev. Rel. 23, 3 (2020b).
  • Abbott et al. (2019b) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. X 9, 011001 (2019b), eprint 1805.11579.
  • Abbott et al. (2020c) R. Abbott et al. (LIGO Scientific, Virgo), Astrophys. J. Lett. 900, L13 (2020c), eprint 2009.01190.
  • Abbott et al. (2020d) R. Abbott et al. (LIGO Scientific, Virgo), Population Properties of Compact Objects from the Second LIGO-Virgo Gravitational-Wave Transient Catalog (2020d), eprint 2010.14533.
  • Will (2014) C. M. Will, Living Rev. Rel. 17, 4 (2014), eprint 1403.7377.
  • Abbott et al. (2019c) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 100, 104036 (2019c), eprint 1903.04467.
  • Abbott et al. (2020e) R. Abbott et al. (LIGO Scientific, Virgo) (2020e), eprint 2010.14529.
  • Maggiore (2000) M. Maggiore, Phys. Rept. 331, 283 (2000), eprint gr-qc/9909001.
  • Sathyaprakash and Schutz (2009) B. S. Sathyaprakash and B. F. Schutz, Living Rev. Rel. 12, 2 (2009), eprint 0903.0338.
  • Maggiore (2007) M. Maggiore, Gravitational Waves. Vol. 1: Theory and Experiments, Oxford Master Series in Physics (Oxford University Press, 2007), ISBN 978-0-19-857074-5, 978-0-19-852074-0.
  • Creighton and Anderson (2011) J. D. E. Creighton and W. G. Anderson, Gravitational-wave physics and astronomy: An introduction to theory, experiment and data analysis (2011).
  • Zhu et al. (2020) S. J. Zhu, M. Baryakhtar, M. A. Papa, D. Tsuna, N. Kawanaka, and H.-B. Eggenstein, Phys. Rev. D 102, 063020 (2020), eprint 2003.03359.
  • Arvanitaki et al. (2015) A. Arvanitaki, M. Baryakhtar, and X. Huang, Phys. Rev. D 91, 084011 (2015), eprint 1411.2263.
  • Brito et al. (2017) R. Brito, S. Ghosh, E. Barausse, E. Berti, V. Cardoso, I. Dvorkin, A. Klein, and P. Pani, Phys. Rev. D 96, 064050 (2017), eprint 1706.06311.
  • Jaranowski et al. (1998) P. Jaranowski, A. Krolak, and B. F. Schutz, Phys. Rev. D 58, 063001 (1998), eprint gr-qc/9804014.
  • Krishnan et al. (2004) B. Krishnan, A. M. Sintes, M. A. Papa, B. F. Schutz, S. Frasca, and C. Palomba, Phys. Rev. D 70, 082001 (2004), eprint gr-qc/0407001.
  • Astone et al. (2014) P. Astone, A. Colla, S. D’Antonio, S. Frasca, and C. Palomba, Phys. Rev. D 90, 042002 (2014), eprint 1407.8333.
  • Abbott et al. (2017) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 96, 062002 (2017), eprint 1707.02667.
  • Abbott et al. (2018) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 97, 102003 (2018), eprint 1802.05241.
  • Abbott et al. (2019d) B. P. Abbott et al. (LIGO Scientific, Virgo), Phys. Rev. D 100, 024004 (2019d), eprint 1903.01901.
  • George and Huerta (2018) D. George and E. A. Huerta, Phys. Rev. D 97, 044039 (2018), eprint 1701.00008.
  • Gabbard et al. (2019) H. Gabbard, C. Messenger, I. S. Heng, F. Tonolini, and R. Murray-Smith (2019), eprint 1909.06296.
  • Chua and Vallisneri (2020) A. J. K. Chua and M. Vallisneri, Phys. Rev. Lett. 124, 041102 (2020), eprint 1909.05966.
  • Nakano et al. (2019) H. Nakano, T. Narikawa, K.-i. Oohara, K. Sakai, H.-a. Shinkai, H. Takahashi, T. Tanaka, N. Uchikata, S. Yamamoto, and T. S. Yamamoto, Phys. Rev. D 99, 124032 (2019), eprint 1811.06443.
  • Yamamoto and Tanaka (2020) T. S. Yamamoto and T. Tanaka (2020), eprint 2002.12095.
  • George et al. (2018) D. George, H. Shen, and E. A. Huerta, Phys. Rev. D 97, 101501(R) (2018).
  • Chua et al. (2020) A. J. K. Chua, M. L. Katz, N. Warburton, and S. A. Hughes (2020), eprint 2008.06071.
  • Dreissigacker et al. (2019) C. Dreissigacker, R. Sharma, C. Messenger, R. Zhao, and R. Prix, Phys. Rev. D 100, 044009 (2019), eprint 1904.13291.
  • Dreissigacker and Prix (2020) C. Dreissigacker and R. Prix, Phys. Rev. D 102, 022005 (2020), eprint 2005.04140.
  • Morawski et al. (2019) F. Morawski, M. Bejger, and P. Ciecieląg (2019), eprint 1907.06917.
  • Beheshtipour and Papa (2020) B. Beheshtipour and M. A. Papa, Phys. Rev. D 101, 064009 (2020), eprint 2001.03116.
  • Nakano et al. (2003) H. Nakano, H. Takahashi, H. Tagoshi, and M. Sasaki, Phys. Rev. D 68, 102003 (2003), eprint gr-qc/0306082.
  • Anderson et al. (2001) W. G. Anderson, P. R. Brady, J. D. E. Creighton, and E. E. Flanagan, Phys. Rev. D 63, 042003 (2001), eprint gr-qc/0008066.
  • Goodfellow et al. (2016) I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016), http://www.deeplearningbook.org.
  • Jarrett et al. (2009) K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, in 2009 IEEE 12th International Conference on Computer Vision (ICCV) (IEEE Computer Society, Los Alamitos, CA, USA, 2009), pp. 2146–2153, ISSN 1550-5499, URL https://doi.ieeecomputersociety.org/10.1109/ICCV.2009.5459469.
  • LeCun and Bengio (1998) Y. LeCun and Y. Bengio, in The Handbook of Brain Theory and Neural Networks (MIT Press, Cambridge, MA, USA, 1998), p. 255–258, ISBN 0262511029.
  • Zhou and Chellappa (1988) Y. Zhou and R. Chellappa, in IEEE 1988 International Conference on Neural Networks (1988), pp. 71–78.
  • Polyak (1964) B. T. Polyak, USSR Computational Mathematics and Mathematical Physics 4, 1 (1964), ISSN 0041-5553, URL http://www.sciencedirect.com/science/article/pii/0041555364901375.
  • Hinton (2012) G. Hinton, Neural networks for machine learning (2012).
  • Kingma and Ba (2017) D. P. Kingma and J. Ba (2017), eprint 1412.6980.
  • Rumelhart et al. (1986) D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Nature 323, 533 (1986), ISSN 1476-4687, URL https://doi.org/10.1038/323533a0.
  • Paszke et al. (2019) A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., in Advances in Neural Information Processing Systems 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019), pp. 8024–8035.
  • Dupuis and Woan (2005) R. J. Dupuis and G. Woan, Phys. Rev. D 72, 102002 (2005), eprint gr-qc/0508096.
  • Wette (2012) K. Wette, Phys. Rev. D 85, 042003 (2012), eprint 1111.5650.
  • Dreissigacker et al. (2018) C. Dreissigacker, R. Prix, and K. Wette, Phys. Rev. D 98, 084058 (2018), eprint 1808.02459.