This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Multi-task multi-station earthquake monitoring: An all-in-one seismic Phase picking, Location, and Association Network (PLAN)

Xu Si1, Xinming Wu1∗, Zefeng Li1∗, Shenghou Wang2 and Jun Zhu1
1School of Earth and Space Sciences, University of Science and Technology of China,
Hefei, China.
2Key Laboratory of Tectonics and Petroleum Resources of Ministry of Education,
China University of Geosciences, Wuhan, China.

To whom correspondence should be addressed:
E-mail: xinmwu@ustc.edu.cn and zefengli@ustc.edu.cn
Abstract

Earthquake monitoring is vital for understanding the physics of earthquakes and assessing seismic hazards. A standard monitoring workflow includes the interrelated and interdependent tasks of phase picking, association, and location. Although deep learning methods have been successfully applied to earthquake monitoring, they mostly address the tasks separately and ignore the geographic relationships among stations. Here, we propose a graph neural network that operates directly on multi-station seismic data and achieves simultaneous phase picking, association, and location. Particularly, the inter-station and inter-task physical relationships are informed in the network architecture to promote accuracy, interpretability, and physical consistency among cross-station and cross-task predictions. When applied to data from the Ridgecrest region and Japan regions, this method showed superior performance over previous deep learning-based phase-picking and localization methods. Overall, our study provides for the first time a prototype self-consistent all-in-one system of simultaneous seismic phase picking, association, and location, which has the potential for next-generation autonomous earthquake monitoring.

Introduction

Earthquake monitoring is one of the most fundamental operations in seismology. A standard earthquake monitoring workflow involves a series of steps to detect and characterize earthquakes, including phase picking, association, and event location (Beroza et al., 2021; Mousavi and Beroza, 2022; Zhu et al., 2022c). Phase picking, a conceptually simple task which is akin to detection problems in computer vision, has recently been improved through deep learning (Zhu et al., 2022c; Ross et al., 2018b, a; Zhu and Beroza, 2018; Mousavi et al., 2019; Zhu et al., 2019; Pardo et al., 2019; Wang et al., 2019; Liu et al., 2020; Mousavi et al., 2020; Yang et al., 2021; Yano et al., 2021; Zhu et al., 2022a; Bilal et al., 2022; Feng et al., 2022; Münchmeyer et al., 2022), where convolutional neural networks (CNNs) (Krizhevsky et al., 2017) are typically used. After the phase picking, traditional (Arora et al., 2013; Gibbons et al., 2016; Zhang et al., 2019; Zhu et al., 2022b) and deep-learning-based (Ross et al., 2019; McBrearty et al., 2019a; Yu and Wang, 2022) phase association algorithms have been used to link seismic phases at multiple stations from the same events. Finally, location algorithms (Bakun and Wentworth, 1997) utilize the associated phases to obtain the earthquake hypocenters, although some deep-learning-based methods directly process raw data to locate earthquakes (Zhang et al., 2014; DeVries et al., 2018; Perol et al., 2018; Lomax et al., 2019; Mousavi and Beroza, 2019; Zhang et al., 2020; Münchmeyer et al., 2021).

These three tasks (phase picking, association, and location) are closely interdependent. The accuracy of multi-station phase picking affects the accuracy of association and location. Conversely, association and location impose constraints on multi-station phase picking. Additionally, phase picking with multi-station data can further utilize the geographic relationships and waveform similarities among multiple stations. To achieve more efficient and accurate earthquake monitoring, a suitable earthquake monitoring workflow should impose inter-task and inter-station constraints and preferrably perform all three tasks simultaneously at all stations. However, most existing earthquake monitoring methods perform phase picking, association, and earthquake location separately. In addition, most of the current phase-picking methods process seismic data on a station-by-station basis. While some recent graph-based approaches (McBrearty et al., 2019b; van den Ende and Ampuero, 2020; McBrearty and Beroza, 2022b, 2023; Zhang et al., 2022) have demonstrated the ability to handle irregularly spaced stations for phase association and event location, it remains a challenging task to develop a method that effectively leverages inter-task and inter-station constraints, and ideally performs all three tasks simultaneously.

Here, we propose an all-in-one earthquake monitoring system called seismic Phase picking, Location, and Association Network (PLAN) that achieves for the first time the simultaneous implementation of the three tasks with multi-station data and inter-task constraints. PLAN consists of four interdependent neural network modules. Specifically, the first module of waveform feature extraction utilizes an encorder-decoder architecture to extract relevant features from multi-station seismic data. The second module of earthquake location encodes station locations (i.e., longitude, latitude, and elevation) and merges them with waveform features from the first module to predict the earthquake depth and epicentral distance for each station. The third module of phase association utilizes the predicted earthquake location information to estimate the time shifts required to align multi-station waveform features. Finally, the fourth module of phase picking aggregates the aligned features for simultaneous multi-station phase picking. We applied PLAN in the Ridgecrest and Japan regions and compared its efficiency and accuracy with that of state-of-the-art phase-picking and event location methods, demonstrating the merits of inter-station and inter-task constraints for accurate earthquake monitoring.

Results

Network architecture

Refer to caption
Figure 1: The flowchart of the proposed multi-task and multi-station PLAN for earthquake monitoring. The input data for the model comprise seismic waveforms recorded by multiple stations and the locations of these stations. PLAN consists of four sub-modules: waveform feature extraction encoder and decoder, an earthquake location module, a multi-station association module and a physics-informed multi-station phase-picking module. All of these sub-modules are optimized simultaneously and constrained by each other during training to improve performance in earthquake detection, association, and location.

The proposed multi-station multi-task PLAN (Fig. 1) employs a Graph neural network (GNN) (Kipf and Welling, 2016) as the backbone to integrate the four functional modules of waveform feature extraction, earthquake location, multi-station association, and a physics-informed multi-station phase picking (further details are provided in the Methods section). Compared with CNN, GNN is naturally suited for handling seismic data acquired from irregularly spaced stations whose quantity and location may vary (McBrearty and Beroza, 2023). In constructing our GNN, we define the graph nodes with seismic stations and define the feature vector for each node with the location and recoded seismic signal of the corresponding station. All nodes (corresponding to the seismic stations) are linked together and the linking weights are learnt during training to infer the relationships among the stations. We construct the GNN layers with TransformGConvs (Shi et al., 2020), which are designed based on an attention mechanism (Vaswani et al., 2017) to learn the dynamically linking weights among different stations. (details about TransformGConvs are provided in the Methods section). In addition, the graph nodes are not fixed so that the GNN could be adapted to variations in the station number and location.

Raw three-component seismic signals are input into the graph nodes. The front-end waveform feature extraction module, constructed as an encoder-decoder CNN and shared among the nodes, extracts their corresponding key features. The station feature extraction block, constructed as two MLPs and shared among the nodes, extract geographic features from the normalized input longitudes, latitudes, and elevations of the stations. The earthquake location module then concatenates the extracted waveform and geographic features and employs multiple TransformGConvs to aggregate these features from multiple nodes to predict the event depth and station-event offset. The predicted offsets and depth are further used to determine the event location by a triangulation algorithm (Yu and Segall, 1996). In the earthquake location module, we do not directly predict the event location but the station-event offsets and depth instead, because we will use them as input into a followed multi-station association module to estimate the time shifts needed to align the P and S arrivals.

This multi-station association module plays a key role in bridging the tasks of earthquake location and multi-station phase picking and introduces physical constraints between the two tasks. Prior to aggregating the waveform features from different stations for multi-station phase picking, the features corresponding to the same earthquake are required to be initially aligned or associated; otherwise, aggregation of unaligned features could mutually interfere and ultimately degrade the picking results. The multi-station phase-picking module includes a non-trainable physical layer, implemented with the Pytorch (Paszke et al., 2019) roll function, to shift and align the waveform features (from the decoder of the waveform feature extraction module) using the time shifts. Subsequently, multiple TransformGConvs in the phase-picking module aggregate the aligned waveform features to enhance the phase-picking features in the aligned space. Eventually, another physical layer unshifts the aggregated features back to the original space, followed by two convolutional layers to obtain the P/S-wave picks at all the stations.

Three regression loss functions are defined for the three modules of phase picking, association, and earthquake location and then combined to jointly train the entire network. Because all the modules are interconnected within the entire network, the training process finds an optimal network that could perform all the tasks both accurately and consistently. Moreover, after training, the multi-station association module could be detached from the network and utilized to calculate the S-P differential travel time with inputs of offsets and event depth. Further details on this module are provided in the section titled “Multi-station association module”.

Data preparation

We tested the proposed PLAN in two regions of Ridgecrest and Japan. For the Ridgecrest region (Fig. 2A), seismic recordings from 16 California Integrated Seismic Network stations within an epicentral distance of < 80 km were collected from July 4, 2019, to October 4, 2019, for a total of more than 71,000 M > 0 earthquakes. The data for Japan (Fig. 3A) included M > 2 earthquakes that occurred between January 1, 2011, and December 31, 2011, including the Mw 9.1 Tohoku sequence. We collected the 3-component High Sensitivity Seismograph Network (NIED Hi-net) (Obara et al., 2005; Aoi et al., 2020) data from over 35,000 events. Subsequently, the data were randomly divided into training, validation, and test sets (85%, 5%, and 10%, respectively) in both regions.

The number of stations corresponding to each event in the training samples varied, and the trained network can flexibly handle situations where the number of stations changes in actual data. Further, the distributions of the number of stations per event in the training and test sets were balanced. The results for the test sets in the two regions are presented in Figs. 5 and 6, respectively. To accommodate different range scales in the two study regions, we used different window lengths in two regions (30.72 s for Ridgecrest and 61.44 s for Japan) with the same sampling frequency (100 Hz).

To ensure a fair comparison with the existing phase-picking methods, we followed the same data preprocessing procedures used in previous studies (Zhu and Beroza, 2018; Mousavi et al., 2020). This involved normalizing the data by removing the mean and dividing by the standard deviation, and using a Gaussian-shaped target function as training labels for the P/S-phase arrival times. The use of Gaussian-shaped targets is effective in phase picking  (Zhu and Beroza, 2018; Mousavi et al., 2020), and in our study, the probabilities of P wave and S wave arrival times were set to 1 at the first arrival time and decreased to 0 within a 20 sample window before and after each phase arrival.

Application to the 2019 Ridgecrest sequence

We compared the performance of PLAN with that of other established deep learning methods for earthquake phase picking (PhaseNet (Zhu and Beroza, 2018) and EQTransformer (Mousavi et al., 2020)) and location (Aggreated-GNN (van den Ende and Ampuero, 2020)). All of the methods were retrained on the same training set and evaluated on a common test set. As shown in the Ridgecrest application in Fig. 2B-2C, the performance of PLAN in phase picking was superior to that of the other two deep learning-based methods. Specifically, the residual distribution of the P-wave picks for PLAN was more concentrated than that of the other methods, indicating a higher overall accuracy. For S-wave picks, PLAN performed significantly better than EQTransformer because the distribution of PLAN was narrower whereas the difference in performance between PLAN and PhaseNet was relatively minor.

Refer to caption
Figure 2: Distributions of phase picking and location residuals in Ridgecrest region. (A) Distribution of 16 stations (black triangles) and training event locations (red circles) used in our study, where the events occurred between July 4, 2019 and October 4, 2019. (B and C) The results of P-wave and S-wave arrival time residual, respectively. The blue, green, and orange lines in (B and C) represent the arrival time residuals for PLAN, PhaseNet, and EQTransform, respectively. The proposed method yields the most accurate results in P/S-wave picking. (D and E) The offset and depth residuals between model predictions and Southern California Seismic Network (SCSN) catalog of the located events. Regardless of the offset or depth, the residual distribution of PLAN (blue line) is more concentrated at zero than that of Aggregated-GNN (orange line).

In terms of localization, our method (PLAN) outperformed the other approaches (Fig. 2D-2E; Table 1). The distribution of PLAN was notably more concentrated than that of Aggregated-GNN, particularly in terms of offset prediction. To further demonstrate the effectiveness of TransformGConv, we replaced the TransformGConv layers (Fig. S1) with GCN (Kipf and Welling, 2016), SAGE (Hamilton et al., 2017), and GATv2 (Brody et al., 2021) respectively. Among the various methods compared, our method yielded the best results in terms of offset, with an average error of 1.09 km and a standard deviation of 1.41 km. Furthermore, PLAN also outperformed Aggregated-GNN in terms of depth localization, regardless of whether it was based on GCN, GATv2, or TransformGConv. These results demonstrated the superiority of the proposed PLAN in location estimations.

Furthermore, we used three metrics of mPrecision, mRecall, and mF1 (described in the Methods section), to quantitatively evaluate the performance of the five methods (Table 2). In five of the six metric scores for the P-wave and S-wave picking results, our attention mechanism-based GNN method outperformed the other methods. The only exception was the mPrecision metric of P-wave picking, where the EQTransformer showed slightly higher scores than PLAN. Notably, even the simplified version of the multi-station phase-picking method, such as the SAGE-based PLAN, outperformed both the single station-based picking methods of EQTransformer and PhaseNet in mF1 scores for S-wave picking. This indicated that the phase-picking accuracy is significantly improved by multi-station picking, which effectively utilizes inter-station contextual information. We not only adjusted the time threshold while maintaining a constant picking probability for evaluation but also fixed the time threshold and changed the probability of picking to calculate and plot the precision-recall curves for four models (Fig. S2). Consistent with the aforementioned results, the PLAN model, based on TransformGConv, demonstrated superior performance in terms of F1, encompassing both P-wave and S-wave.

Table 1: Location Performance in Ridgecrest and Japan regions.
Offset MAE (km) Depth MAE (km)
Region Method Mean Std Mean Std
Aggregated-GNN 2.30 2.30 1.98 1.55
GCN 8.96 6.94 1.43 1.34
GATv2 8.95 6.90 1.42 1.33
SAGE 1.28 2.09 1.68 1.42
Ridgecrest PLAN Trans 1.09 1.41 1.43 1.33
Aggregated-GNN 27.92 26.78 10.30 8.95
GCN 21.30 16.47 13.50 9.95
GATv2 10.79 10.82 5.59 6.86
SAGE 4.87 6.11 12.85 9.41
Hinet PLAN Trans 4.81 5.83 10.62 8.89
00footnotetext: Note: Red and bold values represent the best performance.
Table 2: Detection Performance in Ridgecrest region.
P Picking Metrics S Picking Metrics
Method mPrecision mRecall mF1 mPrecision mRecall mF1
PhaseNet 94.83 92.78 93.79 84.50 80.65 82.53
EQtransformer 95.43 91.17 93.25 86.77 78.21 82.27
PLAN-GATv2 95.05 93.02 94.03 85.55 80.49 82.95
PLAN-SAGE 94.99 93.07 94.02 85.65 81.48 83.51
PLAN-Trans 94.65 94.90 94.77 86.88 84.94 85.90
00footnotetext: Note: The picking probability is 0.3. Red and bold values represent the best performance.

Application to seismicity in Japan

We retrained all the methods on the Japan training set for the evaluation. Compared to its performance in the Ridgecrest region, PLAN exhibited an even better performance in Japan (Fig. 3B-3C). Further, PLAN demonstrated a remarkably better performance than PhaseNet and EQTransformer for both P- and S-wave picks. The offset predicted by PLAN was notably more accurate than that predicted by Aggregated-GNN, with higher and narrower residuals (Fig. 3D-3E). In terms of depth estimation, although PLAN maintained a narrower residual distribution, the highest point of the distribution was shifted systematically, compared with the Aggregated-GNN method. Table 1 presents the comprehensive quantitative comparison of the results. Although the TransformGConv-based PLAN method did not demonstrate particular superiority in depth estimation, it excelled in offset estimation. Further, the GATv2-based PLAN showed the lowest depth error, indicating potential improvement of localization capabilities of the proposed PLAN.

Refer to caption
Figure 3: Distributions of phase picking and location residuals in Japan. (A) Distribution of stations (black triangles) and training event locations (red circles) used in our study, where the events occurred between January 1, 2011 and December 31, 2011. (B and C) Similar to Fig. 2, (B and C) show the results of P-wave and S-wave arrival time residuals, respectively. (D and C) The offset and depth residuals between model predictions and the Japan Meteorological Agency (JMA) catalog of located events, respectively.
Table 3: Detection Performance in Japan.
P Picking Metrics S Picking Metrics
Method mPrecision mRecall mF1 mPrecision mRecall mF1
PhaseNet 94.87 94.97 94.92 88.26 84.38 86.28
EQtransformer 95.91 94.79 95.35 89.08 84.40 86.68
PLAN-GATv2 95.14 93.81 94.47 88.35 81.87 84.98
PLAN-SAGE 95.65 94.90 95.27 88.45 84.19 86.27
PLAN-Trans 95.79 95.14 95.46 88.41 85.09 86.72
00footnotetext: Note: The picking probability is 0.3. Red and bold values represent the best performance.

Similar to the Ridgecrest example, we assessed the phase-picking performance of various models applied to the test data from Japan using mPrecision, mRecall, and mF1 metrics (Table 3) and precision-recall curves (Fig. S3). The TransformGConv-based PLAN model achieved superior results in terms of mRecall (95.14 for P-waves and 85.09 for S-waves) and mF1 (95.46 for P-waves and 86.72 for S-waves), whereas EQTransformer performed best in terms of mPrecision of P-waves and S-waves. TransformGConv-based PLAN demonstrated high mRecall scores, indicating that a large proportion of the samples containing P/S-waves were correctly detected. However, this was achieved at the expense of a slightly lower mPrecision compared to that of the EQTransformer, with some non-P/S-waves incorrectly classified as P/S-waves. The mF1 score provided a more comprehensive evaluation of the model performance, considering both the reduction in missed detections and the increase in correct detections. In this context, TransformGConv-based PLAN had the highest F1 score, indicating that it effectively reduced the missed detections of P/S-waves and increased the proportion of correct detections.

Multi-station association module

To simultaneously pick P/S-waves from multiple stations, PLAN utilizes GNNs, which typically aggregates raw signals received at different stations. Feature aggregation across multiple stations introduces inter-station constraints and enhances the features at each station. However, because of different travel times of the same source across different stations, directly aggregating the signals from multiple stations would deteriorate multi-station picking.

To address this issue, the proposed method employes a multi-station association module to estimate the time shifts as illustrated in Fig. 1. The input to this module is the offset of each station with respect to the event and its depth. The module output is the corrected time shifts of the P/S-wave for each station. To align the P/S-wave features, the correction criteria for the P/S-wave features were set at the 10 and 15 s in the Ridgecrest region and at the 20 and 32 s in Japan. Using these criteria, the multi-station association module was trained to estimate the corrections of P/S-wave for each station. These corrections are then used to align the waveform features, enabling the graph convolution to aggregate the features in a temporally aligned space. Consequently, the method could enhance or compensate for the features at each station by fusing the aligned features from other stations, allowing simultaneous and accurate multi-station picking.

The multi-station association module can be utilized independently after training. It converts the distance and depth information into arrival information and calculates the S-P differential travel time (Crotwell et al., 1999). Figure 4 illustrates the arrival time differences of the P/S-waves at various stations in Japan. Although the training process utilizes a maximum of 37 stations for a single event, the module can be adapted to cases with any number of stations (e.g., hundreds of stations shown in Fig. 4A - Fig. 4D) to estimate the P/S-wave arrival time differences for all stations. These results indicated that the module effectively enforced physical constraints based on time shifts within the overall network.

Refer to caption
Figure 4: Arrival time residuals of P/S-waves at various stations in the test dataset of Japan. (A to D) Arrival time differences for each event calculated by multi-station association module, respectively, while (E to H) show manually picked arrival time differences for the same events. The numbers on the top of the figures represent the identification numbers of the events in the Japan Meteorological Agency (JMA) catalog. Above results indicate that the S-P differential arrival time predicted by our multi-station association module are consistent with the manually picking differences. They also demonstrate that our module could calculate the arrival time difference for a larger number of stations.

Discussion

Refer to caption
Figure 5: Comparison of prediction results using different numbers of stations in the Ridgecrest region. (A to D) The distributions of prediction errors for P-wave, S-wave, offset, and depth, respectively. The x-axis represents the number of stations, the primary y-axis denotes the number of events recorded by a specific number of stations, and the secondary y-axis represents the prediction errors for phase picking and event localization. Note that for phase picking, prediction residuals of PLAN (blue curves) decrease evidently as the number of stations increases. Moreover, the location errors of PLAN are significantly smaller than those of the Aggregated-GNN method.

PLAN is scalable for accommodating various numbers of stations per event. As PLAN is a network level picking and location model, we further investigated the effect of different numbers of stations on the network performance for phase picking and earthquake location with a test set of the Ridgecrest region (Fig. 5). We calculated the P- and S-wave picking residuals of the three different methods relative to the manual picking results, respectively (Fig. 5A-5B). The residuals of the single-station-based picking methods, PhaseNet and EQTransformer, exhibited oscillations for samples with station numbers 3-13 as the number of stations increased. Contrastingly, the residuals of our simultaneous multi-station picking method, PLAN, exhibited a significant residual decrease as the number of stations increased. Although the prediction residuals of the single-station-based methods should not be significantly associated with the number of stations, their prediction residuals still decreased when the number of stations was 13-16. This was probably because the events recorded by more stations tended to be larger and easier to pick.

A comparison of the distribution of prediction errors for earthquake offsets and depths with respect to the number of stations indicated that the errors in PLAN were significantly smaller than those in the Aggregated-GNN method (Fig. 5C-5D). However, the errors in offset prediction did not exhibit a significant decrease with an increase in the number of stations. This was likely because a large number of stations would include more distant ones that tended to have large offset prediction errors. As the offset error metric is defined as the average value acquired from multiple stations, an increase in the number of stations can lead to a slightly higher average error for a single event.

Furthermore, the statistical results for Japan (Fig. 6) were similar to those for the Ridgecrest region, with the PLAN method exhibiting smaller phase-picking errors than EQTransformer and PhaseNet, especially for S-wave picking. In addition, as the number of stations increased, the offset prediction error of PLAN became significantly smaller than that of the Aggregated-GNN.

Refer to caption
Figure 6: Comparison of prediction results using different number of stations in Japan. (A to D) The four distributions are similar to those described in Fig. 5 and the only difference is that we have used logarithmic coordinates for the secondary y-axis in (A) and (B).

The ability of our network to handle varying numbers of stations can be attributed to the multi-station association module, which can be separated from the entire network and utilized in a manner similar to the Taup algorithm for estimating the arrival time of earthquakes at stations. Differing from the Taup algorithm, our association module does not depend on an input velocity model. Instead, it empowers the network to comprehend the concept of velocity, enabling the conversion of offsets into relative time shifts. Additionally, unlike the sequential processing of one station at a time in the Taup algorithm, our module simultaneously calculates the time shifts for multiple stations associated with a single event. In essence, our association module can be considered a computationally efficient 3D Taup algorithm that operates without requiring a velocity model. To evaluate the estimation accuracy of the arrival time using this module, we applied the estimated time shifts to align different stations (Fig. S4). Because the multi-station association module can accurately estimate the arrival time, the original waveforms from all stations were aligned accordingly.

Refer to caption
Figure 7: Comparison of the P/S-wave arrival time estimation using different methods. (A and B) The crossplots of the S-P differential arrival times computed by the TauP model with the input of offsets and depths from manual labels and network predications, respectively. The x-axis represents the manually picked S-P differential arrival times. The y-axis represents the S-P differential arrival times obtained by different methods. (C and D) The crossplots of the S-P differential arrival times predicted by our multi-station association module, and they are consistent with those predicted by the TauP model. This indicates that the multi-station association module, detached from the entire trained PLAN model, works physically reasonable compared to the commonly used TauP model.

To further evaluate the estimation accuracy of the arrival time using this module, we employed the TauP algorithm based on the PREM model (Dziewonski and Anderson, 1981) for comparison (Fig. 7). We also calculated the correlation coefficient (R) between the output of each method and the manually picked P/S-wave time differences. During the training process, this module used the offsets and depths obtained from the earthquake location module as inputs. Therefore, inputting the predicted offsets and depths into this module (Fig. 7D) could yield better P/S-wave time differences than inputting the labeled offsets and depths (Fig. 7C). The multi-station association module with manually labeled offsets and depths yielded less consistent results than the TauP algorithm. This discrepancy may not be solely attributable to errors in the deep learning estimation. Label inaccuracies may have also contributed to this outcome. This assertion was supported by the observation that using the neural network output as the input for the TauP algorithm resulted in greater correlation coefficients than when label was employed (Fig. 7A-7B). Among all the evaluated methods, the estimation results in Fig. 7D show the highest correlation coefficients. Generally, the multi-station association module and the TauP algorithm based on the PREM model have the same level of accuracy in calculating P/S-wave time differences.

In addition, the out-of-distribution data is utilized to assess the network’s temporal generalization ability and application potential. We applied PLAN and PhaseNet to pick the aftershock sequence of the 2022 M 7.4 Fukushima earthquake in Japan (Fig. S5). Traditional deep learning-based phase picking methods, such as PhaseNet, typically focus on picking seismic phases at a single station and perform pick and association processes step-by-step. In such a sequentially independent processing flow, the accuracy of phase picking determines the precision of subsequent earthquake association and location. Additionally, the consistency constraints among multiple stations for the same seismic event in the subsequent association process cannot be fed back to the previous step of phase picking to improve its accuracy. Without the consistency constraints among multiple stations, the phase picking station-by-station may be sensitive to noise and fail to pick reasonably pick multiple P/S waves at stations with short epicental distances as shown in Fig. S5A-S5B. In contrast, PLAN is a simultaneous multi-station and multi-task processing approach that allows the aggregation of information from all stations and tend to pick P/S waves that can be consistently associated to the same events recorded by all the stations, leading to more reliable picking results. Furthermore, during the network training process, the picking, association, and location modules in the PLAN network mutually constrain and provide feedback to each other. As a result, a coherent and adaptive organic system can be achieved, simultaneously accomplishing these three tasks in a cohesive and integrated manner.

Some limitations remain in our method, particularly in handling continuous waveform data. This is mainly related to the construction of our training dataset and the corresponding training strategy. In our training data, although we allow for varying numbers of stations and missing phase picking labels for some stations, we assume that each training sample contains only one earthquake event. We overlook the scenario where the sample data does not contain an earthquake event (only noise), and when the data contains multiple events, we only provide labels for the main event. These considerations simplify the process of constructing our training samples but limit the diversity of the samples and the ability of the trained network model to handle continuous waveform data. However, we believe that these limitations can be addressed by using a more diverse and comprehensive training dataset.

In summary, we present a novel all-in-one multi-task multi-station system called PLAN for earthquake monitoring, which is capable of simultaneous phase picking, phase association, and earthquake location. Unlike current CNN-based methods that perform phase picking station-by-station, phase association and location separately, our proposed GNN-based multi-station multi-task system best utilizes the inherent inter-task and inter-station constraints. The multi-station association module estimates the phase shift and improves the robustness and accuracy of the phase association process. Eventually, the resulting offsets and depth enables accurate event localization. Our method demonstrates the need to factor mutual constraints among tasks and stations into next-generation earthquake monitoring systems.

Methods

Graph based neural network

Several studies have shown that GNNs have the potential to deal with irregularly spaced stations for phase association and event localization (McBrearty et al., 2019b; van den Ende and Ampuero, 2020; Yano et al., 2021; McBrearty and Beroza, 2022b, 2023; Zhang et al., 2022; Bilal et al., 2022). Here, we build a graph-based network (Fig. 1) for mult-station earthquake monitoring. To utilize the GNN, we first need to change the data from the matrix format to the graph format and employ a graph-based representation of the stations, where each station is represented as a node in the graph and the three-channel data and the station location are used as the features of each node. In contrast to the current single-station processing methods (Ross et al., 2018b, a; Zhu and Beroza, 2018; Mousavi et al., 2019; Zhu et al., 2019; Pardo et al., 2019; Wang et al., 2019; Liu et al., 2020; Mousavi et al., 2020; Zhu et al., 2022a), which treat each three-channel data as an individual input sample, our approach inputs all the three-channel data received from multiple stations per event as a single sample. This allows for efficient aggregation of information from multiple stations during network training. As a result, the features of different stations could be effectively integrated using GNNs during the aggregation process.

In this study, we have evaluated various graph aggregation methods, including GCN (Kipf and Welling, 2016), GraphSAGE (Hamilton et al., 2017), GAT (Veličković et al., 2017), GATv2 (Brody et al., 2021), and TransformGCONV (Shi et al., 2020). Through this evaluation, we have determined that TransformGCONV, which is based on attention mechanism (Vaswani et al., 2017), is the most suitable module for the proposed PLAN. The message aggregation of TransformGCONV could be represented as:

𝐱i=𝐖1𝐱i+j𝒩(i)αi,j𝐖2𝐱j,\mathbf{x}_{i}^{\prime}=\mathbf{W}_{1}\mathbf{x}_{i}+\sum_{j\in\mathcal{N}(i)}\alpha_{i,j}\mathbf{W}_{2}\mathbf{x}_{j}, (1)

where 𝐱i\mathbf{x}_{i}^{\prime} represents the aggregated features at the source node, and 𝐱i\mathbf{x}_{i} and 𝐱j\mathbf{x}_{j} represent the features of the source and distant nodes before aggregation, respectively. 𝐖1\mathbf{W}_{1} and 𝐖2\mathbf{W}_{2} are the trainable matrices. In addition, the attention coefficients αi,j\alpha_{i,j} are computed via dot-product attention as follows:

αi,j=softmax((𝐖3𝐱i)(𝐖4𝐱j)d),\alpha_{i,j}={softmax}\left(\frac{\left(\mathbf{W}_{3}\mathbf{x}_{i}\right)^{\top}\left(\mathbf{W}_{4}\mathbf{x}_{j}\right)}{\sqrt{d}}\right), (2)

where 𝐖3\mathbf{W}_{3} and 𝐖4\mathbf{W}_{4} are the trainable matrices. Similar to the attention mechanism(Vaswani et al., 2017), the source feature 𝐱i\mathbf{x}_{i} and distant feature 𝐱j\mathbf{x}_{j} are transformed into query vector and key vector, respectively, using 𝐖3\mathbf{W}_{3} and 𝐖4\mathbf{W}_{4}. Compared to other graph aggregation methods, the use of the attention-based mechanism (equation 1) in TransformGCONV allows for a more fine-grained representation of the relationship between different stations, thereby improving the accuracy and efficiency of the proposed method.

Network Architecture

Here, we design a multi-station multi-task network for simultaneous phase picking, association, and location. The network (Fig. S1) comprises four components: a waveform feature extraction module, an earthquake location module, a multi-station association module, and a physics-informed multi-station phase-picking module. Similar to previous deep-learning-based phase-picking approaches (Zhu and Beroza, 2018; Mousavi et al., 2020; Zhu et al., 2022a, c), we design an encoder to extract waveform features and a decoder to produce phase-picking results. However, to address the multi-station phase-picking problem, we introduce the GNN-based TransformGCONV for aggregating features from multiple stations.

Because aligned waveform features are easily used and aggregated in GNNs for multi-station phase picking, we do not employ it in the waveform feature extraction module (Fig. S1A), where the features are relatively shifted in time. Although we use a U-shape neural network for feature extraction to solve the phase-picking problem, it could be replaced with other single-station-based phase-picking networks, such as EQTransformer. No matter what type of network architecture is used, the features extracted from the middle of the network are input into the earthquake location module, and the structure of the final few layers of the network are modified for the purpose of multi-station phase picking. Additionally, the kernel size of all convolutional layers in the waveform feature extraction network is set to 7.

For the earthquake location module (Fig. S1B), we first extract features from the normalized coordinate information of the stations within the range of [0,1] through two fully connected layers (3-48-96). Simultaneously, the waveform features extracted from each station are further processed through several convolutional layers and then flattened. Subsequently, the position and waveform features are concatenated and passed through two fully connected layers (192-192-96). This fuses the position and waveform features at each station. The fused features are further aggregated among multiple stations by several GNN layers to predict the offsets of each station with respect to the event and its depth. Because there is only one depth parameter for each sample, we add a global average pooling before the output. In summary, this module allows the integration of both location and waveform information into the feature extraction process, which is crucial for accurate event localization.

Finally, in the physics-informed multi-station phase-picking module (Fig. S1D), we incorporate physics-motivated constraints of time alignment among waveforms corresponding to the same earthquake event. We first utilize a mulit-station association module (Fig. S1C) to calculate the relative alignment shifts between stations using the estimated offsets and the depth of the event. We then use the shifts to align the waveforms to a common time standard and aggregate the features across multiple stations in the phase-picking module. Subsequently, the aggregated features at each station are unaligned and fed to two layers of convolution to yield final P/S-wave picking results. This process leverages the physical information of the event location to improve the robustness and accuracy of the multi-station phase picking.

Loss function and training details

Our multi-task learning network model has three output results corresponding to phase picking, phase association, and earthquake localization. To train the model, we define three different loss functions for these three different tasks. For phase picking, instead of using the commonly used cross-entropy, we choose the mean square error (MSE) as the loss function, which is suitable for training in multi-task problems. To estimate offsets and depth, which is similar to event localization, we also use MSE as the loss function, as suggested by previous studies (van den Ende and Ampuero, 2020; Zhang et al., 2022). Finally, to calculate the P/S-wave shift, we define the loss function as follows:

Δp=i=1nCTimep(labelpi+Δtpi),\mathcal{L}_{\Delta p}=\sum_{i=1}^{n}\mid{CTime}_{p}-\left({label}_{p_{i}}+\Delta t_{p_{i}}\right)\mid, (3)
Δs=i=1nCTimes(labelsi+Δtsi),\mathcal{L}_{\Delta s}=\sum_{i=1}^{n}\mid{CTime}_{s}-\left({label}_{s_{i}}+\Delta t_{s_{i}}\right)\mid, (4)

where CTimep{CTime}_{p} and CTimes{CTime}_{s} represents the reference times where the P- and S-wave picks are aligned, respectively. Moreover, labelpi{label}_{p_{i}} or labelsi{label}_{s_{i}} represents the manually picked P/S-wave arrival time for each station, and Δt\Delta t represents the predicted P/S-wave shift. Finally, we combine the three types of loss functions to form the overall objective function:

Ltotal=λ1pickingp+λ1pickings+λ2Δp+λ2Δs+λ3offset+λ3depth.L_{{total}}=\lambda_{1}\mathcal{L}_{{picking-p}}+\lambda_{1}\mathcal{L}_{{picking-s}}+\lambda_{2}\mathcal{L}_{\Delta p}+\lambda_{2}\mathcal{L}_{\Delta s}+\lambda_{3}\mathcal{L}_{{offset}}+\lambda_{3}\mathcal{L}_{{depth}}. (5)

Here, we set the coefficients λ1\lambda_{1}, λ2\lambda_{2}, and λ3\lambda_{3} to 1.

During the training process, the model was optimized using the ADAM (Kingma and Ba, 2014) method with an initial learning rate of 0.001, which is gradually decreased with a decay rate of 0.9 every 100 epochs. To enhance the training efficiency, we randomly selected 2048 events from the training set for each epoch, rather than using the entire data. The model was trained for a total of 2000 epochs with a batch size of 16, and the training process required approximately 24 h using 1 NVIDIA Tesla A100 GPU.

Evaluation metrics

In previous studies (Mousavi et al., 2020; Zhu et al., 2022c), true positive phase picks were defined as those within 0.5 s of the predicted pick. The rest were counted as false positives. Nevertheless, owing to potential errors in the labels of the dataset, such statistical results based on a single threshold may not be reliable. Thus, to better evaluate the performance of algorithms, we introduce new metrics, mPrecision, mRecall, and mF1, which are calculated using multiple thresholds, following previous research(Zheng et al., 2022). The metrics are defined as:

mPrecision=(Precision@11+Precision@12++Precision1@50)/40,\mathrm{mPrecision}=(\mathrm{Precision}@11+\mathrm{Precision}@12+\cdots+\mathrm{Precision}1@50)/40, (6)
mRecall=(Recall@11+Recall@12++Recall1@50)/40,\mathrm{mRecall}=(\mathrm{Recall}@11+\mathrm{Recall}@12+\cdots+\mathrm{Recall}1@50)/40, (7)
mF1=(F1@11+F1@12++F1@50)/40.\mathrm{mF}1=(\mathrm{F}1@11+\mathrm{F}1@12+\cdots+\mathrm{F}1@50)/40. (8)

where x@11, x@12, \cdots , x@50 are Precision, Recall, or F1 metrics when the thresholds are 11, 12, \cdots, 50 samples (corresponding to 0.11 s, 0.12 s, \cdots, 0.5 s of time), respectively. These metrics, mPrecision, mRecall, and mF1 reward detectors with better picking results and, therefore, can more reasonably or fairly assess the performance of the different methods.

References

  • Allen (1978) Rex V Allen. Automatic earthquake recognition and timing from single traces. Bulletin of the seismological society of America, 68(5):1521–1532, 1978.
  • Aoi et al. (2020) Shin Aoi, Youichi Asano, Takashi Kunugi, Takeshi Kimura, Kenji Uehira, Narumi Takahashi, Hideki Ueda, Katsuhiko Shiomi, Takumi Matsumoto, and Hiroyuki Fujiwara. Mowlas: Nied observation network for earthquake, tsunami and volcano. Earth, Planets and Space, 72(1):1–31, 2020.
  • Arora et al. (2013) Nimar S Arora, Stuart Russell, and Erik Sudderth. Net-visa: Network processing vertically integrated seismic analysis. Bulletin of the Seismological Society of America, 103(2A):709–729, 2013.
  • Baer and Kradolfer (1987) M Baer and U Kradolfer. An automatic phase picker for local and teleseismic events. Bulletin of the Seismological Society of America, 77(4):1437–1445, 1987.
  • Bakun and Wentworth (1997) WH u Bakun and CM Wentworth. Estimating earthquake location and magnitude from seismic intensity data. Bulletin of the Seismological Society of America, 87(6):1502–1521, 1997.
  • Beroza et al. (2021) Gregory C Beroza, Margarita Segou, and S Mostafa Mousavi. Machine learning and earthquake forecasting—next steps. Nature communications, 12(1):4761, 2021.
  • Bilal et al. (2022) Muhammad Atif Bilal, Yanju Ji, Yongzhi Wang, Muhammad Pervez Akhter, and Muhammad Yaqub. Early earthquake detection using batch normalization graph convolutional neural network (bngcnn). Applied Sciences, 12(15):7548, 2022.
  • Bloemheuvel et al. (2022a) Stefan Bloemheuvel, Jurgen van den Hoogen, Dario Jozinović, Alberto Michelini, and Martin Atzmueller. Multivariate time series regression with graph neural networks. arXiv preprint arXiv:2201.00818, 2022a.
  • Bloemheuvel et al. (2022b) Stefan Bloemheuvel, Jurgen van den Hoogen, Dario Jozinović, Alberto Michelini, and Martin Atzmueller. Graph neural networks for multivariate time series regression with application to seismic data. International Journal of Data Science and Analytics, pages 1–16, 2022b.
  • Brody et al. (2021) Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? arXiv preprint arXiv:2105.14491, 2021.
  • Crotwell et al. (1999) H Philip Crotwell, Thomas J Owens, Jeroen Ritsema, et al. The taup toolkit: Flexible seismic travel-time and ray-path utilities. Seismological Research Letters, 70:154–160, 1999.
  • DeVries et al. (2018) Phoebe MR DeVries, Fernanda Viégas, Martin Wattenberg, and Brendan J Meade. Deep learning of aftershock patterns following large earthquakes. Nature, 560(7720):632–634, 2018.
  • Dziewonski and Anderson (1981) Adam M Dziewonski and Don L Anderson. Preliminary reference earth model. Physics of the earth and planetary interiors, 25(4):297–356, 1981.
  • Feng et al. (2022) Tian Feng, Saeed Mohanna, and Lingsen Meng. Edgephase: A deep learning model for multi-station seismic phase picking. Geochemistry, Geophysics, Geosystems, 23(11):e2022GC010453, 2022.
  • Fey and Lenssen (2019) Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428, 2019.
  • Gao and Ji (2019) Hongyang Gao and Shuiwang Ji. Graph u-nets. In international conference on machine learning, pages 2083–2092. PMLR, 2019.
  • Gibbons et al. (2016) Steven J Gibbons, Tormod Kværna, David B Harris, and Douglas A Dodge. Iterative strategies for aftershock classification in automatic seismic processing pipelines. Seismological Research Letters, 87(4):919–929, 2016.
  • Hamilton et al. (2017) William L Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 1025–1035, 2017.
  • Hara et al. (2019) Shota Hara, Yukitoshi Fukahata, and Yoshihisa Iio. P-wave first-motion polarity determination of waveform data in western japan using deep learning. Earth, Planets and Space, 71(1):1–11, 2019.
  • Hu et al. (2019) Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. Strategies for pre-training graph neural networks. arXiv preprint arXiv:1905.12265, 2019.
  • Huang et al. (2018) JP Huang, XA Wang, Y Zhao, C Xin, and Han Xiang. Large earthquake magnitude prediction in taiwan based on deep learning neural network. Neural Network World, 28(2):149–160, 2018.
  • Hunter (2007) John D Hunter. Matplotlib: A 2d graphics environment. Computing in science & engineering, 9(03):90–95, 2007.
  • Jain et al. (1996) Anil K Jain, Jianchang Mao, and K Moidin Mohiuddin. Artificial neural networks: A tutorial. Computer, 29(3):31–44, 1996.
  • Kim et al. (2021) Gwantae Kim, Bonhwa Ku, Jae-Kwang Ahn, and Hanseok Ko. Graph convolution networks for seismic events classification using raw waveform data from multiple stations. IEEE Geoscience and Remote Sensing Letters, 19:1–5, 2021.
  • Kingma and Ba (2014) Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Kong et al. (2019) Qingkai Kong, Daniel T Trugman, Zachary E Ross, Michael J Bianco, Brendan J Meade, and Peter Gerstoft. Machine learning in seismology: Turning data into insights. Seismological Research Letters, 90(1):3–14, 2019.
  • Krizhevsky et al. (2017) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017.
  • Kuang et al. (2021a) Wenhuan Kuang, Congcong Yuan, and Jie Zhang. Network-based earthquake magnitude determination via deep learning. Seismological Research Letters, 92(4):2245–2254, 2021a.
  • Kuang et al. (2021b) Wenhuan Kuang, Congcong Yuan, and Jie Zhang. Real-time determination of earthquake focal mechanism via deep learning. Nature communications, 12(1):1–8, 2021b.
  • LeCun et al. (2015) Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. nature, 521(7553):436–444, 2015.
  • Li (2021) Zefeng Li. Recent advances in earthquake monitoring ii: Emergence of next-generation intelligent systems. Earthquake Science, 34(6):531–540, 2021.
  • Li et al. (2018) Zefeng Li, Zhigang Peng, Dan Hollis, Lijun Zhu, and James McClellan. High-resolution seismic event detection using local similarity for large-n arrays. Scientific reports, 8(1):1–10, 2018.
  • Liu et al. (2020) Min Liu, Miao Zhang, Weiqiang Zhu, William L Ellsworth, and Hongyi Li. Rapid characterization of the july 2019 ridgecrest, california, earthquake sequence from raw seismic data using machine-learning phase picker. Geophysical Research Letters, 47(4):e2019GL086189, 2020.
  • Lomax et al. (2019) Anthony Lomax, Alberto Michelini, and Dario Jozinović. An investigation of rapid earthquake characterization using single-station waveforms and a convolutional neural network. Seismological Research Letters, 90(2A):517–529, 2019.
  • McBrearty and Beroza (2022a) Ian W McBrearty and Gregory C Beroza. Earthquake location and magnitude estimation with graph neural networks. arXiv preprint arXiv:2203.05144, 2022a.
  • McBrearty and Beroza (2022b) Ian W McBrearty and Gregory C Beroza. Earthquake location and magnitude estimation with graph neural networks. arXiv preprint arXiv:2203.05144, 2022b.
  • McBrearty and Beroza (2022c) Ian W McBrearty and Gregory C Beroza. Earthquake phase association with graph neural networks. arXiv preprint arXiv:2209.07086, 2022c.
  • McBrearty and Beroza (2023) Ian W McBrearty and Gregory C Beroza. Earthquake phase association with graph neural networks. Bulletin of the Seismological Society of America, 113(2):524–547, 2023.
  • McBrearty et al. (2019a) Ian W McBrearty, Andrew A Delorey, and Paul A Johnson. Pairwise association of seismic arrivals with convolutional neural networks. Seismological Research Letters, 90(2A):503–509, 2019a.
  • McBrearty et al. (2019b) Ian W McBrearty, Joan Gomberg, Andrew A Delorey, and Paul A Johnson. Earthquake arrival association with backprojection and graph theoryearthquake arrival association with backprojection and graph theory. Bulletin of the Seismological Society of America, 109(6):2510–2531, 2019b.
  • Mousavi and Beroza (2019) S Mostafa Mousavi and Gregory C Beroza. Bayesian-deep-learning estimation of earthquake location from single-station observations. arXiv preprint arXiv:1912.01144, 2019.
  • Mousavi and Beroza (2020) S Mostafa Mousavi and Gregory C Beroza. A machine-learning approach for earthquake magnitude estimation. Geophysical Research Letters, 47(1):e2019GL085976, 2020.
  • Mousavi and Beroza (2022) S Mostafa Mousavi and Gregory C Beroza. Deep-learning seismology. Science, 377(6607):eabm4470, 2022.
  • Mousavi et al. (2019) S Mostafa Mousavi, Weiqiang Zhu, Yixiao Sheng, and Gregory C Beroza. Cred: A deep residual network of convolutional and recurrent units for earthquake signal detection. Scientific reports, 9(1):1–14, 2019.
  • Mousavi et al. (2020) S Mostafa Mousavi, William L Ellsworth, Weiqiang Zhu, Lindsay Y Chuang, and Gregory C Beroza. Earthquake transformer—an attentive deep-learning model for simultaneous earthquake detection and phase picking. Nature communications, 11(1):1–12, 2020.
  • Münchmeyer et al. (2021) Jannes Münchmeyer, Dino Bindi, Ulf Leser, and Frederik Tilmann. Earthquake magnitude and location estimation from real time seismic waveforms with a transformer network. Geophysical Journal International, 226(2):1086–1104, 2021.
  • Münchmeyer et al. (2022) Jannes Münchmeyer, Jack Woollam, Andreas Rietbrock, Frederik Tilmann, Dietrich Lange, Thomas Bornstein, Tobias Diehl, Carlo Giunchi, Florian Haslinger, Dario Jozinović, et al. Which picker fits my data? a quantitative evaluation of deep learning based seismic pickers. Journal of Geophysical Research: Solid Earth, 127(1):e2021JB023499, 2022.
  • NIED and KiK-net (2019) K NIED and NET KiK-net. National research institute for earth science and disaster resilience. National research institute of earth science and disaster resilience, 2019.
  • Obara et al. (2005) Kazushige Obara, Keiji Kasahara, Sadaki Hori, and Yoshimitsu Okada. A densely distributed high-sensitivity seismograph network in japan: Hi-net by national research institute for earth science and disasterprevention. Review of scientific instruments, 76(2):021301, 2005.
  • Oh et al. (2020) Jin Woo Oh, Jack Ngarambe, Patrick Nzivugira Duhirwe, Geun Young Yun, and Mattheos Santamouris. Using deep-learning to forecast the magnitude and characteristics of urban heat island in seoul korea. Scientific reports, 10(1):1–13, 2020.
  • Pardo et al. (2019) Esteban Pardo, Carmen Garfias, and Norberto Malpica. Seismic phase picking using convolutional networks. IEEE Transactions on Geoscience and Remote Sensing, 57(9):7086–7092, 2019.
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  • Perol et al. (2018) Thibaut Perol, Michaël Gharbi, and Marine Denolle. Convolutional neural network for earthquake detection and location. Science Advances, 4(2):e1700578, 2018.
  • Reichstein et al. (2019) Markus Reichstein, Gustau Camps-Valls, Bjorn Stevens, Martin Jung, Joachim Denzler, and Nuno Carvalhais. Deep learning and process understanding for data-driven earth system science. Nature, 566(7743):195–204, 2019.
  • Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
  • Ross et al. (2018a) Zachary E Ross, Men-Andrin Meier, and Egill Hauksson. P wave arrival picking and first-motion polarity determination with deep learning. Journal of Geophysical Research: Solid Earth, 123(6):5120–5129, 2018a.
  • Ross et al. (2018b) Zachary E Ross, Men-Andrin Meier, Egill Hauksson, and Thomas H Heaton. Generalized seismic phase detection with deep learningshort note. Bulletin of the Seismological Society of America, 108(5A):2894–2901, 2018b.
  • Ross et al. (2019) Zachary E Ross, Yisong Yue, Men-Andrin Meier, Egill Hauksson, and Thomas H Heaton. Phaselink: A deep learning approach to seismic phase association. Journal of Geophysical Research: Solid Earth, 124(1):856–869, 2019.
  • Seydoux et al. (2020) Léonard Seydoux, Randall Balestriero, Piero Poli, Maarten de Hoop, Michel Campillo, and Richard Baraniuk. Clustering earthquake signals and background noises in continuous seismic data with unsupervised deep learning. Nature communications, 11(1):1–12, 2020.
  • Shen and Shen (2021) Heather Shen and Yang Shen. Array-based convolutional neural networks for automatic detection and 4d localization of earthquakes in hawai ‘i. Seismological Research Letters, 92(5):2961–2971, 2021.
  • Shi et al. (2020) Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang, and Yu Sun. Masked label prediction: Unified message passing model for semi-supervised classification. arXiv preprint arXiv:2009.03509, 2020.
  • Sleeman and Van Eck (1999) Reinoud Sleeman and Torild Van Eck. Robust automatic p-phase picking: an on-line implementation in the analysis of broadband seismogram recordings. Physics of the earth and planetary interiors, 113(1-4):265–275, 1999.
  • Tian et al. (2020) Xiao Tian, Wei Zhang, Xiong Zhang, Jie Zhang, Qingshan Zhang, Xiangteng Wang, and Quanshi Guo. Comparison of single-trace and multiple-trace polarity determination for surface microseismic data using deep learning. Seismological Research Letters, 91(3):1794–1803, 2020.
  • Uieda et al. (2021) Leonardo Uieda, Dongdong Tian, Wei Ji Leong, L Toney, William Schlitzer, Michael Grund, D Newton, M Ziebarth, M Jones, and P Wessel. Pygmt: A python interface for the generic mapping tools. 2021.
  • van den Ende and Ampuero (2020) Martijn PA van den Ende and J-P Ampuero. Automated seismic source characterization using deep graph neural networks. Geophysical Research Letters, 47(17):e2020GL088690, 2020.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  • Veličković et al. (2017) Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • Wang et al. (2019) Jian Wang, Zhuowei Xiao, Chang Liu, Dapeng Zhao, and Zhenxing Yao. Deep learning for picking seismic arrival times. Journal of Geophysical Research: Solid Earth, 124(7):6612–6624, 2019.
  • Wu et al. (2020) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems, 32(1):4–24, 2020.
  • Xu et al. (2018) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826, 2018.
  • Yang et al. (2022) Lei Yang, Xin Liu, Weiqiang Zhu, Liang Zhao, and Gregory C Beroza. Toward improved urban earthquake monitoring through deep-learning-based noise suppression. Science advances, 8(15):eabl3564, 2022.
  • Yang et al. (2021) Shaobo Yang, Jing Hu, Haijiang Zhang, and Guiquan Liu. Simultaneous earthquake detection on multiple stations via a convolutional neural network. Seismological Research Letters, 92(1):246–260, 2021.
  • Yano et al. (2021) Keisuke Yano, Takahiro Shiina, Sumito Kurata, Aitaro Kato, Fumiyasu Komaki, Shin’ichi Sakai, and Naoshi Hirata. Graph-partitioning based convolutional neural network for earthquake detection using a seismic array. Journal of Geophysical Research: Solid Earth, 126(5):e2020JB020269, 2021.
  • Yoon et al. (2015) Clara E Yoon, Ossian O’Reilly, Karianne J Bergen, and Gregory C Beroza. Earthquake detection through computationally efficient similarity search. Science advances, 1(11):e1501057, 2015.
  • Yu and Segall (1996) Ellen Yu and Paul Segall. Slip in the 1868 hayward earthquake from the analysis of historical triangulation data. Journal of Geophysical Research: Solid Earth, 101(B7):16101–16118, 1996.
  • Yu and Wang (2022) Ziye Yu and Weitao Wang. Fastlink: a machine learning and gpu-based fast phase association method and its application to yangbi m s 6.4 aftershock sequences. Geophysical Journal International, 230(1):673–683, 2022.
  • Zhang et al. (2014) Jie Zhang, Haijiang Zhang, Enhong Chen, Yi Zheng, Wenhuan Kuang, and Xiong Zhang. Real-time earthquake monitoring using a search engine method. Nature communications, 5(1):5664, 2014.
  • Zhang et al. (2019) Miao Zhang, William L Ellsworth, and Gregory C Beroza. Rapid earthquake association and location. Seismological Research Letters, 90(6):2276–2284, 2019.
  • Zhang et al. (2020) Xiong Zhang, Jie Zhang, Congcong Yuan, Sen Liu, Zhibo Chen, and Weiping Li. Locating induced earthquakes with a network of seismic stations in oklahoma via a deep learning method. Scientific reports, 10(1):1–12, 2020.
  • Zhang et al. (2022) Xitong Zhang, Will Reichard-Flynn, Miao Zhang, Matthew Hirn, and Youzuo Lin. Spatio-temporal graph convolutional networks for earthquake source characterization. Journal of Geophysical Research: Solid Earth, page e2022JB024401, 2022.
  • Zheng et al. (2022) Tu Zheng, Yifei Huang, Yang Liu, Wenjian Tang, Zheng Yang, Deng Cai, and Xiaofei He. Clrnet: Cross layer refinement network for lane detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 898–907, 2022.
  • Zhou et al. (2020) Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, and Maosong Sun. Graph neural networks: A review of methods and applications. AI Open, 1:57–81, 2020.
  • Zhu et al. (2021) Jingbao Zhu, Shanyou Li, Jindong Song, and Yuan Wang. Magnitude estimation for earthquake early warning using a deep convolutional neural network. Frontiers in Earth Science, page 341, 2021.
  • Zhu et al. (2022a) Jun Zhu, Zefeng Li, and Lihua Fang. Ustc-pickers: a unified set of seismic phase pickers transfer learned for china. Earthquake Science, 36:1–11, 2022a.
  • Zhu et al. (2019) Lijun Zhu, Zhigang Peng, James McClellan, Chenyu Li, Dongdong Yao, Zefeng Li, and Lihua Fang. Deep learning for seismic phase detection and picking in the aftershock zone of 2008 mw7. 9 wenchuan earthquake. Physics of the Earth and Planetary Interiors, 293:106261, 2019.
  • Zhu and Beroza (2018) Weiqiang Zhu and Gregory C Beroza. Phasenet: a deep-neural-network-based seismic arrival-time picking method. Geophysical Journal International, 216(1):261–273, 10 2018.
  • Zhu et al. (2022b) Weiqiang Zhu, Ian W McBrearty, S Mostafa Mousavi, William L Ellsworth, and Gregory C Beroza. Earthquake phase association using a bayesian gaussian mixture model. Journal of Geophysical Research: Solid Earth, 127(5):e2021JB023249, 2022b.
  • Zhu et al. (2022c) Weiqiang Zhu, Kai Sheng Tai, S Mostafa Mousavi, Peter Bailis, and Gregory C Beroza. An end-to-end earthquake detection method for joint phase picking and association using deep learning. Journal of Geophysical Research: Solid Earth, 127(3):e2021JB023283, 2022c.

Supplementary Information

Refer to caption
Figure S1: Network architecture of PLAN. a waveform feature extraction encoder and decoder extract features from raw seismic waveforms using a simplified 1D U-net. These features, along with information about the locations of the stations, are input into the earthquake location module b, which is a two-branch module. It extracts and combines waveform and location information for each station and calculates the offsets of the stations relative to the event and the depth of the event. c multi-station association module then uses the offsets and depth information to estimate the relative time shifts between different stations which are further used to temporally align the waveform features from different stations. d finally, the phase-picking module aggregates features (among multiple stations) in the aligned space and then predicts P/S-wave phase arrivals in the original space. Furthermore, in 1DConv blocks, the preceding number indicates the size of the convolutional kernel, while the subsequent numbers respectively denote the input and output channels. Regarding the TransformerGConv Blocks, the following numbers correspondingly represent the input and output features.
Refer to caption
Figure S2: Comparison of the precision-recall curves of picking results in the Ridgecrest region. True positive phase picks were defined as those within 0.5 s of the predicted pick. The numbers in legend represent the maximum F1 scores of each model.
Refer to caption
Figure S3: Comparison of the precision-recall curves of picking results in Japan. True positive phase picks were defined as those within 0.5 s of the predicted pick. The numbers in legend represent the maximum F1 scores of each model.
Refer to caption
Figure S4: Example of raw waveforms (first line) and aligning raw waveforms of P waves (second line) and S waves (third line) using the time shifts predicted by the multi-station association module. The numbers on the top of the figures represent the identification number of the events in the Japan Meteorological Agency (JMA) catalog. The alignment results indicate that the multi-station association module can accurately estimate the time shifts and further utilize it to align the waveform features at the same time temporal position.
Refer to caption
Figure S5: Comparison of waveform picking results for the aftershock sequence of the 2022 M 7.4 Fukushima earthquake in the Japan using PLAN and PhaseNet. The red, blue, and green markers represent the manual labels, PLAN results, and PhaseNet results. The lines in b, c, e and f are obtained by linear fitting based on the picking results. PhaseNet picks may become mixed in near-offset stations, making it challenging to obtain a reliable result without phase association. In contrast, due to its nature as a multi-task method that simultaneously performs phase picking and phase association, PLAN exhibits stronger consistency in picking results across stations.