
CardioGAN: Attentive Generative Adversarial Network with Dual Discriminators for Synthesis of ECG from PPG

Pritam Sarkar, Ali Etemad
Abstract

Electrocardiogram (ECG) is the electrical measurement of cardiac activity, whereas Photoplethysmogram (PPG) is the optical measurement of volumetric changes in blood circulation. While both signals are used for heart rate monitoring, from a medical perspective, ECG is more useful as it carries additional cardiac information. Despite many attempts toward incorporating ECG sensing in smartwatches or similar wearable devices for continuous and reliable cardiac monitoring, PPG sensors are the main feasible sensing solution available. In order to tackle this problem, we propose CardioGAN, an adversarial model which takes PPG as input and generates ECG as output. The proposed network utilizes an attention-based generator to learn local salient features, as well as dual discriminators to preserve the integrity of generated data in both time and frequency domains. Our experiments show that the ECG generated by CardioGAN provides more reliable heart rate measurements compared to the original input PPG, reducing the error from 9.74 beats per minute (measured from the PPG) to 2.89 (measured from the generated ECG).

1 Introduction

According to the World Health Organization (WHO) in 2017, Cardiovascular Diseases (CVDs) are reported as the leading cause of death worldwide (WHO 2017). The report indicates that CVDs cause 31% of global deaths, out of which at least three-quarters occur in low- or middle-income countries. One of the primary reasons behind this is the lack of primary healthcare support and inaccessible on-demand health monitoring infrastructure. Electrocardiogram (ECG) is considered one of the most important attributes for the continuous health monitoring required to identify those at serious risk of future cardiovascular events or death. A vast amount of research is being conducted with the goal of developing wearable devices capable of continuous ECG monitoring and feasible for daily life use, largely to no avail. Currently, very few wearable devices provide wrist-based ECG monitoring, and those that do require the user to stand still and touch the watch with both hands to close the circuit and record an ECG segment of limited duration (usually 30 seconds), making these solutions non-continuous and sporadic.

Photoplethysmogram (PPG), an optical method for measuring blood volume changes at the surface of the skin, is considered a close alternative to ECG and contains valuable cardiovascular information (Gil et al. 2010; Schäfer and Vagedes 2013). For instance, studies have shown that a number of features extracted from PPG (e.g., pulse rate variability) are highly correlated with corresponding metrics extracted from ECG (e.g., heart rate variability) (Gil et al. 2010), further illustrating the mutual information between these two modalities. Yet, through recent advancements in smartwatches, smartphones, and other similar wearable and mobile devices, PPG has become the industry standard as a simple, wearable-friendly, and low-cost solution for continuous heart rate (HR) monitoring in everyday use. Nonetheless, PPG suffers from inaccurate HR estimation and several other limitations in comparison to conventional ECG monitoring devices (Bent et al. 2020), due to factors like skin tone, diverse skin types, motion artefacts, and signal crossovers, among others. Moreover, the ECG waveform carries important information about cardiac activity. For instance, the P-wave indicates the sinus rhythm, whereas a long PR interval is generally indicative of first-degree heart block (Ashley and Niebauer 2004). As a result, ECG is consistently used by cardiologists for assessing the condition and performance of the heart.

Based on the above, there is a clear discrepancy between the need for continuous wearable ECG monitoring and the available solutions in the market. To address this, we propose CardioGAN, a generative adversarial network (GAN) (Goodfellow et al. 2014) which takes PPG as input and generates ECG. Our model is based on the CycleGAN architecture (Zhu et al. 2017), which enables the system to be trained in an unpaired manner. Unlike CycleGAN, CardioGAN is designed with attention-based generators and equipped with multiple discriminators. We utilize attention mechanisms in the generators to better learn to focus on specific local regions such as the QRS complexes of ECG. To generate high-fidelity ECG signals in terms of both time and frequency information, we utilize a dual discriminator strategy where one discriminator operates on signals in the time domain while the other uses frequency-domain spectrograms of the signals. We show that the generated ECG outputs are very similar to the corresponding real ECG signals. Finally, we perform HR estimation using our generated ECG as well as the input PPG signals. By comparing these values to the HR measured from the ground-truth ECG signals, we observe a clear advantage in our proposed method. While we focus on single-lead ECG to demonstrate the efficacy of our solution, we believe our approach can be used for multi-lead ECG by training the system on other desired leads. Our contributions in this paper are summarised below:

  • We propose a novel framework called CardioGAN for generating ECG signals from PPG inputs. We utilize attention-based generators and dual time and frequency domain discriminators along with a CycleGAN backbone to obtain realistic ECG signals. To the best of our knowledge, no other studies have attempted to generate ECG from PPG (or in fact any cross-modality signal-to-signal translation in the biosignal domain) using GANs or other deep learning techniques.

  • We perform a multi-corpus subject-independent study, which demonstrates the generalizability of our model to data from unseen subjects, acquired in different conditions.

  • The generated ECG obtained from CardioGAN provides more accurate HR estimation compared to HR values calculated from the original PPG, demonstrating some of the benefits of our model in the healthcare domain. We make the final trained model publicly available at https://code.engineering.queensu.ca/17ps21/ppg2ecg-cardiogan.

The rest of this paper is organized as follows. Section 2 briefly mentions the prior studies on ECG signal generation. Next, our proposed method is discussed in Section 3. Section 4 discusses the details of our experiments, including datasets and training procedures. Finally, the results and analyses are presented in Section 5, followed by a summary of our work in Section 6.

2 Related Work

2.1 Generating Synthetic ECG Signals

The idea of synthesizing ECG has been explored in the past, utilizing both model-driven (e.g. signal processing or mathematical modelling) and data-driven (machine learning and deep learning) techniques. As examples of earlier works, (McSharry et al. 2003; Sayadi, Shamsollahi, and Clifford 2010) proposed solutions based on differential equations and Gaussian models for generating ECG segments.

Despite deep learning being employed to process ECG for a wide variety of applications, for instance biometrics (Zhang, Zhou, and Zeng 2017), arrhythmia detection (Hannun et al. 2019), emotion recognition (Sarkar and Etemad 2020a, b), cognitive load analysis (Sarkar et al. 2019; Ross et al. 2019), and others, very few studies have tackled the synthesis of ECG signals with deep neural networks (Zhu et al. 2019a; Golany and Radinsky 2019; Golany et al. 2020). Synthesizing ECG with GANs was first studied in (Zhu et al. 2019a), where a bidirectional LSTM-CNN architecture was proposed to generate ECG from Gaussian noise. The study by (Golany and Radinsky 2019) proposed PGAN, or Personalized GAN, to generate patient-specific synthetic ECG signals from input noise. A special loss function, a combination of the cross-entropy loss and the mean squared error between real and fake ECG waveforms, was proposed to mimic the morphology of ECG waveforms.

A few other studies have targeted this area, for example, EmotionalGAN was proposed in (Chen et al. 2019), where synthetic ECG was used to augment the available ECG data in order to improve emotion classification accuracy. The proposed GAN generated the new ECG based on input noise. Lastly, in a similar study performed by (Golany et al. 2020), ECG was generated from input noise to augment the available ECG training set, improving the performance for arrhythmia detection.

2.2 ECG Synthesis from PPG

With respect to the very specific problem of PPG-to-ECG translation, to the best of our knowledge, only (Zhu et al. 2019b) has been published. This work did not use deep learning; instead, it used a discrete cosine transform (DCT) technique to map each PPG cycle to its corresponding ECG cycle. First, onsets of the PPG signals were aligned to the R-peaks of the ECG signals, followed by a de-trending operation in order to reduce noise. Next, each cycle of ECG and PPG was segmented, followed by temporal scaling using linear interpolation in order to maintain a fixed segment length. Finally, a linear regression model was trained to learn the relation between DCT coefficients of PPG segments and corresponding ECG segments. In spite of several contributions, this study suffers from a few limitations. First, the model failed to produce reliable ECG in a subject-independent manner, which limits its application to previously seen subjects' data. Second, the relation between PPG segments and ECG segments is often not linear; therefore, in several cases, this model failed to capture the non-linear relationship between the two domains. Lastly, no experiments were performed to indicate any performance enhancement gained from using the generated ECG as opposed to the available PPG (for example a comparison of measured HR).

3 Method

3.1 Objective and Proposed Architecture

In order to not be constrained by paired training, where both types of data are needed from the same instance in order to train the system, we are interested in an unpaired GAN, i.e. a CycleGAN-based architecture. We propose CardioGAN, whose main objective is to learn to estimate the mapping between the PPG ($P$) and ECG ($E$) domains. In order to force the generator to focus on regions of the data with significant importance, we incorporate an attention mechanism into the generator. We implement generator $G_E: P \rightarrow E$ to learn the forward mapping, and $G_P: E \rightarrow P$ to learn the inverse mapping. We denote the ECG and PPG generated by CardioGAN as $E'$ and $P'$ respectively, where $E' = G_E(P)$ and $P' = G_P(E)$. According to (Penttilä et al. 2001) and a large number of other studies, cardiac activity is manifested in both time and frequency domains. Therefore, in order to preserve the integrity of the generated ECG in both domains, we propose the use of a dual discriminator strategy, where $D^t$ is employed to classify the time-domain and $D^f$ the frequency-domain response of real and generated data.

Figure 1 shows our proposed architecture, where $G_E$ takes $P$ as input and generates $E'$ as output. Similarly, $E$ is given as input to $G_P$, which generates $P'$ as output. We employ $D^t_E$ and $D^t_P$ to discriminate $E$ versus $E'$, and $P$ versus $P'$, respectively. Similarly, $D^f_E$ and $D^f_P$ are developed to discriminate $f(E)$ versus $f(E')$, as well as $f(P)$ versus $f(P')$, respectively, where $f$ denotes the spectrogram of the input signal. Finally, $E'$ and $P'$ are given as inputs to $G_P$ and $G_E$ respectively, in order to complete the cyclic training process.

In the following subsections, we expand on the dual discriminator, the notion of integrating an attention mechanism into the generator, and the loss functions used to train the overall architecture. The details and architectures of each of the networks used in our proposed solution are provided in Section 4.3.

Figure 1: The architecture of the proposed CardioGAN. The original ECG ($E$) and PPG ($P$) signals are shown in orange; the generated outputs ($E'$ and $P'$) are shown in green; and the reconstructed or cyclic outputs ($E''$ and $P''$) are marked in black for better visibility. Moreover, connections to the generators are marked with solid lines, whereas connections to the discriminators are marked with dashed lines.

3.2 Dual Discriminators

As mentioned above, to preserve both time and frequency information in the generated ECG, we use a dual discriminator approach. Dual discriminators have been used earlier in (Nguyen et al. 2017), showing improvements in dealing with mode collapse problems. To leverage the concept of dual discriminators, we perform the Short-Time Fourier Transform (STFT) on the ECG/PPG time series. Denoting a time series by $x[n]$, its STFT is $X(m,\omega) = \sum^{\infty}_{n=-\infty} x[n]\, w[n-m]\, e^{-j\omega n}$, where $m$ is the step size and $w[n]$ denotes the Hann window function. Finally, the spectrogram is obtained as $f(x[n]) = \log(|X(m,\omega)| + \theta)$, where we use $\theta = 10^{-10}$ to avoid taking the logarithm of zero. As shown in Figure 1, the time-domain and frequency-domain discriminators operate in parallel, and as we will discuss in Section 3.4, to aggregate the outcomes of these two networks, the loss terms of both are incorporated into the adversarial loss.
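
To make the spectrogram computation concrete, below is a minimal Python sketch of $f(x[n])$. The exact STFT parameters (window length, hop size) are not specified above, so the values here are illustrative assumptions, as are the function and variable names.

```python
# Hedged sketch of the log-spectrogram fed to the frequency-domain
# discriminators; nperseg/noverlap are assumptions, not the paper's values.
import numpy as np
from scipy.signal import stft

def log_spectrogram(x, fs=128, nperseg=64, noverlap=60, theta=1e-10):
    """Return log(|STFT(x)| + theta) for a 1-D signal x."""
    _, _, Z = stft(x, fs=fs, window='hann', nperseg=nperseg, noverlap=noverlap)
    return np.log(np.abs(Z) + theta)

segment = np.random.randn(512)   # one 4-second segment at 128 Hz
spec = log_spectrogram(segment)
print(spec.shape)                # (frequency bins, time frames)
```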

3.3 Attention-based Generators

We adopt Attention U-Net as our generator architecture, which has recently been proposed and used for image classification (Oktay et al. 2018; Jetley et al. 2018). We chose attention-based generators to better focus on salient features passing through the skip connections. Let $x^l$ be the features obtained from the skip connection originating from layer $l$, and $g$ the gating vector that determines the region of focus. First, $x^l$ and $g$ are mapped to an intermediate space $\mathbb{R}^{F_{int}}$, where $F_{int}$ is the dimensionality of that space. Our objective is to determine a scalar attention value $\alpha^l_i$ for each temporal unit $x^l_i \in \mathbb{R}^{F_l}$, utilizing the gating vector $g_i \in \mathbb{R}^{F_g}$, where $F_l$ and $F_g$ are the number of feature maps in $x^l$ and $g$ respectively. Linear transformations are performed on $x^l$ and $g$ as $\theta_x = W_x x^l_i + b_x$ and $\theta_g = W_g g_i + b_g$ respectively, where $W_x \in \mathbb{R}^{F_l \times F_{int}}$, $W_g \in \mathbb{R}^{F_g \times F_{int}}$, and $b_x$, $b_g$ are bias terms. Next, the non-linear activation ReLU (denoted by $\sigma_1$) is applied to obtain the summed feature activation $f = \sigma_1(\theta_x + \theta_g)$, where $\sigma_1(y) = \max(0, y)$. Next, we perform a linear mapping of $f$ onto the $\mathbb{R}^{F_{int}}$ space via channel-wise $1 \times 1$ convolutions, followed by a sigmoid activation function ($\sigma_2$), in order to obtain the attention weights in the range $[0, 1]$. The attention map corresponding to $x^l$ is obtained as $\alpha^l_i = \sigma_2(\psi * f)$, where $\sigma_2(y) = \frac{1}{1 + e^{-y}}$, $\psi \in \mathbb{R}^{F_{int}}$, and $*$ denotes convolution. Finally, we perform element-wise multiplication between $x^l_i$ and $\alpha^l_i$ to obtain the final output of the attention layer.
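
The following is a minimal tf.keras sketch of this attention gate, under the assumptions that features are 1-D (batch, time, channels) tensors, that $x^l$ and $g$ have already been brought to the same temporal resolution, and that $W_x$, $W_g$, and $\psi$ are realized as $1 \times 1$ convolutions; the name att_gate is ours, not from the paper.

```python
import tensorflow as tf

def att_gate(x_l, g, f_int):
    """Additive attention gate: returns x_l scaled by attention weights."""
    theta_x = tf.keras.layers.Conv1D(f_int, 1)(x_l)      # W_x x^l + b_x
    theta_g = tf.keras.layers.Conv1D(f_int, 1)(g)        # W_g g + b_g
    f = tf.nn.relu(theta_x + theta_g)                    # sigma_1
    alpha = tf.sigmoid(tf.keras.layers.Conv1D(1, 1)(f))  # sigma_2(psi * f)
    return x_l * alpha                                   # element-wise gating

x_l = tf.random.normal([8, 256, 64])     # (batch, time, F_l) skip features
g = tf.random.normal([8, 256, 128])      # (batch, time, F_g) gating signal
print(att_gate(x_l, g, f_int=32).shape)  # (8, 256, 64)
```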

3.4 Loss

Our final objective function is a combination of an adversarial loss and a cyclic consistency loss as presented below.

Adversarial Loss

We apply the adversarial loss in both the forward and inverse mappings. Let us denote individual PPG segments as $p$ and the corresponding ground-truth ECG segments as $e$. For the mapping function $G_E: P \rightarrow E$ and discriminators $D^t_E$ and $D^f_E$, the adversarial losses are defined as:

$\mathcal{L}_{adv}(G_E, D^t_E) = \mathbb{E}_{e \sim E}[\log(D^t_E(e))] + \mathbb{E}_{p \sim P}[\log(1 - D^t_E(G_E(p)))]$ (1)
$\mathcal{L}_{adv}(G_E, D^f_E) = \mathbb{E}_{e \sim E}[\log(D^f_E(f(e)))] + \mathbb{E}_{p \sim P}[\log(1 - D^f_E(f(G_E(p))))]$ (2)

Similarly, for the inverse mapping function $G_P: E \rightarrow P$ and discriminators $D^t_P$ and $D^f_P$, the adversarial losses are defined as:

$\mathcal{L}_{adv}(G_P, D^t_P) = \mathbb{E}_{p \sim P}[\log(D^t_P(p))] + \mathbb{E}_{e \sim E}[\log(1 - D^t_P(G_P(e)))]$ (3)
$\mathcal{L}_{adv}(G_P, D^f_P) = \mathbb{E}_{p \sim P}[\log(D^f_P(f(p)))] + \mathbb{E}_{e \sim E}[\log(1 - D^f_P(f(G_P(e))))]$ (4)

Finally, the adversarial objective for the mapping $G_E: P \rightarrow E$ is obtained as $\min_{G_E}\max_{D^t_E}\mathcal{L}_{adv}(G_E, D^t_E)$ and $\min_{G_E}\max_{D^f_E}\mathcal{L}_{adv}(G_E, D^f_E)$. Similarly, the objective for the mapping $G_P: E \rightarrow P$ is obtained as $\min_{G_P}\max_{D^t_P}\mathcal{L}_{adv}(G_P, D^t_P)$ and $\min_{G_P}\max_{D^f_P}\mathcal{L}_{adv}(G_P, D^f_P)$.

Cyclic Consistency Loss

The other component of our objective function is the cyclic consistency loss or reconstruction loss as proposed by (Zhu et al. 2017). In order to ensure that the forward and inverse mappings are consistent, i.e., $p \rightarrow G_E(p) \rightarrow G_P(G_E(p)) \approx p$ as well as $e \rightarrow G_P(e) \rightarrow G_E(G_P(e)) \approx e$, we minimize the cycle consistency loss calculated as:

$\mathcal{L}_{cyclic}(G_E, G_P) = \mathbb{E}_{e \sim E}[\|G_E(G_P(e)) - e\|_1] + \mathbb{E}_{p \sim P}[\|G_P(G_E(p)) - p\|_1]$ (5)

Final Loss

The final objective function of CardioGAN is computed as:

$\mathcal{L}_{CardioGAN} = \alpha\mathcal{L}_{adv}(G_E, D^t_E) + \alpha\mathcal{L}_{adv}(G_P, D^t_P) + \beta\mathcal{L}_{adv}(G_E, D^f_E) + \beta\mathcal{L}_{adv}(G_P, D^f_P) + \lambda\mathcal{L}_{cyclic}(G_E, G_P),$ (6)

where $\alpha$ and $\beta$ are the adversarial loss coefficients corresponding to $D^t$ and $D^f$ respectively, and $\lambda$ is the cyclic consistency loss coefficient.
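
As a hedged illustration of Eqs. (1)-(6), the sketch below assembles the combined objective in TensorFlow, assuming the discriminators output raw logits (their final layer is a single-channel convolution, see Section 4.3) so a sigmoid is applied before the log terms; in practice the discriminators ascend and the generators descend this objective, with signs handled per player. All function names are ours.

```python
import tensorflow as tf

EPS = 1e-8  # guards the logarithms

def adv_loss(d_real, d_fake):
    """E[log D(real)] + E[log(1 - D(fake))], with D outputs given as logits."""
    p_real, p_fake = tf.sigmoid(d_real), tf.sigmoid(d_fake)
    return (tf.reduce_mean(tf.math.log(p_real + EPS))
            + tf.reduce_mean(tf.math.log(1.0 - p_fake + EPS)))

def cyclic_loss(e, e_cyc, p, p_cyc):
    """L1 reconstruction terms of Eq. (5)."""
    return tf.reduce_mean(tf.abs(e_cyc - e)) + tf.reduce_mean(tf.abs(p_cyc - p))

def cardiogan_loss(adv_te, adv_tp, adv_fe, adv_fp, cyc,
                   alpha=3.0, beta=1.0, lam=30.0):
    """Eq. (6): weighted sum of time/frequency adversarial and cyclic terms."""
    return alpha * (adv_te + adv_tp) + beta * (adv_fe + adv_fp) + lam * cyc
```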

4 Experiments

In this section, we first introduce the datasets used in this study, followed by the description of the data preparation steps. Next, we present our implementation and architecture details.

4.1 Datasets

We use 4 popular ECG-PPG datasets, namely BIDMC (Pimentel et al. 2016), CAPNO (Karlen et al. 2013), DALIA (Reiss et al. 2019), and WESAD (Schmidt et al. 2018). We combine these 4 datasets in order to enable a multi-corpus approach leveraging large and diverse distributions of data across factors such as activity (e.g. working, driving, walking, resting), age (e.g. 29 children, 96 adults), and others. The aggregate dataset contains a total of 125 participants with a balanced male-female ratio.

BIDMC

(Pimentel et al. 2016) was obtained from 53 adult ICU patients (32 females, 21 males, mean age of 64.81) where each recording was 8 minutes long. PPG and ECG were both sampled at a frequency of 125 Hz. It should be noted this dataset consists of three leads of ECG (II, V, AVR). However, we only use lead II in this study.

CAPNO

(Karlen et al. 2013) consists of data from 42 participants, out of which 29 were children (median age of 8.7) and 13 were adults (median age of 52.4). The recordings were collected while the participants were under medical observation. Single-lead ECG and PPG recordings were sampled at a frequency of 300 Hz and were 8 minutes in length.

DALIA

(Reiss et al. 2019) was recorded from 15 participants (8 females, 7 males, mean age of 30.60), where each recording was approximately 2 hours long. ECG and PPG signals were recorded while participants went through different daily life activities, for instance sitting, walking, driving, cycling, working and so on. Single-lead ECG signals were recorded at a sampling frequency of 700 Hz while the PPG signals were recorded at a sampling rate of 64 Hz.

WESAD

(Schmidt et al. 2018) was created using data from 15 participants (12 male, 3 female, mean age of 27.5), while performing activities such as solving arithmetic tasks, watching video clips, and others. Each recording was over 1 hour in duration. Single-lead ECG was recorded at a sampling rate of 700 Hz while PPG was recorded at a sampling rate of 64 Hz.

4.2 Data Preparation

Since the above-mentioned datasets were collected at different sampling frequencies, as a first step we re-sample (using interpolation) both the ECG and PPG signals to a rate of 128 Hz. As raw physiological signals contain varying amounts and types of noise (e.g. power line interference, baseline wander, motion artefacts), we apply very common filtering techniques to both the ECG and PPG signals. We apply a band-pass FIR filter with a pass-band frequency of 3 Hz and a stop-band frequency of 45 Hz to the ECG signals. Similarly, a band-pass Butterworth filter with a pass-band frequency of 1 Hz and a stop-band frequency of 8 Hz is applied to the PPG signals. Next, person-specific z-score normalization is performed on both ECG and PPG. Then, the normalized ECG and PPG signals are segmented into 4-second windows (128 Hz $\times$ 4 seconds = 512 samples), with a 10% overlap to avoid missing any peaks. Finally, we perform min-max normalization to $[-1, 1]$ on both ECG and PPG segments to ensure all the input data are in a specific range.
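
A minimal sketch of this preprocessing pipeline is given below, assuming SciPy; the filter orders and the Fourier-based resampling are our assumptions, since only the band edges, the 128 Hz target rate, and the windowing scheme are specified above.

```python
import numpy as np
from scipy.signal import butter, filtfilt, firwin, resample

FS = 128
WIN = 4 * FS           # 4-second windows -> 512 samples
STEP = int(WIN * 0.9)  # 10% overlap between consecutive windows

def preprocess(sig, fs_orig, kind):
    sig = resample(sig, int(len(sig) * FS / fs_orig))  # re-sample to 128 Hz
    if kind == 'ecg':   # band-pass FIR, 3-45 Hz (order 101 is an assumption)
        taps = firwin(numtaps=101, cutoff=[3, 45], pass_zero=False, fs=FS)
        sig = filtfilt(taps, [1.0], sig)
    else:               # band-pass Butterworth, 1-8 Hz (order 4 is an assumption)
        b, a = butter(4, [1, 8], btype='bandpass', fs=FS)
        sig = filtfilt(b, a, sig)
    sig = (sig - sig.mean()) / (sig.std() + 1e-8)      # per-person z-score
    segs = [sig[i:i + WIN] for i in range(0, len(sig) - WIN + 1, STEP)]
    # min-max normalize each segment to [-1, 1]
    return [2 * (s - s.min()) / (s.max() - s.min() + 1e-8) - 1 for s in segs]
```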

4.3 Architecture

Generator

As mentioned earlier, an Attention U-Net architecture is used as our generator, where self-gated soft-attention units filter the features passing through the skip connections. $G_E$ and $G_P$ take $1 \times 512$ data points as input. The encoder consists of 6 blocks, where the number of filters is gradually increased (64, 128, 256, 512, 512, 512) with a fixed kernel size of $1 \times 16$ and a stride of 2. We apply layer normalization and leaky ReLU activation after each convolution layer, except the first layer, where no normalization is used. A similar architecture is used in the decoder, except that de-convolutional layers with ReLU activations are used and the number of filters is gradually decreased in the same manner. The final output is obtained from a de-convolutional layer with a single-channel output followed by a tanh activation.

Discriminator

Dual discriminators are used to classify real and fake data in the time and frequency domains. $D^t_E$ and $D^t_P$ take time-series signals of size $1 \times 512$ as inputs, whereas spectrograms of size $128 \times 128$ are given as inputs to $D^f_E$ and $D^f_P$. Both $D^t$ and $D^f$ use 4 convolution layers, where the number of filters is gradually increased (64, 128, 256, 512) with a fixed kernel of $1 \times 16$ for $D^t$ and $7 \times 7$ for $D^f$. Both networks use a stride of 2. Each convolution layer is followed by layer normalization and leaky ReLU activation, except the first layer, where no normalization is used. Finally, the output is obtained from a single-channel convolutional layer.
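
Below is a hedged tf.keras sketch of the time-domain discriminator $D^t$ following the description above (filter counts, kernel size, stride, normalization placement); the leaky-ReLU slope and 'same' padding are assumptions, and the builder name is ours.

```python
import tensorflow as tf

def build_dt(input_len=512):
    x = inp = tf.keras.Input(shape=(input_len, 1))
    for i, n_filters in enumerate([64, 128, 256, 512]):
        x = tf.keras.layers.Conv1D(n_filters, 16, strides=2, padding='same')(x)
        if i > 0:  # no normalization after the first layer
            x = tf.keras.layers.LayerNormalization()(x)
        x = tf.keras.layers.LeakyReLU(0.2)(x)
    out = tf.keras.layers.Conv1D(1, 16, padding='same')(x)  # single-channel logits
    return tf.keras.Model(inp, out)

dt = build_dt()
print(dt(tf.random.normal([2, 512, 1])).shape)  # (2, 32, 1) patch-style output
```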

4.4 Training

Our proposed CardioGAN network is trained from scratch on an Nvidia Titan RTX GPU, using TensorFlow 2.2. We divide the aggregated dataset into a training set and a test set. We randomly select 80% of the users from each dataset (a total of 101 participants, equivalent to 58K segments) for training, and the remaining 20% of users from each dataset (a total of 24 participants, equivalent to 15K segments) for testing. The training time was approximately 50 hours. To enable CardioGAN to be trained in an unpaired fashion, we shuffle the ECG and PPG segments from each dataset separately, eliminating the couplings between ECG and PPG, followed by shuffling the order of the datasets themselves for ECG and PPG separately. We use a batch size of 128, unlike the original CycleGAN, where a batch size of 1 is used, as we notice a performance gain with the larger batch size. The Adam optimizer is used to train both the generators and the discriminators. We train our model for 15 epochs, where the learning rate ($10^{-4}$) is kept constant for the initial 10 epochs and then linearly decayed to 0. The values of $\alpha$, $\beta$, and $\lambda$ are empirically set to 3, 1, and 30 respectively.
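
The learning-rate schedule described above (constant for 10 epochs, then linear decay to 0 over the remaining 5) can be sketched as follows; this function is illustrative, not the authors' code.

```python
def lr_at(epoch, base_lr=1e-4, hold=10, total=15):
    """Constant base_lr for `hold` epochs, then linear decay toward 0."""
    if epoch < hold:
        return base_lr
    return base_lr * (total - epoch) / (total - hold)

print([round(lr_at(e), 6) for e in range(15)])  # 1e-4 ten times, then decaying
```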

5 Performance

CardioGAN produces two main signal outputs: generated ECG ($E'$) and generated PPG ($P'$). As our goal is to generate the more important and elusive ECG, we utilize $E'$ and ignore $P'$ in the following experiments. In this section, we present the quantitative and qualitative results of our proposed CardioGAN network. Next, we perform an ablation study in order to understand the effects of the different components of the model. Further, we perform several analyses, followed by a discussion of potential applications of our proposed solution.

5.1 Quantitative Results

Heart rate is measured in beats per minute (BPM) as 60 divided by the average interval (in seconds) between successive peaks of an ECG or PPG segment. Let us define the mean absolute error (MAE) for the heart rate (in BPM) obtained from a given ECG or PPG signal ($HR^Q$) with respect to the ground-truth HR ($HR^{GT}$) as $MAE_{HR}(Q) = \frac{1}{N}\sum_{i=1}^{N}|HR^{GT}_i - HR^Q_i|$, where $N$ is the number of segments for which HR measurements have been obtained. In order to investigate the merits of CardioGAN, we measure $MAE_{HR}(E')$, where $E'$ is the ECG generated by CardioGAN. We compare these MAE values to $MAE_{HR}(P)$ (where $P$ denotes the available input PPG) as reported by other studies on the 4 datasets. The results are presented in Table 1, where we observe that for 3 of the 4 datasets, the HR measured from the ECG generated by CardioGAN is more accurate than the HR measured from the input PPG signals. For the CAPNO dataset, in which our ECG shows higher error compared to other works based on PPG, the difference is quite marginal, especially in comparison to the performance gains achieved across the other datasets.
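
A sketch of the $MAE_{HR}$ computation is shown below; scipy.signal.find_peaks is used here as a generic stand-in for the Hamilton (ECG) and Elgendi (PPG) peak detectors used in our experiments, and the 0.3-second minimum peak distance is our assumption.

```python
import numpy as np
from scipy.signal import find_peaks

FS = 128

def heart_rate_bpm(segment, fs=FS):
    """HR in BPM: 60 over the mean peak-to-peak interval in seconds."""
    peaks, _ = find_peaks(segment, distance=int(0.3 * fs))
    intervals = np.diff(peaks) / fs
    return 60.0 / intervals.mean()

def mae_hr(segments_q, segments_gt):
    """Mean absolute HR error of signal Q against ground-truth segments."""
    errs = [abs(heart_rate_bpm(gt) - heart_rate_bpm(q))
            for q, gt in zip(segments_q, segments_gt)]
    return float(np.mean(errs))
```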

Different studies in this area have used different window sizes for HR measurement, which we report in Table 1. To evaluate the impact of our solution across different window sizes, we measure $MAE_{HR}(E')$ over 4, 8, 16, 32, and 64 second windows and present the results in comparison to $MAE_{HR}(P)$ across all the subjects available in the 4 datasets in Table 2. In these experiments, we utilize two popular algorithms for detecting peaks from ECG (Hamilton 2002) and PPG (Elgendi et al. 2013) signals. We observe a clear advantage in measuring HR from $E'$ as opposed to $P$. We notice a very consistent performance gain across different window sizes, which further demonstrates the stability of the results produced by CardioGAN.

Dataset Method Window (sec.) $MAE_{HR}$
BIDMC (Nilsson et al. 2005) 64 4.6
BIDMC (Shelley et al. 2006) 64 2.3
BIDMC (Fleming et al. 2007) 64 5.5
BIDMC (Karlen et al. 2013) 64 5.7
BIDMC (Pimentel et al. 2016) 64 2.7
BIDMC CardioGAN 64 0.7
CAPNO (Nilsson et al. 2005) 64 10.2
CAPNO (Shelley et al. 2006) 64 2.2
CAPNO (Fleming et al. 2007) 64 1.4
CAPNO (Karlen et al. 2013) 64 1.2
CAPNO (Pimentel et al. 2016) 64 1.9
CAPNO CardioGAN 64 2.0
DALIA (Schäck et al. 2017) 8 20.5
DALIA (Reiss et al. 2019) 8 15.6
DALIA (Reiss et al. 2019) 8 11.1
DALIA CardioGAN 8 8.3
WESAD (Schäck et al. 2017) 8 19.9
WESAD (Reiss et al. 2019) 8 11.5
WESAD (Reiss et al. 2019) 8 9.5
WESAD CardioGAN 8 8.6
Table 1: We compare the $MAE_{HR}$ calculated from the generated ECG with the $MAE_{HR}$ calculated from the real input PPG.
Window (sec.) $MAE_{HR}(E')$ $MAE_{HR}(P)$
4 4.86 10.67
8 3.54 10.23
16 3.27 10.00
32 3.08 9.77
64 2.89 9.74
Table 2: A comparison of $MAE_{HR}$ between generated ECG and real PPG is presented for different window sizes.
Figure 2: ECG samples generated by our proposed CardioGAN. We show 2 different samples from each dataset to better demonstrate the qualitative performance of our method.

5.2 Qualitative Results

In Figure 2 we present a number of samples of ECG signals generated by CardioGAN, clearly showing that our proposed network is able to learn to reconstruct the shape of the original ECG signals from corresponding PPG inputs. Careful observation shows that in some cases, the generated ECG signals exhibit a small time lag with respect to the original ECG signals. The root cause of this time delay is the Pulse Arrival Time (PAT), which is defined as the time taken by the PPG pulse to travel from the heart to a distal site (from where PPG is collected, for example, wrist, fingertip, ear, or others) (Elgendi et al. 2019). Nonetheless, this time-lag is consistent for all the beats across a single generated ECG signal as a simple offset, and therefore does not impact HR measurements or other cardiovascular-related metrics. This is further evidenced by the accurate HR measurements presented earlier in Tables 1 and 2.

5.3 Ablation Study

The proposed CardioGAN consists of attention-based generators and dual discriminators, as discussed earlier. In order to investigate the usefulness of the attention mechanisms and the dual discriminators, we perform an ablation study with 2 variations of the network, removing each of these components individually. To evaluate these components, we use the same $MAE_{HR}$ along with a number of other metrics to quantify the quality of the ECG waveforms. We use metrics similar to those used in (Zhu et al. 2019a), namely Root Mean Squared Error (RMSE), Percentage Root Mean Squared Difference (PRD), and Fréchet Distance (FD). We briefly define these metrics as follows:

RMSE: In order to understand the stability between $E$ and $E'$, we calculate $RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(E_i - E'_i)^2}$, where $E_i$ and $E'_i$ refer to the $i^{th}$ points of $E$ and $E'$ respectively.

PRD: To quantify the distortion between $E$ and $E'$, we calculate $PRD = \sqrt{\frac{\sum_{i=1}^{N}(E_i - E'_i)^2}{\sum_{i=1}^{N}(E_i)^2} \times 100}$.

FD: The Fréchet distance (Alt and Godau 1995) is calculated to measure the similarity between $E$ and $E'$. While calculating the distance between two curves, this metric considers the location and order of the data points, hence giving a more accurate measure of similarity between two time-series signals. Let us assume $E$, a discrete signal, can be expressed as a sequence $\{e_1, e_2, e_3, \dots, e_N\}$, and similarly $E'$ can be expressed as $\{e'_1, e'_2, e'_3, \dots, e'_N\}$. We can create a 2-D matrix $M$ of corresponding data points by preserving the order of the sequences $E$ and $E'$, where $M \subseteq \{(e, e') \,|\, e \in E, e' \in E'\}$. The discrete Fréchet distance of $E$ and $E'$ is calculated as $FD = \min_{M}\max_{(e, e') \in M} d(e, e')$, where $d(e, e')$ denotes the Euclidean distance between corresponding samples $e$ and $e'$.
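
The three metrics can be sketched in NumPy as follows; the discrete Fréchet distance uses the standard dynamic-programming recursion over the coupling matrix, treating samples as scalars, which matches the definition above under that simplifying assumption.

```python
import numpy as np

def rmse(e, e_prime):
    return np.sqrt(np.mean((np.asarray(e) - np.asarray(e_prime)) ** 2))

def prd(e, e_prime):
    e, e_prime = np.asarray(e), np.asarray(e_prime)
    return np.sqrt(np.sum((e - e_prime) ** 2) / np.sum(e ** 2) * 100)

def frechet(e, e_prime):
    """Discrete Fréchet distance via the coupling-matrix recursion."""
    n, m = len(e), len(e_prime)
    ca = np.full((n, m), np.inf)
    for i in range(n):
        for j in range(m):
            d = abs(e[i] - e_prime[j])  # Euclidean distance for scalar samples
            if i == 0 and j == 0:
                ca[i, j] = d
            elif i == 0:
                ca[i, j] = max(ca[i, j - 1], d)
            elif j == 0:
                ca[i, j] = max(ca[i - 1, j], d)
            else:
                ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1],
                                   ca[i, j - 1]), d)
    return ca[-1, -1]
```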

The results of our ablation study are presented in Table 3. We report the performance of the different variants of CardioGAN for all the subjects across all 4 datasets. CardioGAN w/o DD is the variant with only the time-domain discriminator and no change to the generator architecture. CardioGAN w/o Attn is the variant where the generator does not contain an attention mechanism. The results presented in the table clearly show the benefit of the proposed CardioGAN over its ablation variants.

Method RMSE PRD FD $MAE_{HR}$
CardioGAN w/o DD 0.396 8.742 0.717 9.57
CardioGAN w/o Attn 0.386 8.393 0.773 9.67
CardioGAN (proposed) 0.364 8.356 0.694 4.77
Table 3: Performance comparison of CardioGAN and its ablation variants across all the subjects of the 4 datasets.

5.4 Analysis

Attention Map

In order to better understand what has been learned through the attention mechanism in the generators, we visualize the attention maps applied to the very last skip connection of the generator ($G_E$). We choose the attention applied to the last skip connection since this layer is the closest to the final output and therefore more interpretable. For better visualization, we superimpose the attention map on top of the output of the generator, as shown in Figure 3. This shows that our model learns to generally focus on the PQRST complexes, which in turn helps the generator learn the shapes of the ECG waveform better, as evident from the qualitative and quantitative results presented earlier.

Figure 3: Visualization of attention maps, where the brighter parts indicate regions to which the generator pays more attention compared to the darker regions. We present 4 samples of generated ECG segments corresponding to different subjects.

Unpaired Training vs. Paired Training

We further investigate the performance of CardioGAN when training with paired ECG-PPG inputs as opposed to our original approach, which is based on unpaired training. To train CardioGAN in a paired manner, we follow the same training process mentioned in Section 4.4, except that we keep the coupling between the ECG and PPG pairs intact in the input data. The results are presented in Table 4, and a few samples of generated ECG are shown in Figure 4. By comparing these results to those presented in Table 3, we observe that unpaired training of CardioGAN shows superior performance compared to paired training. In particular, we notice that while CardioGAN-Paired does learn to generate ECG beats from PPG inputs, it fails to learn the exact shape of the original ECG waveforms. This might be because an unpaired training scheme forces the network to learn stronger user-independent mappings between PPG and ECG, compared to user-dependent paired training. While it can be argued that utilizing paired data with other GAN architectures might perform well, it should be noted that the goal of this experiment is to evaluate the performance when paired training is performed without any fundamental changes to the architecture. We design CardioGAN with the aim of being able to leverage datasets that do not necessarily contain both ECG and PPG, hence the unpaired training, even though we resort to datasets that do contain both so that ground-truth measurements can be used for evaluation purposes.

Table 4: The results obtained from CardioGAN-Paired are presented.
Method RMSE PRD FD $MAE_{HR}$
CardioGAN-Paired 0.437 9.315 0.748 5.04
Figure 4: Samples obtained from paired training of CardioGAN are presented.
Figure 5: A few failed ECG examples generated by CardioGAN are presented.

Failed Cases

We notice there are instances where CardioGAN fails to generate ECG samples that closely resemble the original ECG data. Such cases arise only when the PPG input signals are of very poor quality. We show a few examples in Figure 5, where for highly noisy PPG inputs, the generated ECG samples also exhibit very low quality.

5.5 Potential Applications and Demonstration

Apart from its interest to the AI community, we believe our proposed solution has the potential to make a larger impact in the healthcare and wearable domains, notably for continuous health monitoring. Monitoring cardiac activity is an essential part of continuous health monitoring systems, which could enable early diagnosis of cardiovascular diseases and, in turn, early preventative measures that can help overcome severe cardiac problems. Nonetheless, as discussed earlier, there are no suitable solutions for every-day continuous ECG monitoring. In this study, we bridge this gap by utilizing PPG signals (which can be easily collected from almost every wearable device available in the market) in our proposed CardioGAN to capture the cardiac information of users and generate accurate ECG signals. We perform a multi-corpus subject-independent study, where the subjects have gone through a wide range of activities including daily-life tasks, which assures us of the usability of our proposed solution in practical settings. Most importantly, our proposed solution can be integrated into an existing PPG-based wearable device to extract ECG data without any additional hardware. To demonstrate this concept, we have implemented our model to perform in real-time and used a wrist-based wearable device to feed it with PPG data. The video (https://youtu.be/z0Dr4k24t7U) presented as supplementary material demonstrates CardioGAN producing realistic ECG from wearable PPG in real-time.

6 Summary and Future Work

In this paper, we propose CardioGAN, a solution for generating ECG signals from input PPG signals to aid with continuous and reliable cardiac monitoring. Our proposed method takes 4-second PPG segments and generates corresponding ECG segments of equal length. Self-gated soft-attention is used in the generator to learn important regions, for example the QRS complexes of ECG waveforms. Moreover, a dual discriminator strategy is used to learn the mapping in both time and frequency domains. Further, we evaluate the merits of the generated ECG by calculating HR and comparing the results to HR obtained from the real PPG. The analysis shows a clear advantage of using CardioGAN, as more accurate HR values are obtained as a result of using the model.

For future work, the advantages of using the generated ECG data in areas where the use of PPG is limited may be evaluated. These areas include identification of cardiovascular diseases, detection of abnormal heart rhythms, and others. Furthermore, generating multi-lead ECG can also be studied in order to extract more useful cardiac information often missing in single-channel ECG recordings. Finally, we hope our research can open a new path toward cross-modality signal-to-signal translation in the biosignal domain, allowing less accessible physiological recordings to be generated from more affordable and readily available signals.

References

  • Alt and Godau (1995) Alt, H.; and Godau, M. 1995. Computing the Fréchet distance between two polygonal curves. International Journal of Computational Geometry & Applications 5(01n02): 75–91.
  • Ashley and Niebauer (2004) Ashley, E.; and Niebauer, J. 2004. Conquering the ECG. London: Remedica.
  • Bent et al. (2020) Bent, B.; Goldstein, B. A.; Kibbe, W. A.; and Dunn, J. P. 2020. Investigating sources of inaccuracy in wearable optical heart rate sensors. NPJ Digital Medicine 3(1): 1–9.
  • Chen et al. (2019) Chen, G.; Zhu, Y.; Hong, Z.; and Yang, Z. 2019. EmotionalGAN: Generating ECG to Enhance Emotion State Classification. In Proceedings of the International Conference on Artificial Intelligence and Computer Science, 309–313.
  • Elgendi et al. (2019) Elgendi, M.; Fletcher, R.; Liang, Y.; Howard, N.; Lovell, N. H.; Abbott, D.; Lim, K.; and Ward, R. 2019. The use of photoplethysmography for assessing hypertension. NPJ Digital Medicine 2(1): 1–11.
  • Elgendi et al. (2013) Elgendi, M.; Norton, I.; Brearley, M.; Abbott, D.; and Schuurmans, D. 2013. Systolic peak detection in acceleration photoplethysmograms measured from emergency responders in tropical conditions. PLoS One 8(10): e76585.
  • Fleming et al. (2007) Fleming, S. G.; et al. 2007. A comparison of signal processing techniques for the extraction of breathing rate from the photoplethysmogram. International Journal of Biological and Medical Sciences 2(4): 232–236.
  • Gil et al. (2010) Gil, E.; Orini, M.; Bailon, R.; Vergara, J. M.; Mainardi, L.; and Laguna, P. 2010. Photoplethysmography pulse rate variability as a surrogate measurement of heart rate variability during non-stationary conditions. Physiological Measurement 31(9): 1271.
  • Golany et al. (2020) Golany, T.; Lavee, G.; Yarden, S. T.; and Radinsky, K. 2020. Improving ECG Classification Using Generative Adversarial Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 13280–13285.
  • Golany and Radinsky (2019) Golany, T.; and Radinsky, K. 2019. PGANs: Personalized generative adversarial networks for ECG synthesis to improve patient-specific deep ECG classification. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 557–564.
  • Goodfellow et al. (2014) Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2672–2680.
  • Hamilton (2002) Hamilton, P. 2002. Open source ECG analysis. In Computers in Cardiology, 101–104. IEEE.
  • Hannun et al. (2019) Hannun, A. Y.; Rajpurkar, P.; Haghpanahi, M.; Tison, G. H.; Bourn, C.; Turakhia, M. P.; and Ng, A. Y. 2019. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine 25(1): 65.
  • Jetley et al. (2018) Jetley, S.; Lord, N. A.; Lee, N.; and Torr, P. 2018. Learn to Pay Attention. In International Conference on Learning Representations.
  • Karlen et al. (2013) Karlen, W.; Raman, S.; Ansermino, J. M.; and Dumont, G. A. 2013. Multiparameter respiratory rate estimation from the photoplethysmogram. IEEE Transactions on Biomedical Engineering 60(7): 1946–1953.
  • McSharry et al. (2003) McSharry, P. E.; Clifford, G. D.; Tarassenko, L.; and Smith, L. A. 2003. A dynamical model for generating synthetic electrocardiogram signals. IEEE Transactions on Biomedical Engineering 50(3): 289–294.
  • Nguyen et al. (2017) Nguyen, T.; Le, T.; Vu, H.; and Phung, D. 2017. Dual discriminator generative adversarial nets. In Advances in Neural Information Processing Systems, 2670–2680.
  • Nilsson et al. (2005) Nilsson, L.; et al. 2005. Respiration can be monitored by photoplethysmography with high sensitivity and specificity regardless of anaesthesia and ventilatory mode. Acta Anaesthesiologica Scandinavica 49(8): 1157–1162.
  • Oktay et al. (2018) Oktay, O.; Schlemper, J.; Folgoc, L. L.; Lee, M.; Heinrich, M.; Misawa, K.; Mori, K.; McDonagh, S.; Hammerla, N. Y.; Kainz, B.; et al. 2018. Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999.
  • Penttilä et al. (2001) Penttilä, J.; Helminen, A.; Jartti, T.; Kuusela, T.; Huikuri, H. V.; Tulppo, M. P.; Coffeng, R.; and Scheinin, H. 2001. Time domain, geometrical and frequency domain analysis of cardiac vagal outflow: effects of various respiratory patterns. Clinical Physiology 21(3): 365–376.
  • Pimentel et al. (2016) Pimentel, M. A.; Johnson, A. E.; Charlton, P. H.; Birrenkott, D.; Watkinson, P. J.; Tarassenko, L.; and Clifton, D. A. 2016. Toward a robust estimation of respiratory rate from pulse oximeters. IEEE Transactions on Biomedical Engineering 64(8): 1914–1923.
  • Reiss et al. (2019) Reiss, A.; Indlekofer, I.; Schmidt, P.; and Van Laerhoven, K. 2019. Deep PPG: large-scale heart rate estimation with convolutional neural networks. Sensors 19(14): 3079.
  • Ross et al. (2019) Ross, K.; Sarkar, P.; Rodenburg, D.; Ruberto, A.; Hungler, P.; Szulewski, A.; Howes, D.; and Etemad, A. 2019. Toward Dynamically Adaptive Simulation: Multimodal Classification of User Expertise Using Wearable Devices. Sensors 19(19): 4270.
  • Sarkar and Etemad (2020a) Sarkar, P.; and Etemad, A. 2020a. Self-supervised ECG Representation Learning for Emotion Recognition. IEEE Transactions on Affective Computing 1–1.
  • Sarkar and Etemad (2020b) Sarkar, P.; and Etemad, A. 2020b. Self-Supervised Learning for ECG-Based Emotion Recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing, 3217–3221.
  • Sarkar et al. (2019) Sarkar, P.; Ross, K.; Ruberto, A. J.; Rodenbura, D.; Hungler, P.; and Etemad, A. 2019. Classification of Cognitive Load and Expertise for Adaptive Simulation using Deep Multitask Learning. In IEEE International Conference on Affective Computing and Intelligent Interaction, 1–7.
  • Sayadi, Shamsollahi, and Clifford (2010) Sayadi, O.; Shamsollahi, M. B.; and Clifford, G. D. 2010. Synthetic ECG generation and Bayesian filtering using a Gaussian wave-based dynamical model. Physiological Measurement 31(10): 1309.
  • Schäck et al. (2017) Schäck, T.; et al. 2017. Computationally efficient heart rate estimation during physical exercise using photoplethysmographic signals. In European Signal Processing Conference, 2478–2481.
  • Schäfer and Vagedes (2013) Schäfer, A.; and Vagedes, J. 2013. How accurate is pulse rate variability as an estimate of heart rate variability?: A review on studies comparing photoplethysmographic technology with an electrocardiogram. International Journal of Cardiology 166(1): 15–29.
  • Schmidt et al. (2018) Schmidt, P.; Reiss, A.; Duerichen, R.; Marberger, C.; and Van Laerhoven, K. 2018. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the International Conference on Multimodal Interaction, 400–408.
  • Shelley et al. (2006) Shelley, K. H.; Awad, A. A.; Stout, R. G.; and Silverman, D. G. 2006. The use of joint time frequency analysis to quantify the effect of ventilation on the pulse oximeter waveform. Journal of Clinical Monitoring and Computing 20(2): 81–87.
  • WHO (2017) WHO. 2017. Cardiovascular Diseases. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). (Accessed on 07/10/2020).
  • Zhang, Zhou, and Zeng (2017) Zhang, Q.; Zhou, D.; and Zeng, X. 2017. HeartID: A Multiresolution Convolutional Neural Network for ECG-Based Biometric Human Identification in Smart Health Applications. IEEE Access 5: 11805–11816.
  • Zhu et al. (2019a) Zhu, F.; Ye, F.; Fu, Y.; Liu, Q.; and Shen, B. 2019a. Electrocardiogram generation with a bidirectional LSTM-CNN generative adversarial network. Scientific Reports 9(1): 1–11.
  • Zhu et al. (2017) Zhu, J.-Y.; Park, T.; Isola, P.; and Efros, A. A. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, 2223–2232.
  • Zhu et al. (2019b) Zhu, Q.; Tian, X.; Wong, C.-W.; and Wu, M. 2019b. Learning Your Heart Actions From Pulse: ECG Waveform Reconstruction From PPG. bioRxiv 815258.