

Financial support from NSERC and Dapasoft Inc. (CRDPJ529677-18) to conduct the research is highly appreciated.

Corresponding author: Zeeshan Ahmad (e-mail: z1ahmad@ryerson.ca)

ECG Heartbeat Classification Using Multimodal Fusion

ZEESHAN AHMAD (1), ANIKA TABASSUM (2), LING GUAN (3), NAIMUL MEFRAZ KHAN (4)

(1) Department of Electrical, Computer and Biomedical Engineering, Ryerson University, Toronto, Canada (e-mail: z1ahmad@ryerson.ca)
(2) Master of Data Science program, Ryerson University, Toronto, Canada (e-mail: anika.tabassum@ryerson.ca)
(3) Department of Electrical, Computer and Biomedical Engineering, Ryerson University, Toronto, Canada (e-mail: lguan@ee.ryerson.ca)
(4) Department of Electrical, Computer and Biomedical Engineering, Ryerson University, Toronto, Canada (e-mail: n77khan@ryerson.ca)
Abstract

Electrocardiogram (ECG) is an authoritative source to diagnose and counter critical cardiovascular syndromes such as arrhythmia and myocardial infarction (MI). Current machine learning techniques either depend on manually extracted features or on large and complex deep learning networks that merely use the 1D ECG signal directly. Since intelligent multimodal fusion can reach state-of-the-art performance with an efficient deep network, in this paper we propose two computationally efficient multimodal fusion frameworks for ECG heartbeat classification, called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF). At the input of these frameworks, we convert the raw ECG data into three different images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF). In MIF, we first perform image fusion by combining the three imaging modalities into a single image modality that serves as input to the Convolutional Neural Network (CNN). In MFF, we extract features from the penultimate layer of each CNN and fuse them to obtain the unique and interdependent information necessary for better classifier performance. These informative features are finally used to train a Support Vector Machine (SVM) classifier for ECG heartbeat classification. We demonstrate the superiority of the proposed fusion models through experiments on PhysioNet's MIT-BIH dataset for five distinct arrhythmia conditions, consistent with the AAMI EC57 protocol, and on the PTB diagnostic dataset for myocardial infarction (MI) classification. We achieve classification accuracies of 99.7% and 99.2% on arrhythmia and MI classification, respectively.

Source code at https://github.com/zaamad/ECG-Heartbeat-Classification-Using-Multimodal-Fusion

Index Terms:
Convolutional neural network, deep learning, ECG, image fusion, multimodal fusion.

I Introduction

Electrocardiogram is a reliable, effective and non-invasive diagnostic tool and is the best representation of the electrophysiological pattern of depolarization and repolarization of the heart muscles during each heartbeat. Heartbeat classification based on ECG provides conclusive information to cardiologists about chronic cardiovascular diseases [1]. An intelligent system for diagnosing cardiovascular diseases is highly desirable because they are the leading cause of death around the globe [2].

Arrhythmia is a heart rhythm problem which occurs when the electrical pulses that coordinate heartbeats cause the heart to beat irregularly, i.e., either too slowly or too fast. Arrhythmias can be caused by coronary artery disease, high blood pressure, changes in the heart muscle (cardiomyopathy), valve disorders, etc.


Myocardial infarction, also known as heart attack, is caused by blockage of the blood supply through the coronary arteries to the myocardium. This blockage stops the supply of oxygen-rich blood to the heart muscle, which can be life-threatening for the patient [3].

Beat-by-beat examination of the ECG is vital for early diagnosis of cardiovascular conditions. However, differences in recording environments, variations in disease patterns among subjects during testing, and the complex, non-stationary and noisy nature of the ECG signal [4] make heartbeat classification a challenging and laborious exercise for cardiologists [5]. Thus, novel computer-based methods are useful for automatic and autonomous detection of abnormalities in ECG heartbeat classification.

Conventional methods for heartbeat classification using the ECG signal rely mostly on hand-crafted or manually extracted features obtained with signal processing techniques such as digital filter-based methods [6], mixture-of-experts methods [7], threshold-based methods [8], Principal Component Analysis (PCA) [9], the Fourier transform [10] and the wavelet transform [11]. Some of the classifiers used with these extracted features are Support Vector Machines (SVM) [12], Hidden Markov Models (HMM) [13] and neural networks [14]. The first disadvantage of these conventional methods is the separation of the feature extraction and pattern classification stages. Furthermore, these methods need expert knowledge about the input data and the selected features [15]. Moreover, extracting features with subject experts is a time-consuming process, and the features may not be invariant to noise, scaling and translation, and thus can fail to generalize well on unseen data.

The exemplary performance of deep neural networks (DNNs) on ECG [16], and especially the performance of CNNs using 1D convolution [17] and 2D convolution [18], has recently attracted the attention of many researchers. Deep learning models are capable of automatically learning invariant and hierarchical features directly from the data and employ an end-to-end learning mechanism that takes data as input and produces class predictions as output. Recent deep learning models use the 1D ECG signal or a 2D representation of the ECG obtained by transforming the signal to images or some matrix form. For 1D ECG classification, commonly used deep learning models are deep belief networks, restricted Boltzmann machines, autoencoders, CNNs [19] and recurrent neural networks (RNNs) [20]. For 2D ECG classification, CNNs are used and the input ECG data is transformed to images or some other 2D representation. It is experimentally shown in [21] that a 2D representation of the ECG provides more accurate heartbeat classification than the 1D signal. In our previous work [22], the univariate ECG signal is transformed to images by segmenting the ECG signal between successive R-R intervals and then stacking these R-R intervals row-wise to form images. Finally, multidomain multimodal fusion is performed to improve stress assessment. Experimental results showed that multidomain multimodal fusion achieved the highest performance compared to the single ECG modality.

Existing deep learning methods fail to provide a robust fusion framework and rely mostly on concatenation [23] and decision-level fusion [24].

In this manuscript, we address the shortcomings of existing deep learning models for ECG heartbeat classification by proposing two fusion frameworks that are capable of extracting and fusing complementary and discriminative features while also reducing dimensionality.

The proposed work makes the following significant contributions:

  1. Two multimodal fusion frameworks for ECG heartbeat classification, called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF), are proposed. At the input of these frameworks, we convert the heartbeats of raw ECG data into three types of two-dimensional (2D) images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF). The proposed fusion frameworks are computationally efficient, as they keep the size of the combined features similar to the size of the individual input modality features.

  2. We transform the heartbeats of the ECG signal to images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF) to conserve the spatial-domain correlated information among the data samples. These transformations improve classification performance in contrast to the existing approaches of transforming ECG to images using spectrograms or methods involving time-frequency analysis (short-time Fourier transform or wavelet transform).

II Related Work

Deep learning models, especially CNNs, have been used over the years for ECG heartbeat classification to detect cardiovascular diseases such as arrhythmia and MI. These models include both 1D and 2D CNNs.

II-A One-dimensional CNN Approaches

Various models based on 1D CNNs have been proposed in the literature for ECG classification. In [25], an active learning model based on a 1D CNN is presented for arrhythmia detection using the ECG signal; model performance is improved by using breaking-ties (BT) and modified BT algorithms. The authors in [26] proposed a model for adaptive real-time implementation of patient-specific ECG heartbeat classification based on a 1D CNN using end-to-end learning. In [27], a novel algorithm making use of an 11-layer deep CNN is proposed for automatic detection of MI using ECG beats with and without noise. A transfer learning method based on CNN is proposed in [28], where the information learned from an arrhythmia classification task is employed as a reference for the training of classifiers. A computationally intelligent method for patient screening and arrhythmia detection using CNN is proposed in [29]; the method is capable of diagnosing arrhythmia conditions without expert domain knowledge or a feature selection mechanism. In [30], a wavelet transform based on the Fourier-Bessel series expansion is proposed for the localization of MI from ECG. The Fourier-Bessel spectrum of the ECG beats is separated into adjacent parts using fixed order ranges, and a multiscale CNN is then employed for MI classification of different categories. A Multi-Channel Lightweight Convolutional Neural Network (MCL-CNN), which uses squeeze convolution, depth-wise convolution and point-wise convolution, is proposed in [31] for MI classification. Two end-to-end deep learning models based on CNNs, forming a two-stage hierarchical model, are proposed in [32]; furthermore, generative adversarial networks (GANs) are used for data augmentation and to reduce the class imbalance. In [33], the authors proposed a neural network model for precise classification of heartbeats following the AAMI inter-patient standards. This model works in two steps: in the first step, the signals are preprocessed and features are extracted from them; in the second step, classification is performed by a two-layer classifier in which each layer consists of two independent fully connected neural networks. The experiments show that the proposed model precisely detects arrhythmia conditions. In [34], the authors proposed a complex deep learning model consisting of a CNN and an LSTM. This model classifies six types of ECG signals by processing ten-second ECG slices of the MIT-BIH arrhythmia dataset; experimental results show that the proposed model could be used by cardiologists to detect arrhythmia. In [35], the authors presented a CNN-based model for proper diagnosis of congestive heart failure using ECG. The testing and training of the proposed model was carried out on publicly available ECG datasets, and its performance demonstrates its suitability for congestive heart failure detection.

II-B Two-dimensional CNN Approaches

The knockout performance of CNNs on 2D data such as images convinced researchers to convert raw ECG data to images for improved results. In [21], the short-time Fourier transform is used to convert the ECG signal into time-frequency spectrograms that serve as input to a CNN for arrhythmia classification; experimental results show that the 2D CNN achieved higher classification accuracy than the 1D CNN. In [36], the ECG signal is converted into spectro-temporal images that are sent as input to a multiple dense convolutional neural network to capture both beat-to-beat and single-beat information for analysis. The authors in [37] transformed heartbeat time intervals of ECG signals to images using the wavelet transform; these images are used to train a six-layer CNN for heartbeat classification. In [38], a generative neural network is used to convert the raw 1D ECG signal data into 2D images. These images are input to a DenseNet, which produces highly accurate classification, with high sensitivity and specificity, over 4 classes of heartbeat detection. To distinguish abnormal ECG samples from normal ones, the authors in [39] used pretrained CNNs such as AlexNet, VGG-16 and ResNet-18 on spectrograms obtained from ECG; using a transfer learning approach, the highest accuracy of 83.82% is achieved by AlexNet. In [40], multi-lead ECGs are treated as 2D matrices for input to a novel model called the multilead-CNN (ML-CNN), which employs sub two-dimensional (2D) convolutional layers and lead asymmetric pooling (LAP) layers. In [41], the authors generated a dual-beat coupling matrix from sections of heartbeats; this matrix was then used as 2D input to a CNN classifier. The gray-level co-occurrence matrix (GLCM) obtained from ECG data is employed for feature vector description, due to its exceptional statistical feature extraction ability, in [42]. In [43], ECG signals were segmented into heartbeats, and each heartbeat was transformed to a 2D gray-scale image that served as input to a CNN. In [44], two-second segments of the ECG signal are transformed to recurrence plot images to classify arrhythmia in two steps using a deep learning model: in the first step, the noise and ventricular fibrillation (VF) categories are recognized, and in the second step, the atrial fibrillation (AF), normal, premature AF, and premature VF labels are classified. Experimental results show the promising performance of the proposed method.

II-C Fusion based approaches

Fusing different modalities mitigates the weaknesses of the individual modalities, in both 1D and 2D forms, by integrating complementary information from the modalities to perform analysis and classification tasks accurately. In [45], a Multi-scale Fusion convolutional neural network (MS-CNN) is proposed for heartbeat classification using the ECG signal. The MS-CNN is a two-stream network consisting of 13 layers, and the features obtained from the last convolutional layers are concatenated before classification. Another Deep Multi-scale Fusion CNN (DMSFNet) is proposed in [46] for arrhythmia detection; the proposed model consists of a backbone network and two different scale-specific networks, whose features are fused using a spatial attention module. A patient-specific heartbeat classification network based on a customized CNN is proposed in [47]. The CNN contains an important module called multi-receptive-field spatial feature extraction (MRF-SFE), designed to extract multispatial deep features of the heartbeats using five parallel convolution layers with different receptive fields; these features are concatenated before being sent to the third convolutional layer for further processing. A two-stage serial fusion classifier system based on the SVM's rejection option is proposed in [48]. The SVM's distance outputs are related to a confidence measure, and ambiguous samples are rejected by the first-level SVM classifier; the rejected samples are then forwarded to a second-stage logistic regression classifier, and late fusion is performed for arrhythmia classification. The authors in [49] presented a unique feature fusion method called parallel graphical feature fusion, in which the focus is on the geometric features of the data: the original signal is first split into subspaces, multidimensional features are then extracted from these subspaces, and these are mapped to points in a high-dimensional space. A multi-stage feature fusion framework based on a CNN and an attention module was proposed in [50] for multiclass arrhythmia detection, where classification is performed by extracting features from different layers of the CNN. The combination of the CNN and the attention module demonstrates the improved discrimination power of the proposed model for ECG classification.

The shortcoming of the existing fusion methods is that they depend mostly on concatenation fusion. Concatenation leads to the problems of computational complexity and the curse of dimensionality, and hence to degradation in classification accuracy [51]. In this paper, we address these imperfections of the existing literature and propose two fusion frameworks, called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF), which extract and fuse the features while also reducing dimensionality. The proposed fusion frameworks are described in Section III.

Figure 1: Complete Overview of the Proposed Multimodal Image Fusion (MIF) Framework. We fuse the GAF, RP and MTF images to form a triple-channel (GAF-RP-MTF) compound image containing both static and dynamic features of the input images.
Figure 2: Complete Overview of the Proposed Multimodal Feature Fusion (MFF) Framework. The MFF extracts features from the fc-7 layer of each AlexNet. These features are then integrated through the Gated Fusion Network (GFN) and finally sent to the classifier.
Figure 3: Structure of the proposed Gated Fusion Network. Input features $f_1$, $f_2$ and $f_3$ from the modalities are convolved with a high boost kernel, and gated values $w_1$, $w_2$ and $w_3$ are generated using the sigmoid function. Finally, these gated values are multiplied element-wise with the input features to perform fusion.
Figure 4: Architecture of the CNN for signal images of size 64 x 64.

III Materials and Methods

This section explains the proposed fusion frameworks, called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF). The common element in both proposed fusion frameworks is the ECG signal-to-image transformation, as shown in Figures 1 and 2. Therefore, in this section we first explain the ECG signal-to-image transformation, and then MIF, MFF and the two important elements of MFF: the gated fusion network shown in Fig. 3 and the CNN architecture shown in Fig. 4.

III-A ECG Signal to Image Transformation

For each fusion framework, we transform the input heartbeats into three types of images, called GAF, RP and MTF images.

III-A1 Formation of Images by Gramian Angular Field (GAF)

Converting heartbeats of the ECG into Gramian Angular Field (GAF) images maps the ECG into an angular coordinate system instead of the typical rectangular coordinate system.

Consider that $E$ is an ECG signal of $n$ samples such that $E=\{s_{1},s_{2},s_{3},\dots,s_{k},s_{l},\dots,s_{n}\}$. We normalize $E$ between 0 and 1 to get $\overline{E}$. Now we map the normalized ECG into the angular coordinate system by transforming each value into an angular cosine and its time stamp into a radius. The following equation explains this encoding.

$\beta=\arccos(s_{k0}), \qquad R=\frac{t_{k}}{C}$   (1)

In the above equation, $s_{k0}$ is the normalized $k$th sample of the ECG, $t_{k}$ is the time stamp for $s_{k0}$, and $C$ is a constant to adjust the spread of the angular coordinate system. This encoding provides two benefits: it is bijective, and it conserves the spatial-domain affiliations through the radius $R$ [52]. Since the image location with respect to the ECG heartbeat samples is consistent along the principal diagonal, the original heartbeat samples of the ECG can be restored from the angular coordinates [53].

The angular viewpoint of the encoded image can be exploited by taking into account the sum/difference between each pair of samples to indicate the correlation among various time stamps. The summation method used in this article is explained by the following set of equations.

$\mathrm{Gramian\ field}=\cos(\beta_{k}+\beta_{l})$   (2)

$\mathrm{Gramian\ field}=\overline{E}^{T}\cdot\overline{E}-\sqrt{I-\overline{E}^{2}}^{\,T}\cdot\sqrt{I-\overline{E}^{2}}$   (3)

$I$ is the unit row vector in Equation 3.
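To make the encoding concrete, the following minimal NumPy sketch (our illustration, not the original Matlab implementation) produces a GAF image from a single heartbeat according to Equations 1 and 2:

```python
import numpy as np

def gaf_image(beat):
    """Gramian Angular (Summation) Field of a 1D heartbeat, per Eqs. (1)-(2)."""
    # Normalize the beat to [0, 1] as described above.
    e = (beat - beat.min()) / (beat.max() - beat.min() + 1e-12)
    beta = np.arccos(e)                            # Eq. (1): angular encoding
    return np.cos(beta[:, None] + beta[None, :])   # Eq. (2): cos(beta_k + beta_l)
```

The resulting $n \times n$ matrix can then be rescaled to the CNN input size by interpolation, as noted later in this section.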

GAF images of the five different categories of the MIT-BIH dataset are shown in Fig. 5.

Figure 5: GAF, RP and MTF images of the MIT-BIH dataset according to the five different heartbeats defined in Table II.

III-A2 Formation of Images by Recurrence Plot (RP)

ECG is a non-stationary signal; therefore, to visualize the recurrent behavior and observe the recurrence pattern of the ECG signal [54], we encode ECG heartbeats into RP images. An RP image obtained from a heartbeat of the ECG represents the spacing between time points [55].

For the ECG signal $E$ defined in Section III-A1, the recurrence plot is given by

$R\text{-}plot=\alpha(\lambda-\lVert s(k)-s(l)\rVert)$   (4)

where $\lambda$ is the threshold and $\alpha$ is the Heaviside step function.
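As an illustration, a minimal NumPy sketch of Equation 4 follows; the threshold value is our assumption, for demonstration only:

```python
import numpy as np

def rp_image(beat, threshold=0.1):
    """Binary recurrence plot of a 1D heartbeat, per Eq. (4)."""
    dist = np.abs(beat[:, None] - beat[None, :])      # pairwise sample distances
    return (threshold - dist >= 0).astype(np.uint8)   # Heaviside step of (lambda - dist)
```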

RP images of the five different categories of the MIT-BIH dataset are shown in Fig. 5.

III-A3 ECG to Markov Transition Field (MTF) image conversion

For encoding ECG heartbeats into MTF images, we use the same approach as explained in [56]. Let $E$ be the ECG signal defined in Section III-A1. The first step is to define its $B$ quantile bins and assign every sample $s_{k}$ to its related bin $b_{j}$ ($j\in[1,B]$). The second step is the construction of a $B\times B$ weighted adjacency matrix $W$ by counting transitions between quantile bins in the manner of a first-order Markov chain along the time axis. The weighted adjacency matrix in normalized form is called the Markov transition matrix; it is insensitive to the spatial-domain characteristics, resulting in information loss. To handle this loss of information, the Markov transition matrix is transformed into the Markov transition field (MTF) matrix by spreading the transition likelihoods over the spatial-domain locations. The MTF matrix is denoted by $M$ and is shown below.

$M=\begin{bmatrix} w_{lk|s_{1}\in b_{l},s_{1}\in b_{k}} & \dots & w_{lk|s_{1}\in b_{l},s_{n}\in b_{k}}\\ w_{lk|s_{2}\in b_{l},s_{1}\in b_{k}} & \dots & w_{lk|s_{2}\in b_{l},s_{n}\in b_{k}}\\ \vdots & \ddots & \vdots\\ w_{lk|s_{n}\in b_{l},s_{1}\in b_{k}} & \dots & w_{lk|s_{n}\in b_{l},s_{n}\in b_{k}} \end{bmatrix}$   (5)

where $w_{lk}$ is the frequency of transitions between two quantile bins. Since the formation of the transformed matrix depends on the transition probabilities rather than the sample values themselves, the MTF cannot be restored to the original ECG signal.

Bins are the quantiles over which the probability mass is the same. Any number of bins can be selected for ECG-to-MTF conversion; we decided to take 10 bins since the data is normalized between 0 and 1. These bins are defined during the formation of the weighted adjacency matrix, which is the first step in creating the MTF matrix shown in Equation 5.
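The steps above can be sketched as follows in NumPy; this is an illustrative reading of [56] with 10 quantile bins, not the exact implementation used in our experiments:

```python
import numpy as np

def mtf_image(beat, n_bins=10):
    """Markov Transition Field of a 1D heartbeat (Eq. 5)."""
    # Step 1: assign each sample to one of n_bins quantile bins.
    edges = np.quantile(beat, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(beat, edges)                   # bin index for every sample
    # Step 2: B x B weighted adjacency matrix of first-order transitions.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(bins[:-1], bins[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalized Markov matrix
    # Step 3: spread transition probabilities back over the time positions.
    return W[bins[:, None], bins[None, :]]
```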

MTF images of the five different categories of the MIT-BIH dataset are shown in Fig. 5.

For ECG-to-image transformation using the GAF, RP and MTF methods, we use the full length of the heartbeats to transform the 1D information to 2D. Therefore, an ECG signal of any length can be transformed to images and then resized using interpolation.

We can see from Fig. 5 that, for each kind of image (GAF, RP and MTF), the gray-scale images are interpretable: they show different patterns for each of the five categories of the MIT-BIH dataset. The x-y values of the 2D images are simply the pixel values of the GAF, RP and MTF images.

III-B Multimodal Image Fusion Framework

The Multimodal Image Fusion (MIF) framework is shown in Fig. 1. At the input, we transform the heartbeats of the raw ECG signal into three types of images, as described in Section III-A and shown in Fig. 5. The motivation for choosing GAF, RP and MTF is that they are three different statistical methods of transforming ECG to images; during transformation they preserve the temporal information, and hence they are lossless transformations. We combine these three gray-scale images to form a triple-channel image (GAF-RP-MTF). A triple-channel image is a colored image in which the GAF, RP and MTF images are treated as three orthogonal channels, like the three colors of RGB image space. However, this three-channel image is not a conventional conversion of a gray-scale image to RGB; rather, all three gray-scale channels are formed from the raw ECG data with different statistical methods. Thus, a three-channel image in the presented work carries the statistical dynamics of the ECG and is therefore more informative. Furthermore, a three-channel image can be easily used with off-the-shelf CNNs such as AlexNet.
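A sketch of the channel stacking is shown below; the use of scikit-image for the interpolation step is our illustrative choice, not a statement of the original toolchain:

```python
import numpy as np
from skimage.transform import resize  # assumed here for interpolation

def triple_channel(gaf, rp, mtf, size=227):
    """Stack GAF, RP and MTF gray-scale images into one GAF-RP-MTF compound image."""
    channels = [resize(img.astype(np.float32), (size, size)) for img in (gaf, rp, mtf)]
    return np.stack(channels, axis=-1)  # H x W x 3 image, ready for an off-the-shelf CNN
```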

We use AlexNet (a CNN-based model) [57] for the feature extraction and classification tasks, and thus employ end-to-end deep learning, where the feature extraction and classification parts are embedded in a single network, as shown in Fig. 1.

III-C Multimodal Feature Fusion Framework

At the input of MFF, we transform ECG heartbeats into images as shown in Fig. 2. AlexNets are employed to learn features from each input imaging modality. We extract these learned features from the fc-7 layer of each AlexNet; they are then fused by an efficient Gated Fusion Network (GFN), the backbone of the proposed MFF, which fuses the features effectively while also taking care of their dimensionality. These fused features are the input of the SVM classifier, as shown in Fig. 2.

III-C1 Gated Fusion Network

The architecture of our proposed gated fusion network (GFN) is shown in Fig. 3. We have adapted this network from our previous work in [58]. The inputs to the GFN are the features extracted from the second-last fully connected layer (fc-7) of each AlexNet, as shown in Fig. 2.

Let $f_1$, $f_2$ and $f_3$ be the features from each imaging modality, respectively. These features are then convolved with the high boost kernel $K$, as shown in Fig. 3.

We use a high boost filter for convolution with the features, since this filter precisely recognizes the important information of a feature and assigns a boosted value to every element of the feature according to its importance [59]. The high boost filter is the difference between a scaled version and a low-pass version of the input image, as shown in Equation 6.

$f_{hb}(m,n)=cf(m,n)-f_{lp}(m,n)$   (6)

where $cf(m,n)$ and $f_{lp}(m,n)$ are the scaled version and the low-pass version of the image $f(m,n)$, respectively.

In general, high boost filter is given by

$K=\begin{bmatrix} -1 & -1 & -1\\ -1 & c+8 & -1\\ -1 & -1 & -1 \end{bmatrix}$   (7)

where $c$ is the amplification factor that assigns the weights to the features during convolution.

The best filter performance is obtained for $c=1$; other values of $c$ produce less amplification.

Thus, the following high boost kernel, which highlights the important characteristics, is selected empirically.

$K=\begin{bmatrix} -1 & -1 & -1\\ -1 & 9 & -1\\ -1 & -1 & -1 \end{bmatrix}$   (8)

The high boost filter highlights the high-frequency components while conserving the low-frequency components.

After convolving the features with the high boost filter, the sigmoid function is used to generate the proper gated weights $w_1$, $w_2$ and $w_3$, respectively, as shown in Fig. 3. Finally, we take the point-wise product of the weights $w_1$, $w_2$ and $w_3$ with the features $f_1$, $f_2$ and $f_3$, respectively, to perform feature fusion and generate the fused features. The working of the GFN can be understood from the following equations.

$w_{1}=\sigma(f_{1}\circledast K)$   (9)

$w_{2}=\sigma(f_{2}\circledast K)$   (10)

$w_{3}=\sigma(f_{3}\circledast K)$   (11)

$F_{f}(j)=w_{1}\odot f_{1}(j)+w_{2}\odot f_{2}(j)+w_{3}\odot f_{3}(j)$   (12)

where

$\sigma(x)\triangleq\frac{1}{1+e^{-x}}$ : sigmoid function

$a\circledast b$ : convolution

$a\odot b$ : point-wise multiplication

$F_{i}(j)$ : $j$th feature of the $i$th modality

$F_{f}(j)$ : $j$th fused feature
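A minimal NumPy/SciPy sketch of Equations 9-12 follows. Since the fc-7 features are 512-dimensional vectors, we assume here (our assumption, following the 2D form of the kernel) that they are reshaped into 2D maps before the convolution:

```python
import numpy as np
from scipy.signal import convolve2d

K = np.array([[-1, -1, -1],
              [-1,  9, -1],
              [-1, -1, -1]], dtype=np.float32)  # high boost kernel, c = 1 (Eq. 8)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f1, f2, f3):
    """Fused features per Eqs. (9)-(12); f1, f2, f3 are same-shape 2D feature maps
    (e.g. the 512-d fc-7 vectors reshaped to 16 x 32 -- an illustrative assumption)."""
    fused = np.zeros_like(f1)
    for f in (f1, f2, f3):
        w = sigmoid(convolve2d(f, K, mode="same", boundary="symm"))  # Eqs. (9)-(11)
        fused += w * f                                               # Eq. (12)
    return fused
```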

III-C2 CNN Architecture

The architecture of the CNN used in the proposed MFF is shown in Fig. 4. It consists of three convolutional layers, two pooling layers, and a fully connected layer. The first convolutional layer has 16 kernels of size 5x5, followed by a pooling layer of size 2x2 with stride 2. The second and third convolutional layers have 32 kernels of size 5x5, followed by a 2x2 pooling layer with stride 2.
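One plausible PyTorch rendering of Fig. 4 is sketched below; the placement of the second pooling layer, the ReLU activations, and the padding are our assumptions where Fig. 4 leaves them implicit:

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """Sketch of the Fig. 4 CNN for 64 x 64 single-channel signal images."""
    def __init__(self, num_classes=5):        # 5 classes for MIT-BIH, 2 for PTB
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),      # 64 -> 32
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),      # 32 -> 16
            nn.Conv2d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.fc = nn.Linear(32 * 16 * 16, num_classes)  # single fully connected layer

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))
```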

III-D Classification Task and Classifier

The classification task of the proposed methods is ECG heartbeat classification for arrhythmia and MI detection.

The classification metrics used are accuracy, precision and recall, as shown in Tables V, VI, VII and VIII. They are calculated using the following equations.

$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$   (13)

$Precision=\frac{TP}{TP+FP}$   (14)

$Recall=\frac{TP}{TP+FN}$   (15)

where $TP$ = true positive, $TN$ = true negative, $FP$ = false positive, and $FN$ = false negative.
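For clarity, Equations 13-15 translate directly into code:

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall from confusion-matrix counts (Eqs. 13-15)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall
```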

We use a softmax classifier in the proposed MIF and a Support Vector Machine (SVM) classifier in the proposed MFF for the classification task.

The softmax classifier, also known as softmax regression, is a multiclass classifier used in machine learning. The score function of the softmax classifier computes class-specific probabilities that sum to 1.

The mathematical representation of the score function for the softmax classifier is shown below.

$f(y)=\frac{e^{y_{j}}}{\sum_{k}e^{y_{k}}}$   (16)

where $y$ is the input vector and the score function maps the exponent domain to probabilities.

In its simplest form, the score function for the SVM maps the input vector to scores through a simple matrix operation, as shown in Equation 17.

$f=Wx+b$   (17)

where $x$ is the input vector, $W$ is the weight matrix whose shape is determined by the input vector and the number of classes, and $b$ is the bias vector.
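The two score functions of Equations 16 and 17 can be sketched as:

```python
import numpy as np

def softmax_scores(y):
    """Class probabilities from raw scores (Eq. 16); max-shifted for numerical stability."""
    e = np.exp(y - y.max())
    return e / e.sum()

def svm_scores(W, x, b):
    """Linear SVM score function f = Wx + b (Eq. 17)."""
    return W @ x + b
```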

TABLE I: Training Parameters for AlexNet and CNN
Training Parameters Values
Momentum 0.9
Initial Learn Rate 0.005
Learn Rate Drop Factor 0.5
Learn Rate Drop Period 10
$L_{2}$ Regularization 0.004
MiniBatchSize 128

III-E Training and Optimization

We resize images to 227 x 227 to perform experiments with AlexNet. We also perform experiments with a smaller but computationally efficient CNN, whose architecture is shown in Fig. 4, to show that the proposed frameworks can achieve comparable performance even with the smaller CNN. A comparison of the computational cost of both CNN models is provided in Table XI. We fine-tune AlexNet by reducing the size of the second-last fully connected layer 'fc7' from 4096 to 512 and the size of the last fully connected layer 'fc8' from 1000 to the number of classes in our datasets. The original size of the 'fc7' layer of AlexNet is 4096, matched to its 1000-way classification layer. For the MIT-BIH and PTB datasets, we need classification layers of size 5 and 2, respectively, according to the number of classes in these datasets. Thus, to make 'fc7' compatible with the classification layer, we reduce its size to 512. The training parameters for AlexNet and the CNN are shown in Table I.

For optimization of the deep networks, we use the Stochastic Gradient Descent with Momentum (SGDM) algorithm. SGDM helps accelerate the gradient vectors in the right directions, leading to faster convergence. It is one of the most popular optimization algorithms, and many state-of-the-art models are trained using it.
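Our experiments were run in Matlab, but the same fine-tuning and SGDM setup can be sketched in PyTorch for illustration (the layer indices follow torchvision's AlexNet, and the weights identifier is an assumption tied to that library):

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5  # 5 for MIT-BIH, 2 for PTB
net = models.alexnet(weights="IMAGENET1K_V1")    # pretrained AlexNet
net.classifier[4] = nn.Linear(4096, 512)         # shrink fc7: 4096 -> 512
net.classifier[6] = nn.Linear(512, num_classes)  # replace fc8: 1000 -> num_classes

# SGDM with the Table I hyper-parameters: momentum 0.9, initial LR 0.005,
# L2 regularization 0.004, LR halved every 10 epochs.
optimizer = torch.optim.SGD(net.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=0.004)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
```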

IV Experimental Results

IV-A ECG Databases

Experiments are performed on the PhysioNet MIT-BIH Arrhythmia dataset [60], [61] for heartbeat classification and the PTB Diagnostic ECG dataset [62] for MI classification using both proposed fusion frameworks. For the experiments, ECG lead-II data re-sampled at a frequency of 125 Hz is used as the input.

We used the standardized form of both datasets provided in [63]. These datasets are already denoised, and the training and testing parts are provided in the form of standard ECG heartbeats. Furthermore, the five arrhythmia classes and the MI localization have already been annotated in terms of standard ECG heartbeats. Our study focuses on the ECG-to-image transformation and on the design of the proposed multimodal fusion frameworks; the main goal is increasing the overall heartbeat classification performance. We did not attempt to model or solve for a specific type of noise.

We conducted our experiments in Matlab R2020a on a desktop computer with an NVIDIA GTX-1070 GPU.

The experimental results are discussed in detail in section V.

TABLE II: Mapping between annotations and AAMI EC57 [64] categories

Category  Annotations
N  Normal; Left/Right bundle branch block; Atrial escape; Nodal escape
S  Atrial premature; Aberrant atrial premature; Nodal premature; Supra-ventricular premature
V  Premature ventricular contraction; Ventricular escape
F  Fusion of ventricular and normal
Q  Paced; Fusion of paced and normal; Unclassifiable

TABLE III: Information about Number of Heartbeats before and after SMOTE for training component of MIT-BIH Dataset

Dataset: MIT-BIH
Class  Original heartbeats  Heartbeats after SMOTE
N  72471  72471
S  2223   30000
V  5788   20000
F  641    20000
Q  6431   10000

TABLE IV: Training and Testing Samples of datasets

Dataset  Training Samples  Testing Samples
MIT-BIH  152471  21892
PTB      11641   2911

IV-A1 PhysioNet MIT-BIH Arrhythmia Dataset

Forty-seven subjects were involved in the collection of ECG signals for this dataset. The data was collected at a sampling rate of 360 Hz, and each beat is annotated by at least two experts. Using these annotations, five beat categories are created in accordance with the Association for the Advancement of Medical Instrumentation (AAMI) EC57 standard [64], as shown in Table II.

Training a CNN requires a large number of samples. We use the same training and testing segments provided in [63]. Since there is a class imbalance in the training part of the dataset, as is apparent from the numbers, we applied SMOTE [65] to upsample the minority classes (classes other than N) and finally settled on the numbers shown in the right column of Table III.

SMOTE is a data augmentation technique which is used to reduce overfitting during training and helps to reduce classifier bias.
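A small sketch of this upsampling step using imbalanced-learn is shown below; the placeholder arrays and scaled-down targets are ours, standing in for the real training split of Table III:

```python
import numpy as np
from imblearn.over_sampling import SMOTE

# Placeholder data: rows are flattened heartbeats, labels are AAMI classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 187))
y = np.array(["N"] * 150 + ["S"] * 20 + ["V"] * 15 + ["F"] * 5 + ["Q"] * 10)

# Desired per-class counts (scaled-down stand-ins for the Table III targets).
targets = {"N": 150, "S": 60, "V": 40, "F": 40, "Q": 20}
X_res, y_res = SMOTE(sampling_strategy=targets, k_neighbors=3,
                     random_state=0).fit_resample(X, y)
```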

We perform experiments using both proposed fusion frameworks on the MIT-BIH dataset with the training and testing samples shown in Table IV and the training parameters shown in Table I. The experimental results are shown in Tables V and VI.

TABLE V: Experimental results of MIT-BIH Dataset using AlexNet.
Modalities Accuracies% Precision% Recall%
GAF Images only 97.3 85 91
RP Images only 97.2 82 93
MTF Images only 91.5 86 89
Concatenation Fusion 97 82 91
Average Fusion 98.5 95 93.1
Proposed MIF 98.6 93 92
Proposed MFF 99.7 98 98
TABLE VI: Experimental results of MIT-BIH Dataset using the simpler CNN of Fig. 4
Modalities  Accuracies%  Precision%  Recall%
GAF Images (gray scale)  94.2  74.2  91
RP Images (gray scale)   96.3  80    90
MTF Images (gray scale)  94    72    86
Concatenation Fusion     94.6  80.4  84
Average Fusion           97.6  87    92
Proposed MFF             98.3  90.5  93

TABLE VII: Experimental results of PTB Dataset using AlexNet.
Modalities Accuracies% Precision% Recall%
GAF Images only 98.4 98 96
RP Images only 98 98 94
MTF Images only 95.3 94 89
Concatenation Fusion 97.4 95 95
Average Fusion 98.5 97 98
Proposed MIF 98.4 98 94
Proposed MFF 99.2 98 98
TABLE VIII: Experimental results of PTB Dataset using the simpler CNN of Fig. 4
Modalities  Accuracies%  Precision%  Recall%
GAF Images (gray scale)  94.7  91  90
RP Images (gray scale)   95.1  95  87
MTF Images (gray scale)  86.6  80  69
Concatenation Fusion     92.2  88  84
Average Fusion           96.3  91  94
Proposed MFF             96.5  94  93

IV-A2 PTB Diagnostic ECG dataset

Two hundred and ninety (290) subjects took part in the collection of ECG records for the PTB Diagnostics dataset: 148 of them were diagnosed with MI, 52 are healthy controls, and the rest were diagnosed with 7 different diseases. A frequency of 100 Hz is used for each ECG record from 12 leads. However, for our experiments, we used the lead II ECG recordings and worked with the healthy control and MI categories.

We perform experiments using both proposed fusion frameworks on the PTB dataset with the training and testing samples shown in Table IV and the training parameters shown in Table I. The training and testing parts of the dataset, as provided in [63], are used to train the CNN models. The experimental results are shown in Tables VII and VIII.

V Discussion

We present comparative results of the proposed frameworks against state-of-the-art methods in Tables IX and X. As can be seen, our proposed frameworks considerably outperform the existing methods in terms of accuracy, precision, and recall.

To justify the importance of the proposed fusion frameworks, we assess the performance of the different components of the proposed frameworks on both datasets using concatenation and average fusion methods. We performed average fusion by assigning unity value to all the weights, i.e., $w_1=1$, $w_2=1$ and $w_3=1$, in the gated fusion network. Since we have three modalities, taking a simple average gives an equal value of 0.333 for each weight; we also experimented with 0.333 and obtained the same results. Since the weights are equal in average fusion, to keep things simple we assign a unity value to every weight. It is possible that better weights could be acquired through trainable weight coefficients; this is something we plan to investigate in the future. Tables V, VI, VII and VIII report the results of assessing the different fusion methods along with the proposed fusion frameworks.

TABLE IX: Comparison of heart beat Classification results of MITBIH Dataset with Previous Methods
Previous Methods Accuracies% Precision% Recall%
Izci et al. [43] 97.96 - -
Dang et al. [23] 95.48 96.53 87.74
Li et al. [47] 99.5 97.3 98.1
Zhao et al. [49] 98.25 - -
Oliveira et al. [37] 95.3 - -
Huang et al. [21] 99 - -
Shaker et al. [32] 98 90 97.7
Kachuee et al. [28] 93.4 - -
Xu et al. [66] 95.9 - -
He et al. [67] 98.3 - -
Qiao et al. [68] 99.3 - -
Proposed MIF 98.6 93 92
Proposed MFF 99.7 98 98
TABLE X: Comparison of MI Classification results of PTB Dataset with Previous Methods
Previous Methods Accuracies% Precision% Recall%
Diker et al. [39] 83.82 82 95
Acharya et al. [27] 95.22 95.49 94.19
Kojuri et al. [69] 95.6 97.9 93.3
Kachuee et al. [28] 95.9 95.2 95.1
Liu et al. [40] 96 97.37 95.4
Sharma et al. [12] 96 99 93
Chen et al. [31] 96.18 97.32 93.67
Cao et al. [70] 96.65 - -
Ahamed et al. [71] 97.66 - -
Proposed MIF 98.4 98 94
Proposed MFF 99.2 98 98

The performance of concatenation fusion is poor compared to the other methods, as shown by the experimental results. Concatenation fusion creates a high-dimensional feature vector that leads to additional computational cost and deterioration of information during classification [72].

We also provide a comparison of both proposed fusion frameworks in terms of inference speed, as shown in Table XII. Inference speed is the time consumed by the classifier to recognize one test sample, expressed in microseconds ($\mu$s). It is observed that MFF yields higher accuracy, precision and recall for both datasets compared to MIF; however, MIF is more efficient in terms of inference speed.

TABLE XI: Comparison of Computational Cost of AlexNet and the CNN of Fig. 4 on the MIT-BIH Dataset

CNN Model  Fusion Framework  Training Parameters
AlexNet        MFF  9259427
AlexNet        MIF  3086475
CNN of Fig. 4  MFF  612069

Since we experiment with two different CNNs, we provide a comparison between them in terms of computational cost, as shown in Table XI. Since there is a trade-off between accuracy and computational cost, we observe from Tables V, VI and XI that the CNN shown in Fig. 4 is less accurate than AlexNet but is computationally efficient.

TABLE XII: Comparison of Inference Speed of both Proposed Fusion Frameworks using AlexNet.

Dataset  Fusion Method  Inference Speed ($\mu$s)
MIT-BIH  Multimodal Image Fusion    1233
MIT-BIH  Multimodal Feature Fusion  1670
PTB      Multimodal Image Fusion    1205
PTB      Multimodal Feature Fusion  1470

We prefer the SVM classifier over the softmax classifier since we have experimentally shown in our previous work [73] that SVM performs better than softmax, which is typically built into any CNN framework. The softmax classifier minimizes the cross-entropy function, while the SVM employs a margin-based function; the more rigorous nature of its classification criterion is the reason for the better performance of SVM over softmax.

The comparison provided in Tables IX and X is on the basis of the datasets and the performance metrics. There are slight differences in the testing conditions in a few of the comparisons; however, it is still appropriate to compare the results.

A limitation of the proposed Multimodal Image Fusion (MIF) framework is that it requires exactly three different statistical gray-scale images to create the triple-channel compound image. Since the Multimodal Feature Fusion (MFF) framework uses three separate AlexNets for training on the GAF, RP and MTF images, it requires more time for training and inference.

VI Conclusion

We proposed two computationally efficient multimodal fusion frameworks for ECG heartbeat classification, called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF). At the input of these frameworks, we convert the ECG signal into three types of images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF). In MIF, we first perform image fusion by combining the three input images to create a three-channel single image, which is used as input to the CNN. In MFF, highly informative cues are extracted from the penultimate layer of each CNN, fused, and used as input to the SVM classifier. We demonstrated the superiority of the proposed fusion frameworks by performing experiments on PhysioNet's MIT-BIH dataset for five different arrhythmias and on the PTB diagnostic dataset for MI classification. Experimental results show that we beat the previous state of the art in terms of classification accuracy, precision and recall. The important finding of this study is that multimodal fusion of modalities increases the performance of the machine learning task compared to using the modalities individually.

References

  • [1] L. Sun, Y. Lu, K. Yang, and S. Li, “Ecg analysis using multiple instance learning for myocardial infarction detection,” IEEE transactions on biomedical engineering, vol. 59, no. 12, pp. 3348–3356, 2012.
  • [2] Y. Xia, X. Liu, D. Wu, H. Xiong, L. Ren, L. Xu, W. Wu, and H. Zhang, “Influence of beat-to-beat blood pressure variability on vascular elasticity in hypertensive population,” Scientific reports, vol. 7, no. 1, pp. 1–8, 2017.
  • [3] U. R. Acharya, N. Kannathal, L. M. Hua, and L. M. Yi, “Study of heart rate variability signals at sitting and lying postures,” Journal of bodywork and Movement Therapies, vol. 9, no. 2, pp. 134–141, 2005.
  • [4] U. R. Acharya, Y. Hagiwara, J. E. W. Koh, S. L. Oh, J. H. Tan, M. Adam, and R. San Tan, “Entropies for automated detection of coronary artery disease using ecg signals: A review,” Biocybernetics and Biomedical Engineering, vol. 38, no. 2, pp. 373–384, 2018.
  • [5] Z. Zhang, J. Dong, X. Luo, K.-S. Choi, and X. Wu, “Heartbeat classification using disease-specific feature selection,” Computers in biology and medicine, vol. 46, pp. 79–89, 2014.
  • [6] E. Pasolli and F. Melgani, “Active learning methods for electrocardiographic signal classification,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 6, pp. 1405–1416, 2010.
  • [7] Y. H. Hu, S. Palreddy, and W. J. Tompkins, “A patient-adaptable ecg beat classifier using a mixture of experts approach,” IEEE transactions on biomedical engineering, vol. 44, no. 9, pp. 891–900, 1997.
  • [8] V. Chouhan and S. Mehta, “Threshold-based detection of p and t-wave in ecg using new feature signal,” International Journal of Computer Science and Network Security, vol. 8, no. 2, pp. 144–153, 2008.
  • [9] N. A. Bhaskar, “Performance analysis of support vector machine and neural networks in detection of myocardial infarction,” Procedia Computer Science, vol. 46, no. 4, pp. 20–30, 2015.
  • [10] K.-i. Minami, H. Nakajima, and T. Toyoshima, “Real-time discrimination of ventricular tachyarrhythmia with fourier-transform neural network,” IEEE transactions on Biomedical Engineering, vol. 46, no. 2, pp. 179–185, 1999.
  • [11] H. Khorrami and M. Moavenian, “A comparative study of dwt, cwt and dct transformations in ecg arrhythmias classification,” Expert systems with Applications, vol. 37, no. 8, pp. 5751–5757, 2010.
  • [12] L. Sharma, R. Tripathy, and S. Dandapat, “Multiscale energy and eigenspace approach to detection and localization of myocardial infarction,” IEEE transactions on biomedical engineering, vol. 62, no. 7, pp. 1827–1837, 2015.
  • [13] P.-C. Chang, J.-J. Lin, J.-C. Hsieh, and J. Weng, “Myocardial infarction classification with multi-lead ecg using hidden markov models and gaussian mixture models,” Applied Soft Computing, vol. 12, no. 10, pp. 3165–3175, 2012.
  • [14] H. Lu, K. Ong, and P. Chia, “An automated ecg classification system based on a neuro-fuzzy system,” in Computers in Cardiology 2000. Vol. 27 (Cat. 00CH37163).   IEEE, 2000, pp. 387–390.
  • [15] K. A. Sidek, I. Khalil, and H. F. Jelinek, “Ecg biometric with abnormal cardiac conditions in remote monitoring system,” IEEE Transactions on systems, man, and cybernetics: systems, vol. 44, no. 11, pp. 1498–1509, 2014.
  • [16] V. Krasteva, S. Ménétré, J.-P. Didon, and I. Jekova, “Fully convolutional deep neural networks with optimized hyperparameters for detection of shockable and non-shockable rhythms,” Sensors, vol. 20, no. 10, p. 2875, 2020.
  • [17] I.-C. Tanoh and P. Napoletano, “A novel 1-d ccanet for ecg classification,” Applied Sciences, vol. 11, no. 6, p. 2758, 2021.
  • [18] M. Wasimuddin, K. Elleithy, A. Abuzneid, M. Faezipour, and O. Abuzaghleh, “Multiclass ecg signal analysis using global average-based 2-d convolutional neural network modeling,” Electronics, vol. 10, no. 2, p. 170, 2021.
  • [19] M. Längkvist, L. Karlsson, and A. Loutfi, “A review of unsupervised feature learning and deep learning for time-series modeling,” Pattern Recognition Letters, vol. 42, pp. 11–24, 2014.
  • [20] R. Salloum and C.-C. J. Kuo, “Ecg-based biometrics using recurrent neural networks,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).   IEEE, 2017, pp. 2062–2066.
  • [21] J. Huang, B. Chen, B. Yao, and W. He, “Ecg arrhythmia classification using stft-based spectrogram and convolutional neural network,” IEEE Access, vol. 7, pp. 92 871–92 880, 2019.
  • [22] Z. Ahmad and N. Khan, “Multi-level stress assessment using multi-domain fusion of ecg signal,” in 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society.   IEEE, 2020, pp. 4518–4521.
  • [23] H. Dang, M. Sun, G. Zhang, X. Zhou, Q. Chang, and X. Xu, “A novel deep convolutional neural network for arrhythmia classification,” in 2019 International Conference on Advanced Mechatronic Systems (ICAMechS).   IEEE, 2019, pp. 7–11.
  • [24] P. De Chazal, M. O’Dwyer, and R. B. Reilly, “Automatic classification of heartbeats using ecg morphology and heartbeat interval features,” IEEE transactions on biomedical engineering, vol. 51, no. 7, pp. 1196–1206, 2004.
  • [25] Y. Xia and Y. Xie, “A novel wearable electrocardiogram classification system using convolutional neural networks and active learning,” IEEE Access, vol. 7, pp. 7989–8001, 2019.
  • [26] S. Kiranyaz, T. Ince, and M. Gabbouj, “Real-time patient-specific ecg classification by 1-d convolutional neural networks,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 3, pp. 664–675, 2015.
  • [27] U. R. Acharya, H. Fujita, S. L. Oh, Y. Hagiwara, J. H. Tan, and M. Adam, “Application of deep convolutional neural network for automated detection of myocardial infarction using ecg signals,” Information Sciences, vol. 415, pp. 190–198, 2017.
  • [28] M. Kachuee, S. Fazeli, and M. Sarrafzadeh, “Ecg heartbeat classification: A deep transferable representation,” in 2018 IEEE International Conference on Healthcare Informatics (ICHI).   IEEE, 2018, pp. 443–444.
  • [29] B. Pourbabaee, M. J. Roshtkhari, and K. Khorasani, “Deep convolutional neural networks and learning ecg features for screening paroxysmal atrial fibrillation patients,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, no. 12, pp. 2095–2104, 2018.
  • [30] R. K. Tripathy, A. Bhattacharyya, and R. B. Pachori, “Localization of myocardial infarction from multi-lead ecg signals using multiscale analysis and convolutional neural network,” IEEE Sensors Journal, vol. 19, no. 23, pp. 11 437–11 448, 2019.
  • [31] Y. Chen, H. Chen, Z. He, C. Yang, and Y. Cao, “Multi-channel lightweight convolution neural network for anterior myocardial infarction detection,” in 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI).   IEEE, 2018, pp. 572–578.
  • [32] A. M. Shaker, M. Tantawi, H. A. Shedeed, and M. F. Tolba, “Generalization of convolutional neural networks for ecg classification using generative adversarial networks,” IEEE Access, vol. 8, pp. 35 592–35 605, 2020.
  • [33] H. Wang, H. Shi, K. Lin, C. Qin, L. Zhao, Y. Huang, and C. Liu, “A high-precision arrhythmia classification method based on dual fully connected neural network,” Biomedical Signal Processing and Control, vol. 58, p. 101874, 2020.
  • [34] C. Chen, Z. Hua, R. Zhang, G. Liu, and W. Wen, “Automated arrhythmia classification based on a combination network of cnn and lstm,” Biomedical Signal Processing and Control, vol. 57, p. 101819, 2020.
  • [35] M. Porumb, E. Iadanza, S. Massaro, and L. Pecchia, “A convolutional neural network approach to detect congestive heart failure,” Biomedical Signal Processing and Control, vol. 55, p. 101597, 2020.
  • [36] C. Hao, S. Wibowo, M. Majmudar, and K. S. Rajput, “Spectro-temporal feature based multi-channel convolutional neural network for ecg beat classification,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).   IEEE, 2019, pp. 5642–5645.
  • [37] A. T. Oliveira, E. G. Nobrega et al., “A novel arrhythmia classification method based on convolutional neural networks interpretation of electrocardiogram images,” in IEEE International conference on industrial technology.   Piscataway, NJ, 2019.
  • [38] M. M. Al Rahhal, Y. Bazi, H. Almubarak, N. Alajlan, and M. Al Zuair, “Dense convolutional networks with focal loss and image generation for electrocardiogram classification,” IEEE Access, vol. 7, pp. 182 225–182 237, 2019.
  • [39] A. Diker, Z. Cömert, E. Avcı, M. Toğaçar, and B. Ergen, “A novel application based on spectrogram and convolutional neural network for ecg classification,” in 2019 1st International Informatics and Software Engineering Conference (UBMYK).   IEEE, 2019, pp. 1–6.
  • [40] W. Liu, M. Zhang, Y. Zhang, Y. Liao, Q. Huang, S. Chang, H. Wang, and J. He, “Real-time multilead convolutional neural network for myocardial infarction detection,” IEEE journal of biomedical and health informatics, vol. 22, no. 5, pp. 1434–1444, 2017.
  • [41] X. Zhai and C. Tin, “Automated ecg classification using dual heartbeat coupling based on convolutional neural network,” IEEE Access, vol. 6, pp. 27 465–27 472, 2018.
  • [42] W. Sun, N. Zeng, and Y. He, “Morphological arrhythmia automated diagnosis method using gray-level co-occurrence matrix enhanced convolutional neural network,” IEEE Access, vol. 7, pp. 67 123–67 129, 2019.
  • [43] E. Izci, M. A. Ozdemir, M. Degirmenci, and A. Akan, “Cardiac arrhythmia detection from 2d ecg images by using deep learning technique,” in 2019 Medical Technologies Congress (TIPTEKNO).   IEEE, 2019, pp. 1–4.
  • [44] B. M. Mathunjwa, Y.-T. Lin, C.-H. Lin, M. F. Abbod, and J.-S. Shieh, “Ecg arrhythmia classification by using a recurrence plot and convolutional neural network,” Biomedical Signal Processing and Control, vol. 64, p. 102262, 2021.
  • [45] X. Fan, Q. Yao, Y. Cai, F. Miao, F. Sun, and Y. Li, “Multiscaled fusion of deep convolutional neural networks for screening atrial fibrillation from single lead short ecg recordings,” IEEE journal of biomedical and health informatics, vol. 22, no. 6, pp. 1744–1753, 2018.
  • [46] R. Wang, J. Fan, and Y. Li, “Deep multi-scale fusion neural network for multi-class arrhythmia detection,” IEEE Journal of Biomedical and Health Informatics, 2020.
  • [47] F. Li, J. Wu, M. Jia, Z. Chen, and Y. Pu, “Automated heartbeat classification exploiting convolutional neural network with channel-wise attention,” IEEE Access, vol. 7, pp. 122 955–122 963, 2019.
  • [48] A. Uyar and F. Gurgen, “Arrhythmia classification using serial fusion of support vector machines and logistic regression,” in 2007 4th IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications.   IEEE, 2007, pp. 560–565.
  • [49] Y. Zhao, X. Yin, and Y. Xu, “Electrocardiograph (ecg) recognition based on graphical fusion with geometric algebra,” in 2017 4th International Conference on Information Science and Control Engineering (ICISCE).   IEEE, 2017, pp. 1482–1486.
  • [50] R. Wang, Q. Yao, X. Fan, and Y. Li, “Multi-class arrhythmia detection based on neural network with multi-stage features fusion,” in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC).   IEEE, 2019, pp. 4082–4087.
  • [51] N. Manshor, A. A. Halin, M. Rajeswari, and D. Ramachandram, “Feature selection via dimensionality reduction for object class recognition,” in 2011 2nd International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering.   IEEE, 2011, pp. 223–227.
  • [52] Z. Wang and T. Oates, “Imaging time-series to improve classification and imputation,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • [53] C.-L. Yang, Z.-X. Chen, and C.-Y. Yang, “Sensor classification using convolutional neural network by encoding multivariate time series as two-dimensional colored images,” Sensors, vol. 20, no. 1, p. 168, 2020.
  • [54] J. Eckmann, S. O. Kamphorst, D. Ruelle et al., “Recurrence plots of dynamical systems,” World Scientific Series on Nonlinear Science Series A, vol. 16, pp. 441–446, 1995.
  • [55] Recuplots and cnns for time-series classification. [Online]. Available: https://www.kaggle.com/tigurius/recuplots-and-cnns-for-time-series-classification
  • [56] Z. Wang and T. Oates, “Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,” in Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
  • [57] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
  • [58] Z. Ahmad and N. Khan, “Cnn based multistage gated average fusion (mgaf) for human action recognition using depth and inertial sensors,” IEEE Sensors Journal, 2020.
  • [59] H. B. Mitchell, Image fusion: theories, techniques and applications.   Springer Science & Business Media, 2010.
  • [60] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals,” circulation, vol. 101, no. 23, pp. e215–e220, 2000.
  • [61] G. B. Moody and R. G. Mark, “The impact of the mit-bih arrhythmia database,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 45–50, 2001.
  • [62] R. Bousseljot, D. Kreiseler, and A. Schnabel, “Nutzung der ekg-signaldatenbank cardiodat der ptb über das internet,” Biomedizinische Technik/Biomedical Engineering, vol. 40, no. s1, pp. 317–318, 1995.
  • [63] Ecg heartbeat categorization dataset. [Online]. Available: https://www.kaggle.com/shayanfazeli/heartbeat
  • [64] Association for the Advancement of Medical Instrumentation et al., “Testing and reporting performance results of cardiac rhythm and st segment measurement algorithms,” ANSI/AAMI EC38, vol. 1998, 1998.
  • [65] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote: synthetic minority over-sampling technique,” Journal of artificial intelligence research, vol. 16, pp. 321–357, 2002.
  • [66] X. Xu, S. Jeong, and J. Li, “Interpretation of electrocardiogram (ecg) rhythm by combined cnn and bilstm,” IEEE Access, vol. 8, pp. 125 380–125 388, 2020.
  • [67] R. He, Y. Liu, K. Wang, N. Zhao, Y. Yuan, Q. Li, and H. Zhang, “Automatic detection of qrs complexes using dual channels based on u-net and bidirectional long short-term memory,” IEEE Journal of Biomedical and Health Informatics, 2020.
  • [68] F. Qiao, B. Li, Y. Zhang, H. Guo, W. Li, and S. Zhou, “A fast and accurate recognition of ecg signals based on elm-lrf and blstm algorithm,” IEEE Access, vol. 8, pp. 71 189–71 198, 2020.
  • [69] J. Kojuri, R. Boostani, P. Dehghani, F. Nowroozipour, and N. Saki, “Prediction of acute myocardial infarction with artificial neural networks in patients with nondiagnostic electrocardiogram,” Journal of Cardiovascular Disease Research, vol. 6, no. 2, 2015.
  • [70] Y. Cao, T. Wei, N. Lin, D. Zhang, and J. J. Rodrigues, “Multi-channel lightweight convolutional neural network for remote myocardial infarction monitoring,” in 2020 IEEE Wireless Communications and Networking Conference Workshops (WCNCW).   IEEE, 2020, pp. 1–6.
  • [71] M. A. Ahamed, K. A. Hasan, K. F. Monowar, N. Mashnoor, and M. A. Hossain, “Ecg heartbeat classification using ensemble of efficient machine learning approaches on imbalanced datasets,” in 2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT).   IEEE, 2020, pp. 140–145.
  • [72] E. Akbas and F. T. Y. Vural, “Automatic image annotation by ensemble of visual descriptors,” in 2007 IEEE Conference on Computer Vision and Pattern Recognition.   IEEE, 2007, pp. 1–8.
  • [73] Z. Ahmad and N. Khan, “Towards improved human action recognition using convolutional neural networks and multimodal fusion of depth and inertial sensor data,” in 2018 IEEE International Symposium on Multimedia (ISM).   IEEE, 2018, pp. 223–230.
[Uncaptioned image] Zeeshan Ahmad received B.Eng. degree in Electrical Engineering from NED University of Engineering and Technology Karachi, Pakistan in 2001, M.Sc. degree in Electrical Engineering from National University of Sciences and Technology Pakistan in 2005 and MEng. degree in Electrical and Computer Engineering from Ryerson University, Toronto, Canada in 2017. He is currently pursuing Ph.D. degree with the Department of Electrical and Computer Engineering, Ryerson University, Toronto, Canada. His research interests include Machine learning, Computer vision, Multimodal fusion, signal and image processing.
[Uncaptioned image] Anika Tabassum is a recent graduate of the MSc Data Science and Analytics program at Ryerson University. She received her BA degree in Computer Science from McGill University in 2013. She has previously worked 4+ years as a software engineer/developer.
[Uncaptioned image] Dr. Ling Guan is a professor of Electrical and Computer Engineering at Ryerson University, Toronto, Canada, and was a Tier I Canada Research Chair in Multimedia and Computer Technology from 2001 to 2015. Dr. Guan has published extensively in multimedia processing and communications, human-centered computing, machine learning, adaptive image and signal processing, and, more recently, multimedia computing in immersive environment. He is a Fellow of the IEEE, an Elected Member of the Canadian Academy of Engineering, and an IEEE Circuits and System Society Distinguished Lecturer.
[Uncaptioned image] Naimul Khan is an assistant professor of Electrical and Computer Engineering at Ryerson University, where he co-directs the Ryerson Multimedia Research Laboratory (RML). His research focuses on creating user-centric intelligent systems through the combination of novel machine learning and human-computer interaction mechanisms. He is a recipient of the best paper award at the IEEE International Symposium on Multimedia, the OCE TalentEdge Postdoctoral Fellowship, and the Ontario Graduate Scholarship. He is a senior member of IEEE and a member of ACM.