
A Multimodal Data-driven Framework for Anxiety Screening

Haimiao Mo, Shuai Ding*, Siu Cheung Hui. Haimiao Mo and Shuai Ding are with the School of Management, Hefei University of Technology, Hefei, Anhui 23009, China, and also with the Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, China. (Email: mhm_hfut@163.com, dingshuai@hfut.edu.cn)
Siu Cheung Hui is with the School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798. (Email: ASSCHUI@ntu.edu.sg)
Abstract

Early screening for anxiety and appropriate interventions are essential to reduce the incidence of self-harm and suicide in patients. Due to limited medical resources, traditional methods that overly rely on physician expertise and specialized equipment cannot simultaneously meet the needs for high accuracy and model interpretability. Multimodal data can provide more objective evidence for anxiety screening and thus improve the accuracy of models. However, the large amount of noise in multimodal data and the imbalanced nature of the data make models prone to overfitting. Moreover, using high-dimensional, multimodal feature combinations as model inputs and incorporating them into model training is a non-differentiable problem, which makes existing anxiety screening methods based on machine learning and deep learning inapplicable. Therefore, we propose a multimodal data-driven anxiety screening framework, namely MMD-AS, and conduct experiments on health data collected via smartphones from over 200 seafarers. The feature extraction, dimension reduction, feature selection, and anxiety inference components of the proposed framework are jointly trained to improve the model's performance. In the feature selection step, a feature selection method based on the Improved Fireworks Algorithm is used to solve the non-differentiable problem of feature combination, remove redundant features, and search for the ideal feature subset. The experimental results show that our framework outperforms the comparison methods.

Index Terms:
Anxiety Screening, Mental Health Assessment, Multimodal Features, Feature Selection, Improved Fireworks Algorithm.

1 Introduction

In 2019, mental illnesses, particularly anxiety disorders, were not only among the top twenty-five leading causes of excess global health spending, but also among the most disabling mental illnesses [1]. Furthermore, anxiety disorders are accompanied by immune disorders [2], and interfere with cognitive functions through memory and attention [3], thereby affecting normal life and work. Early anxiety assessment and appropriate interventions can greatly reduce the rate of self-harm and suicide in patients [4].

Psychological scales and routine health checks with professional medical equipment are traditional anxiety screening methods. The Self-rating Anxiety Scale (SAS) [5] and the Generalized Anxiety Disorder-7 (GAD-7) [6] are two psychological scales currently used for anxiety screening. Anxiety frequently results in a variety of symptoms or behavioral changes, such as breathlessness [7], variations in blood pressure [8] and heart rate [9], perspiration, tense muscles, and dizziness [10]. These objective signs can also serve as an important basis for anxiety screening. However, due to the scarcity of medical resources in remote areas and their high cost, routine health examinations such as Magnetic Resonance Imaging (MRI) [11], Computed Tomography (CT), electrocardiogram (ECG) [12], [13] and electroencephalogram (EEG) [9], [14] may not be available.

Noncontact screening methods are another typical anxiety screening tool. They usually use computer vision or deep learning techniques to extract the behavioral or physiological features for anxiety screening. These methods have the advantages of low cost and convenience. The application of behavioral features [10], [15], speech features and text features provides more objective evidence for anxiety screening. Moreover, physiological signals, such as heart rate [10], [9], heart rate variability [12], and respiration rate, can be obtained by imaging photoplethysmography (iPPG) technology [16], which can also be used as important features for anxiety screening.

Due to the complicated genesis and protracted nature of mental diseases [17], diagnosing them frequently involves knowledge from a number of fields, including biomedicine, psychology, and social medicine. Obtaining timely multimodal information about patients' health with traditional medical screening methods is challenging because of limited medical resources [5]. In addition, multimodal data can provide more objective evidence [18] to improve the accuracy of anxiety screening. Therefore, multimodal data will be the driving force behind the future development of anxiety screening [19].

However, the large amount of noise in the multimodal data and the imbalance of the data make the model prone to overfitting. In other words, the model cannot screen anxious patients with high precision so that intervention measures can be taken in advance, which may have a negative impact on their lives or mental conditions. In addition, due to the poor medical conditions in remote areas, model interpretability [20] and important features are crucial to assist primary care staff in anxiety screening. Traditional machine learning methods [13], [21] struggle to handle the scalability and generalization of multimedia content data quickly and accurately. Deep learning methods [18], [22] based on computer vision offer higher robustness and accuracy than traditional methods, and are therefore increasingly widely used for anxiety screening. However, most existing anxiety screening technologies focus on differentiable optimization problems, whereas the combination of high-dimensional and multimodal features is a non-differentiable problem when used as model input and incorporated into model training. These existing methods therefore cannot meet the requirements of scenarios that demand both high accuracy and model interpretability. To address this, we propose a Multimodal Data-driven framework for Anxiety Screening (MMD-AS).

The contributions of this paper are as follows.

  • We propose a low-cost, noncontact, interpretable and easy-to-use anxiety screening framework that enables multimodal data capture and remote anxiety screening via smartphones only, which is suitable for scenarios with limited medical resources, such as health protection for seafarers on long voyages and mental health screening in remote areas.

  • To improve the performance, the framework’s components are jointly trained. In addition, our Improved Fireworks Algorithm (IFA) solves the non-differentiable problem in the case of feature combination by enhancing the local search capability, which filters out redundant features and reduces the noise in the data to find the best feature subset.

  • Experimental results of anxiety screening in more than 200 seafarers show that our framework has achieved high precision and model interpretability. More importantly, the results point out that multimodal data is essential for anxiety screening, and the important indicators for anxiety detection are identified, which are both beneficial to clinical practice.

The rest of this paper is organized as follows. Section 2 reviews the related work on anxiety representations, anxiety screening, feature extraction and multimodal data-driven methods. Section 3 presents our proposed framework for anxiety screening. Section 4 presents the performance evaluation. Section 5 discusses the limitations of the proposed framework. Finally, Section 6 concludes the paper.

2 RELATED WORK

In this section, we review the related work on anxiety representations, anxiety screening, feature extraction and multimodal data-driven methods. Table 1 summarizes the features and methods for anxiety screening.

2.1 Anxiety Representation

Anxiety is a feeling of tension, worry, or restlessness. It occurs frequently in a variety of mental conditions, including phobias, panic disorders, and generalized anxiety disorders [23]. Anxiety is a typical response to risk and mental stress. The amygdala and hippocampus are activated by the feelings of fear and dread brought on by stress, which also affects the autonomic and parasympathetic nervous systems [24]. Patients with anxiety disorders exhibit physical symptoms that are linked to the disease, such as rapid breathing [7], a racing heartbeat, elevated blood pressure (BP) [8], and additional symptoms [25] such as perspiration, muscle tension, and vertigo. The physiological signals most frequently utilized to evaluate mental health include ECG [26], heart rate, heart rate variability [8], EEG [27], and electrodermal signals, as shown in Table 1.

Patients with anxiety disorders exhibit structural and functional abnormalities in the nervous system that regulates emotion, according to brain imaging studies. As shown in Table 1, a person's ability to manage their emotions can be clearly observed in their facial and behavioral characteristics [10] as well as audio indicators (such as intonation and speech tempo) [28]. For example, the insula, frontal orbital cortex, anterior cingulate cortex, striatum, and amygdala all exhibit diminished responses to unfavorable emotional stimuli [29]. Due to physiological respiratory issues, the voices of individuals with anxiety disorders reflect their condition. Related studies have shown that anxious patients exhibit elevated wavelet, jitter, shimmer, and fundamental frequency (F0) mean coefficients [30]. Mel-Frequency Cepstral Coefficients (MFCCs) decline in the presence of anxiety [28]. The main signs of facial anxiety include changes in the eyes, including pupil size variations, blink rates [19], and gaze distribution [31], as well as in the lips, including lip twisting and mouth movement. Other key facial anxiety indicators include changes in the cheeks, head movement, and head speed [32]. Additional facial signs of anxiety include pallor, twitching eyelids, and stiffness in the face. Numerous studies have shown a link between pupil size and emotional or mental activity. Dilated pupils may be a sign of higher anxiety levels [33]. The coherence and direction of the eyes are also impacted by anxiety [31]. Increased gaze volatility during voluntary and stimulus-driven gazing is correlated with high levels of trait anxiety [34]. For instance, anxious individuals typically scan negative content more than non-anxious individuals [10].

TABLE I: Features, feature extraction, feature selection and classification methods for anxiety screening.
Features: 1) Physiological features: Heart Rate (HR) [10], [9], [12], Heart Rate Variability (HRV) [12], Respiratory Rate (RR) [35], Blood Pressure (BP) [8], electroencephalogram (EEG) [9], [14], [19], [18], electrocardiography (ECG) [12], [13], [27], Electrodermal Activity (EDA) [10], [9], [12], and imaging photoplethysmography (iPPG) [10], [18]. 2) Behavioral features: Eyes: gaze spatial distribution and gaze direction [10], saccadic eye movements [10], [36], [19], [37], pupil size [10] and pupil ratio variation [10], blink rate, eyelid response, eye aperture, eyebrow movements [10]; Lips: lip deformation, lip corner puller/depressor and lip pressor [10]; Head: head movement [10]; Mouth: mouth shape [10]; Gait [15]; Motion data [13]. 3) Audio features: wavelet, jitter, shimmer, F0 mean coefficients [30] and Mel-frequency cepstral coefficients (MFCCs) [28]. 4) Text features: demographics [38], occupation and health; blog posts [39], semantic location [40]. 5) Questionnaire features: SAS [5] and GAD-7 items [6].
Feature extraction: 1) Neural networks (NN): Convolutional Neural Network (CNN) [9], [39], [36], [41], Long Short-Term Memory (LSTM) [36], Radial Basis Function (RBF) [22], Artificial Neural Networks (ANN) [41], [18], and Generalized Likelihood Ratio (GLR) [10]. 2) Correlation analysis: Principal Component Analysis (PCA) [35], Canonical Correlation Analysis (CCA), Sparse Canonical Correlation Analysis (SCCA) [19]. 3) Signal processing methods: Kalman filter [13], fast Fourier transform [42].
Feature selection: 1) Filter methods: informative-property-based, such as Random Forest (RF) with the Gini index [22]; saliency-test-based, such as SKB, correlation analysis [9], [19], [43], Pearson Correlation Coefficient (PCC) [15], [36], and t-test [36]. 2) Wrapper methods: sequential feature selection [36], such as Sequential Backward Selection (SBS) and Sequential Forward Selection (SFS) [14]; iterative methods, such as RFE [38].
Classification: 1) Machine learning methods: Support Vector Machines (SVM) [10], [12], [13], [39], [36], [22], [38], [19], Logistic Regression (LR) [10], [13], [15], [38], [41], [27], Decision Trees (DTs) [12], [13], RF [39], [36], [38], [41], Naïve Bayes (NB) [10], [12], [38], K-Nearest Neighbors (KNN) [10], [13], [41], [19], [27], Adaptive Boosting (AdaBoost) [10], [13], XGBoost [40], CatBoost [13], [38]. 2) Methods based on NN: CNN, LSTM, RBF, GLR, ANN.

2.2 Anxiety Screening Methods

Anxiety screening techniques fall into two primary categories: traditional screening methods and noncontact screening methods based on machine learning or deep learning. Psychological scales [6] and assistive technologies (such as MRI [11], CT, ECG [26], EEG [27], and biochemical indicators [44]) are frequently used in traditional screening approaches to evaluate anxiety levels. Noncontact screening methods mainly use computer vision [25] or deep learning techniques [13], [42] to extract behavioral characteristics and physiological signals related to anxiety.

2.2.1 Traditional Screening Methods

Traditional mental health examinations commonly use psychological scales, such as the SAS [5] and GAD-7 [6], to ascertain whether patients are suffering from anxiety. In real clinical settings, doctors routinely conduct structured interviews with patients to find out more about their mental health. Throughout the interview, the doctor closely observes the patient's body language and facial expressions. This method is severely constrained by the interactions between the doctor and patient as well as by the expertise and experience of psychiatrists. To make the proper diagnosis, medical professionals may also take into account additional data from tests such as MRI [11], CT, ECG [26], and EEG [27]. To find people who might have psychiatric problems, extensive biological data gathering is also carried out, such as monitoring inflammatory markers and hormone changes [44]. However, traditional screening methods place an undue emphasis on psychiatrists' training and experience, and they fall short in special situations, such as long-distance voyages with limited medical resources [45].

2.2.2 Noncontact Screening Methods

Changes in behavioral characteristics, such as concentration on things (reflected in eye gaze duration [31], eye movement characteristics [34], pupil size, and changes in head posture [46]), mouth shape, eyebrow shape [10], facial expression [47], and gait [15], can reflect, to some extent, a person's mental activity. These mental activities can lead to significant changes in a person's physiological characteristics, such as EEG [27], heart rate [12], and respiration rate. Noncontact anxiety screening methods capture or extract these behavioral or EEG changes mainly through computer vision or signal processing methods. A facial action coding system [25] is often used to characterize a person's facial behavior. To explore the relationship between behavioral and physiological features and anxiety, correlation analysis methods [9] such as the Pearson Correlation Coefficient (PCC) [15], [36] are commonly used. Moreover, because of the physiological-behavioral link, machine learning methods can perform even better by incorporating EEG and eye-movement features. However, since correlation analysis methods tend to cause overfitting in classical machine learning methods such as SVM and KNN, sparse representation methods address this problem by introducing constraint terms [19]. To reduce the risk of model overfitting, sequence-based feature selection approaches [14] such as Sequential Backward Selection (SBS) and Sequential Forward Selection (SFS) are used to remove redundant features from the original data.

The link between physiological symptoms and mental illness has led researchers to focus on further identification of mental illness through physiological characteristics. Elevated heart rate, rapid breathing [48], high BP [8], dizziness, sweating, and muscle tension can all be used to objectively screen for anxiety. However, specialized hardware is needed to collect physiological signals in the traditional way, and its relatively high cost often prevents early diagnosis of psychiatric diseases.

Physiological characteristics [49] such as blood volume pulse, heart rate, heart rate variability, and respiratory rate can be captured from imaging photoplethysmography (iPPG) signals. The iPPG signals are extracted by computer vision technology. Affordability, noncontact operation, safety, the ability to obtain continuous measurements, and ease of use are just a few advantages of iPPG [16]. iPPG thus offers a novel perspective for research on noncontact telemedicine and physical and mental health monitoring.

2.3 Feature Extraction Methods

As shown in Table 1, the feature extraction methods for anxiety screening are mainly classified into three categories: neural network-based, correlation analysis, and signal processing methods. High data dimensionality and data redundancy are properties of the time series data (such as ECG [19] and EEG [27]) and image data (such as CT and MRI [11]) used for anxiety inference. The performance of anxiety screening methods may thus be adversely affected. To improve the performance of traditional screening methods, the feature extraction methods in Table 1, such as Neural Networks (NN) [9], [39], [22] and correlation analysis [15], [36], are used to extract features useful for anxiety inference. Nonlinear information useful for anxiety inference is frequently extracted using neural network-based feature extraction techniques such as CNN, LSTM, and Radial Basis Function (RBF) networks.

Canonical Correlation Analysis (CCA), Sparse Canonical Correlation Analysis (SCCA) [19], and Principal Component Analysis (PCA) [35] are correlation analysis techniques for extracting anxiety-related features, which are helpful for inferring anxiety. In [35], PCA uses orthogonal transformations to project observations of a group of possibly correlated variables (such as time- and frequency-domain statistics from respiratory signals) onto the principal components of a group of linearly uncorrelated variables. PCA is frequently used to reduce a dataset's dimensionality, and the number of retained components can be controlled by imposing a threshold.

Due to acquisition equipment or environmental factors, the quality of physiological signal extraction or analysis is easily affected. For example, the iPPG signals used for anxiety inference are particularly susceptible to ambient lighting and motion artifacts [42], so the physiological features derived from iPPG signals contain a lot of noise. The signal processing techniques in Table 1, such as the Kalman filter [13] and fast Fourier transform [42], are typically used to reduce noise and eliminate its detrimental effects on the anxiety screening model. In [13], an enhanced Kalman filter processes heart rate and accelerometry signals to track the user's heart rate in various contexts and select the best model for anxiety detection based on the user's exercise condition.

2.4 Multimodal Data-Driven Methods

Due to their complex etiology and long development cycles, the diagnosis of mental illnesses usually requires a multidisciplinary approach that combines biomedicine, psychology, and social medicine [17]. Moreover, multimodal data can provide more objective evidence for anxiety screening [18]. Multimodal data-driven approaches are therefore promising for anxiety screening. By using correlation analysis to examine the structural information between EEG and eye movement data, it is possible to combine these two types of variables and detect anxiety more precisely [19]. Biophysical signals in virtual reality applications, such as heart rate, electrodermal activity, and EEG, are extracted as features of different dimensions [9]. The features from the time domain and frequency domain are then fused to achieve high-precision anxiety detection. Several biosignals, including EEG, iPPG, Electrodermal Activity (EDA), and pupil size [18], are used to measure anxiety in various driving scenarios.

However, the use of multimodal data necessarily results in a rapid increase in data dimensionality and feature redundancy. The increase in data dimensions leads to the curse of dimensionality, and the model's accuracy may suffer because redundant features may carry a lot of noise. Decision Tree (DT)-based approaches such as Random Forest (RF), Adaptive Boosting (AdaBoost) [10], and eXtreme Gradient Boosting (XGBoost) [40], as well as Naïve Bayes (NB) [38], can select features useful for anxiety detection. They use information attributes (such as information gain, information entropy, or the Gini index [22]) to reduce the dimensionality and the number of redundant features in the data. Besides, unbalanced datasets are very common, particularly in the medical domain, and they pose a significant obstacle to anxiety screening. The existing machine learning techniques [10], [13], [39], [38], [21] and neural network techniques [10], [39], [36], [22], [18] used for anxiety screening are therefore difficult to apply in this situation.

3 PROPOSED FRAMEWORK

Refer to caption
Figure 1: The framework for anxiety screening.

Figure 1 shows the proposed Multimodal Data-driven framework for Anxiety Screening (MMD-AS), which consists of four main components: feature extraction, dimension reduction, feature selection, and anxiety inference for seafarers. The steps of the framework are as follows. First, feature extraction extracts the heart rate and respiratory rate features (iPPG), Behavior Features (BF), and Audio Features (AF) from facial videos. The Text Features (TF) are extracted from audio. The Questionnaires Features (QF) are extracted by processing the questionnaire. Next, the extracted iPPG, BF, AF, and TF features are processed by the designed "1DCNN+GRU" and "CNNText" networks for dimension reduction, and then combined with the QF features to create the feature vector $F=[F_{iPPG},F_{BF},F_{AF},F_{TF},F_{QF}]$. Then, feature selection selects the feature subsets from the vector $F$ based on the Improved Fireworks Algorithm (IFA). Finally, the data from the selected features are used to train the AdaBoost classifier, and the generalization ability of the selected features is evaluated with the classification evaluation metrics. The trained model and selected features are finally used for anxiety inference.

3.1 Feature Extraction

Table 2 shows the different features extracted in the feature extraction step. The heart rate and respiration rate features from iPPG signals, together with the BF, AF, TF, and QF features from the facial videos and questionnaires, are used for anxiety inference.

3.1.1 Heart Rate and Respiration Rate Features

TABLE II: Features extracted from facial video and questionnaires.
Feature types Description
1) Heart Rate and Respiration Rate Features: $iPPG_{Nose}^{TD}$, $iPPG_{Fore}^{TD}$, $iPPG_{Nose}^{FD}$, and $iPPG_{Fore}^{FD}$ The iPPG signals extracted from the nose and forehead regions, containing time- and frequency-domain features of heart rate (HR) and respiration rate (RR).
2) Behavioral Features (BF): Head Position (HP), Eye Gaze (EG), Action Units (AU): AU01, AU02, AU04, AU05, AU06, AU07, AU09, AU10, AU12, AU14, AU15, AU17, AU20, AU23, AU25, AU26, AU44, AU45 1) HP=($x_{HP}$, $y_{HP}$, $z_{HP}$) describes the positional information of the head rotation when the center of the head is taken as the origin. 2) EG=($x_{EG}$, $y_{EG}$) describes the eye gaze direction. 3) AUs are defined by the facial action coding system [50], which refers to the set of facial muscle movements corresponding to the displayed emotions.
3) Audio Features (AF): Mel-frequency cepstral coefficients (MFCCs), fundamental frequency F0, Zero-Crossing Rate (ZCR), Prosody, and Phonation The audio in the video is processed with a vocal separation method to obtain the human voice. Then, AF are extracted by audio analysis technology [51]. Phonation features include F0's first and second derivatives ($F0^{1}$ and $F0^{2}$), jitter, shimmer, Amplitude Perturbation Quotient (APQ), Pitch Perturbation Quotient (PPQ), and Logarithmic Energy (LE).
4) Text Features (TF): Text features from audio Process the text in the audio through the iFlytek toolkit [52], and then extract the text features using the pre-trained BERT model [53].
5) Questionnaires Features (QF): Assessment Time (AT), Personal Information (PF), Big Five Personality Traits (BFPT) [54], Sleep Quality (SQ) [55], Lifestyle, Emotional State (ES), Work Environment (WE), Entertainment (En), Attitude to Life (AL), Social Support (SS), Family Relationships (FR) 1) AT: It includes the stages of before boarding, sailing, and after disembarking. 2) PF: It includes marital status, family size, income, place of household registration, position, working hours, smoking and alcohol use [38]. 3) BFPT: It includes extraversion, agreeableness, openness, conscientiousness, and neuroticism. 4) SQ: It is evaluated by the Pittsburgh Sleep Quality Index (PSQI) [45]. 5) Lifestyle: It is evaluated by the Health-Promoting Lifestyle Profile-II (HPLP) [56]. 6) ES: It is evaluated by the Multidimensional Fatigue Inventory (MFI) [57], GAD-7 items, the Patient Health Questionnaire 9 (PHQ) [6], and the Depression Anxiety and Stress Scale 21 (DASS) [5]. 7) WE: It includes company culture, equipment management and maintenance, office environment, and safety. 8) En: It includes type of entertainment and frequency of participation in activities. 9) AL: It is evaluated by the Suicide Behaviors Questionnaire-Revised (SBQ-R) [58]. 10) SS: It is evaluated by the Social Support Rating Scale (SSRS) [45]. 11) FR: It is evaluated by the Family Assessment Device-General Functioning (FAD-GF) [45].
Refer to caption
Figure 2: Heart rate and respiration rate features extraction.

Patients with anxiety disorders experience overt clinical symptoms such as muscle tension, a racing heartbeat, rapid breathing [7], elevated Blood Pressure (BP) [8], and dizziness. One of the key sources of evidence for identifying anxiety is the iPPG signal, which contains Heart Rate (HR) and Respiratory Rate (RR) information. The face's motion-insensitive regions [42], such as the forehead and nose, can be used for extracting iPPG characteristics that contain information on HR and RR [49]. Therefore, the time-domain and frequency-domain features of the iPPG signals in the HR and RR ranges are used as one group of features for anxiety inference.

Figure 2 depicts the process of extracting HR and RR features from iPPG signals. Frames are extracted from each facial video one by one. The key feature points of the face are extracted from each frame using a face detection algorithm, such as Google MediaPipe. Based on these key feature points, the Regions of Interest (ROI) of the face in each frame, such as the forehead and nose, can be precisely tracked. Each ROI (such as $ROI_{Fore}$ and $ROI_{Nose}$) in each frame is resized to a fixed length and width. The pixel averages of the ROI's different channels in each frame are used to create the iPPG signals. Equation (1) is used to calculate the ROI's Mean Pixel (MP) value for the red channel in the t-th frame. In Equation (2), the MP values of all frames constitute the initial iPPG signal.

$$MP_{R}(t)=\frac{1}{H_{ROI}\times W_{ROI}}\sum_{x=1}^{H_{ROI}}\sum_{y=1}^{W_{ROI}}P_{R}(x,t,y) \quad (1)$$
$$iPPG=\left[\begin{array}{l}MP_{R}(1),MP_{R}(2),\ldots,MP_{R}(t)\\ MP_{G}(1),MP_{G}(2),\ldots,MP_{G}(t)\\ MP_{B}(1),MP_{B}(2),\ldots,MP_{B}(t)\end{array}\right]_{C\times T^{\prime}} \quad (2)$$

where the red channel pixel value at position (x, y) in the t-th frame is represented by $P_{R}(x,t,y)$. Similarly, the green and blue channel pixel values at position (x, y) in the t-th frame are denoted as $P_{G}(x,t,y)$ and $P_{B}(x,t,y)$, respectively. The number of channels in each picture frame and the total number of frames in the video are denoted by $C$ and $T^{\prime}$, respectively. The signals from the nose and forehead are denoted as $iPPG_{Nose}$ and $iPPG_{Fore}$, respectively.

In addition, the initial iPPG signals are processed by a Butterworth filter [42] and the Fast Fourier Transform (FFT) to obtain time-domain and frequency-domain signals containing heartbeat and respiration information. Normal human heartbeat and breathing lie in the ranges of [0.75, 3.33] Hz and [0.15, 0.40] Hz, respectively. The time-domain signals within the normal HR and RR ranges of the human body are separated from the initial iPPG signals by filtering, and are called $iPPG_{Fore}^{TD}$ and $iPPG_{Nose}^{TD}$, respectively. The frequency-domain signals containing HR or RR information are extracted from the iPPG signals using the FFT, which yields frequency-domain features (such as $iPPG_{Fore}^{FD}$ and $iPPG_{Nose}^{FD}$) in the HR and RR ranges while also efficiently reducing the noise in the iPPG signals. The iPPG features $iPPG=[iPPG^{TD},iPPG^{FD}]$ from the time and frequency domains are denoted as $iPPG^{TD}=[iPPG_{Fore}^{TD},iPPG_{Nose}^{TD}]_{2\times C\times T^{\prime}}$ and $iPPG^{FD}=[iPPG_{Fore}^{FD},iPPG_{Nose}^{FD}]_{2\times C\times T^{\prime}}$, respectively.
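As a rough illustration of this step, the following sketch averages the ROI pixels per frame as in Equations (1)-(2) and applies a Butterworth band-pass in the HR and RR ranges followed by an FFT. ROI tracking is assumed to be done already, and the filter order is an assumption; only the averaging and the frequency bands come from the text.

```python
# Sketch: ROI mean-pixel trace (Eq. (1)-(2)) plus band-pass filtering and FFT.
import numpy as np
from scipy.signal import butter, filtfilt

def roi_mean_pixels(frames_roi):
    """frames_roi: (T, H, W, 3) stack of one ROI (e.g. forehead) over T frames.
    Returns the initial iPPG signal of shape (3, T): one mean-pixel trace per channel."""
    return frames_roi.mean(axis=(1, 2)).T          # average over H and W, Eq. (1)

def bandpass(signal, fps, low, high, order=3):
    """Butterworth band-pass applied along the time axis (order is assumed)."""
    b, a = butter(order, [low, high], btype="band", fs=fps)
    return filtfilt(b, a, signal, axis=-1)

fps = 25.0                                          # video sampling rate from Section 4.1
ippg_fore = roi_mean_pixels(np.random.rand(30 * 25, 32, 32, 3))   # placeholder ROI stack
ippg_fore_td_hr = bandpass(ippg_fore, fps, 0.75, 3.33)  # heart-rate band [0.75, 3.33] Hz
ippg_fore_td_rr = bandpass(ippg_fore, fps, 0.15, 0.40)  # respiration band [0.15, 0.40] Hz
ippg_fore_fd = np.abs(np.fft.rfft(ippg_fore_td_hr, axis=-1))      # frequency-domain features
```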

3.1.2 Behavioral Features

Key signs of anxiety in the behavioral context include changes in pupil size, blink rate [19], gaze distribution [31], lip twisting, mouth shape movements, cheek changes, head movement, and head speed [32]. These behavioral features can be described by facial Action Units (AUs) and used as characteristics for anxiety inference.

The different AUs of the face are defined by the Facial Action Coding System (FACS) [50], which describes muscle movements in particular locations. They are used to describe a person's facial activity, including the mouth, chin, lips, eyes, eyelids, and eyebrows. Different combinations of AUs can be used to describe various facial behaviors or emotional expressions. A facial behavior analysis toolkit [46] is used for analyzing the behavioral features of each frame of the facial video. $EG(t)=(x_{EG}(t),y_{EG}(t))$, $AUs(t)$, and $HP(t)=(x_{HP}(t),y_{HP}(t),z_{HP}(t))$ are the Eye Gaze (EG), AUs, and Head Posture (HP) features extracted from the t-th frame, respectively. Behavioral features are extracted from each frame in the video and combined to create a sequence of behavioral features $BF$ = [$EG$, $AUs$, $HP$], as given in Equations (3)-(5) and sketched after Figure 3.

$$EG=[EG(1),EG(2),\ldots,EG(t)] \quad (3)$$
$$AUs=[AUs(1),AUs(2),\ldots,AUs(t)] \quad (4)$$
$$HP=[HP(1),HP(2),\ldots,HP(t)] \quad (5)$$
Refer to caption
Figure 3: Behavioral feature extraction.
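The following minimal sketch shows how per-frame outputs could be assembled into the sequence BF = [EG, AUs, HP] of Equations (3)-(5); the function extract_frame_features is a hypothetical stand-in for the facial behavior analysis toolkit [46], and the array shapes are assumptions.

```python
# Sketch: assemble per-frame behavioral outputs into the sequence BF = [EG, AUs, HP].
import numpy as np

def extract_frame_features(frame):
    # Placeholder: a real toolkit would return gaze (2), 18 action units, head pose (3).
    return {"EG": np.zeros(2), "AUs": np.zeros(18), "HP": np.zeros(3)}

def behavioral_sequence(frames):
    per_frame = [extract_frame_features(f) for f in frames]
    EG  = np.stack([p["EG"]  for p in per_frame])    # (T, 2),  Eq. (3)
    AUs = np.stack([p["AUs"] for p in per_frame])    # (T, 18), Eq. (4)
    HP  = np.stack([p["HP"]  for p in per_frame])    # (T, 3),  Eq. (5)
    return np.concatenate([EG, AUs, HP], axis=1)     # BF, one row per frame
```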

3.1.3 Audio Features

Negative emotions, such as anxiety and depression, may cause changes in the somatic and autonomic nervous systems that are reflected in muscle tension and the respiratory system [59]. These changes can have an impact on prosody and speech quality. In [59], the mean, median, standard deviation, maximum, and minimum of the time-domain and frequency-domain signals of the Zero-Crossing Rate (ZCR) in each sliding time window are extracted. Previous research has demonstrated a correlation between anxiety and several auditory factors, including the fundamental frequency F0 [30], the first and second formant frequencies (F1 and F2), phonation, MFCCs, and wavelet coefficients. Phonation consists of F0's first and second derivatives ($F0^{1}$ and $F0^{2}$), as well as jitter, shimmer, Amplitude Perturbation Quotient (APQ), Pitch Perturbation Quotient (PPQ), and Logarithmic Energy (LE). Some acoustic signatures vary in direction and intensity with anxiety. Anxious seafarers have higher F0 mean, F1 mean, jitter, shimmer, and wavelet coefficients, while MFCCs decrease with anxiety [28]. As a result, the audio features shown in Table 2 are also used as one group of characteristics for anxiety inference.

The original audio may contain background noise from the recording environment. It is therefore necessary to remove the background noise and extract the seafarer's voice from each video. Only the seafarers' voices are present in the audio data after vocal separation. Initial audio features, such as MFCCs, F0, ZCR, prosody, and phonation, are obtained from the audio data after processing with audio analysis technology [51]. Phonation features include $F0^{1}$, $F0^{2}$, jitter, shimmer, the amplitude perturbation quotient, the pitch perturbation quotient, and the logarithmic energy. Finally, MFCCs, F0, Phonation, Prosody, and ZCR form the audio features $AF$=[$MFCCs$, $F0$, $Phonation$, $Prosody$, $ZCR$].
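A minimal sketch of this audio step is given below, assuming librosa for MFCCs, F0, and ZCR; jitter, shimmer, APQ, PPQ, and the prosody features would come from a dedicated phonation/prosody toolkit such as the one cited in [51] and are omitted here. The file name is a placeholder.

```python
# Sketch: MFCCs, F0 (with first/second derivatives) and ZCR via librosa.
import numpy as np
import librosa

y, sr = librosa.load("seafarer_answer.wav", sr=22050)      # 22.05 kHz, as in Section 4.1
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, frames)
zcr = librosa.feature.zero_crossing_rate(y)                # (1, frames)
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)              # fundamental frequency track
f0_d1 = np.gradient(f0)                                    # F0 first derivative
f0_d2 = np.gradient(f0_d1)                                 # F0 second derivative
AF = [mfccs, f0, f0_d1, f0_d2, zcr]                        # partial AF; phonation/prosody omitted
```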

3.1.4 Text Features

Refer to caption
Figure 4: Text features extracted from seafarers' answers about their relationships with relatives and friends and their work status.

When a person works in a closed environment for a long time, their relationships with relatives and friends and their work status are closely related to anxiety [45]. While the questions shown in Figure 4 are asked, the seafarers' phone cameras record audio and facial video of their responses to learn more about their health. Therefore, text features from the audio are also used as main features for anxiety inference.

The main steps of text feature extraction from audio are as follows. First, the human voice is isolated from the audio through vocal separation. Next, the iFlytek toolkit [52], with 97.5% accuracy for Chinese speech recognition, is used to transcribe the Chinese text from the audio. Finally, the pre-trained BERT model [53] processes each sentence of the transcribed text to create the text feature vector. Figure 4 shows a sample of a seafarer's responses to the two questions. Seafarers answer the questions based on their current situation.
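The sketch below illustrates the last step under stated assumptions: the checkpoint name "bert-base-chinese", mean pooling over token embeddings, and the sample sentence are our choices for illustration; the paper only specifies that a pre-trained BERT model [53] encodes each transcribed sentence into a text feature vector.

```python
# Sketch: encode one transcribed sentence into a text feature vector with BERT.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese").eval()

def text_features(sentence: str) -> torch.Tensor:
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state   # (1, tokens, 768)
    return hidden.mean(dim=1).squeeze(0)            # one 768-d vector per sentence (assumed pooling)

F_TF = text_features("I get along well with my family, and my work is going smoothly.")  # placeholder answer
```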

3.1.5 Questionnaires Features

As a result of turbulence, an airtight environment, vibration noise, variations in circadian rhythm, a monotonous diet, and social isolation, long-haul seafarers are exposed to serious health risks [45]. These conditions can easily trigger a variety of physical and mental health problems for seafarers, including those related to anxiety, diet, illness, fatigue, depression, and cognition. Many studies have found that a variety of factors, such as personality traits [54], poor sleep quality (leading to fatigue) [45], a bad emotional state, attitude to life [58], and a lack of family and social support [60], can contribute to anxiety.

In the questionnaire, most of the questions listed in Table 2 provide answer options that represent different severity or grade levels. Table 3 shows some example questions from the questionnaire. Therefore, each question in the questionnaire can be assigned a score. The questionnaire features are processed and then denoted as QF.

TABLE III: Sample questions in the questionnaire.
Questions Grade levels: low → high
1) I feel energized. 0 1 2 3 ☑4
2) Have you ever thought about suicide? 0 1 2 3 ☑4
3) The working environment is comfortable. 0 1 2 3 ☑4
4) Difficulty falling asleep. 0 1 2 3 ☑4
5) Limit sugar and sugary foods. 0 1 2 3 ☑4
6) I often feel exhausted. 0 1 2 3 ☑4
7) It’s hard to relax. 0 1 2 3 ☑4
8) Family members rarely talk to each other. 0 1 2 3 ☑4
9) Your colleagues often care about each other. 0 1 2 3 ☑4
10) Often receive support and care from family. 0 1 2 3 ☑4

3.2 Dimension Reduction

The majority of the features are time series data. The data dimensions of the original HR and RR features from the time and frequency domains, the behavioral features, and the text features from audio are often rather high. For example, a total of 18 AUs can be extracted from each frame of a facial picture, as shown in Figure 3, so the total number of dimensions of the AUs obtained from one second of video at a sampling rate of 25 Frames Per Second (FPS) is 18 × 25. The dimensions of the other time series features are similarly high. To reduce the data dimensions and the cost of feature selection in the next step, the original features are processed by deep learning networks [9], [39] for dimension reduction, as shown in Figure 5.

Refer to caption
Figure 5: Dimension reduction by the ”1DCNN+GRU+CNNText”.

The Convolutional Neural Network (CNN) can effectively extract spatial features from high-dimensional data [41], while the Gated Recurrent Unit (GRU) network is a variant of the Long Short-Term Memory (LSTM) network that can extract temporal features from time series data [61]. Compared with the time series data, the dimensions of the questionnaire features are not high, so they do not need dimension reduction. Therefore, as depicted in Figure 5, the feature vectors $[F_{iPPG},F_{BF},F_{AF}]$ and $F_{TF}$ are produced when the iPPG, behavior, audio, and text features are processed by the "1DCNN+GRU" and "CNNText" networks, respectively. Then, the feature vector $F=[F_{iPPG},F_{BF},F_{AF},F_{TF},F_{QF}]$ is composed of $[F_{iPPG},F_{BF},F_{AF}]$, $F_{TF}$, and $F_{QF}$.

Several "1DCNN+GRU" networks with the same structure are employed to extract spatiotemporal features from the iPPG, behavior, and audio data respectively to reduce the data dimensions. The "1DCNN+GRU" network consists of three 1D-MaxPool layers, three 1D-convolutional layers, and two GRU layers. A Rectified Linear Unit (ReLU) layer and a dropout layer are added after each convolutional layer and MaxPool layer, respectively. Conv-k(p1, p2) denotes the k-th convolutional layer, where p1 and p2 represent the output channels and kernel size, respectively. Additionally, the $TF$ features are processed by the "CNNText" network to obtain the feature vector $F_{TF}$. The "CNNText" network shown in Figure 5 consists of one embedding layer, two convolutional layers, two ReLU layers, one pooling layer, one flatten layer, two dropout layers, and two dense layers.
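A minimal PyTorch sketch of one such "1DCNN+GRU" branch is shown below: three Conv1d+ReLU blocks, each followed by MaxPool1d and Dropout, and two GRU layers. The channel counts, kernel sizes, dropout rates, and output dimension are assumptions; only the layer types and their counts follow the description above.

```python
# Sketch of one "1DCNN+GRU" branch with assumed hyperparameters.
import torch
import torch.nn as nn

class Conv1DGRU(nn.Module):
    def __init__(self, in_channels, hidden=64, out_dim=32):
        super().__init__()
        chans = [in_channels, 16, 32, 64]            # assumed channel progression
        blocks = []
        for i in range(3):
            blocks += [nn.Conv1d(chans[i], chans[i + 1], kernel_size=3, padding=1),
                       nn.ReLU(),
                       nn.MaxPool1d(kernel_size=2),
                       nn.Dropout(0.2)]
        self.cnn = nn.Sequential(*blocks)
        self.gru = nn.GRU(chans[-1], hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, out_dim)       # reduced feature vector, e.g. F_iPPG

    def forward(self, x):                            # x: (batch, channels, time)
        h = self.cnn(x).transpose(1, 2)              # (batch, time', 64)
        _, last = self.gru(h)                        # final hidden states of both GRU layers
        return self.proj(last[-1])                   # (batch, out_dim)

f_ippg = Conv1DGRU(in_channels=6)(torch.randn(8, 6, 750))  # e.g. 2 ROIs x 3 channels, 30 s at 25 FPS
```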

In Figure 5, the convolutional layers are used to extract different features from the input. The first convolutional layer may only extract low-level features, and deeper layers can iteratively extract more complex features from the low-level ones. The max-pooling layer not only reduces the feature dimension but also retains more texture information. The ReLU layer increases the nonlinear fitting ability of the neural network. The dropout layer enhances the generalization ability of the model. The embedding layer encodes sentences or words. The flatten layer converts the multi-dimensional input into one dimension and is often used in the transition from the convolutional layer to the fully connected layer. The dense layer applies nonlinear transformations to the previously extracted features, captures the associations between them, and finally maps them to the output space.

3.3 Feature Selection

There are still a lot of redundant features in $F$=[$F_{iPPG}$, $F_{BF}$, $F_{AF}$, $F_{TF}$, $F_{QF}$] extracted from the original data. In addition, due to the data imbalance problem, the parameter learning of the model is biased toward the majority classes during training, so the performance of the model will be adversely affected. Feature selection approaches [22], such as filter and wrapper methods, are used in anxiety screening to remove redundant features from the original data and enhance the effectiveness of models. Embedding methods, which combine the advantages of filter and wrapper techniques, can also be used for anxiety screening. Moreover, bio-inspired methods, which introduce randomness into the search process to avoid local optima when learning model parameters, are more conducive to predicting the minority class. Therefore, a feature selection method based on our Improved Fireworks Algorithm (IFA) is used to search for feature subsets and solve the feature combination optimization problem, which is non-differentiable.

3.3.1 Improved Fireworks Algorithm

The Fireworks Algorithm (FA) [62] is a swarm intelligence algorithm proposed in recent years, whose idea comes from the fireworks explosion process in real life. FA automatically balances local and global search capabilities by regulating the quantity of offspring generated by fireworks through the explosion intensity. The former can hasten population convergence, whilst the latter can guarantee population diversity. The original FA uses the explosion, mutation, selection, and mapping rules as its four major operators. Based on the original FA [62], our Improved Fireworks Algorithm (IFA) enhances the explosion radius (also called the explosion amplitude) in the explosion operator to improve the local search capability, as defined in Equations (6) and (7), while leaving the other components of the algorithm unaltered.

$$R_{i}^{\text{new}}=\left\{\begin{array}{ll}x_{CF}\times(1+N(0,1))-x_{i}, & S_{i}=S_{\max}\\ R_{\max}\dfrac{f\left(x_{i}^{\text{pbest}}\right)-Y_{\min}^{\text{pbest}}+\varepsilon}{\sum_{i=1}^{N}\left(f\left(x_{i}^{\text{pbest}}\right)-Y_{\min}^{\text{pbest}}\right)+\varepsilon}, & S_{i}\neq S_{\max}\end{array}\right. \quad (6)$$
$$x_{i}^{\text{pbest}}=\left\{\begin{array}{ll}x_{i}, & f\left(x_{i}\right)<f\left(x_{i}^{\text{pbest}}\right)\\ x_{i}^{\text{pbest}}, & \text{otherwise}\end{array}\right. \quad (7)$$

where the Core Firework (CF) $x_{CF}$ is the individual with the best fitness value in the fireworks population. The Gaussian distribution $N(0,1)$ has a mean of zero and a variance of 1. $S_{i}$ is the number of explosion sparks produced by the i-th firework individual $x_{i}$, and $S_{\max}$ is the maximum number of explosion sparks. $R_{\max}$ is the maximum explosion radius that the firework individuals are allowed to displace. The i-th firework individual's fitness value is represented by $f(x_{i})$, and $x_{i}^{pbest}$ is its personal-best position. $Y_{\min}^{pbest}$ denotes the smallest (best) fitness value among the personal-best fireworks, $N$ is the population size, and $\varepsilon$ is a small constant that avoids division by zero. The worst firework individual's fitness value is $Y_{\max}$ = max{$f(x_{1})$, $f(x_{2})$, …, $f(x_{N})$}.

The FA chooses the next generation of fireworks from candidate individuals, and the others are discarded. With this strategy, the FA does not fully utilize the best historical information in the candidate set. As shown in Equations (6) and (7), our IFA instead generates an adaptive explosion radius by utilizing the historical information $x_{i}^{pbest}$ of the i-th firework individual $x_{i}$. If the fitness value of $x_{i}$ is smaller than that of $x_{i}^{pbest}$, then $x_{i}^{pbest}$ is updated by Equation (7).
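The following sketch expresses the adaptive explosion radius of Equations (6) and (7) for a minimization problem; population handling, spark generation, and the remaining FA operators are unchanged from the original algorithm and therefore omitted, and treating Y_min^pbest as the smallest personal-best fitness is an assumption.

```python
# Sketch of the adaptive explosion radius (Eq. (6)) and personal-best update (Eq. (7)).
import numpy as np

def update_pbest(x, fx, x_pbest, f_pbest):
    """Equation (7): keep the better of the current firework and its personal best."""
    return (x.copy(), fx) if fx < f_pbest else (x_pbest, f_pbest)

def explosion_radius(i, X, pbest_fit, x_cf, S, S_max, R_max, eps=1e-12):
    """Equation (6): radius of firework i, driven by the personal-best fitness history."""
    if S[i] == S_max:                                 # core-firework case
        return x_cf * (1.0 + np.random.randn()) - X[i]
    y_min = pbest_fit.min()                           # Y_min^pbest, best personal-best fitness (assumed)
    return R_max * (pbest_fit[i] - y_min + eps) / (np.sum(pbest_fit - y_min) + eps)
```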

3.3.2 IFA-based Feature Selection

Refer to caption
Figure 6: Feature selection based on the Improved Fireworks Algorithm.

The process of feature selection based on the Improved Fireworks Algorithm is shown in Figure 6. The purpose of the IFA iteration is to search for the feature subset's locations. In other words, each individual $x_{i}$ of the IFA represents a set of feature subsets, which is used to determine the corresponding dimensions of the selected features from the feature vector $F=[F_{iPPG},F_{BF},F_{AF},F_{TF},F_{QF}]$.

For example, the value of each dimension of the individual $x_{i}$ generated by each iteration of the IFA is in the range [0, 1]. After $x_{i}$ is discretized by Equation (8), it represents a set of decision variables.

$$x_{ij}^{B}=\left\{\begin{array}{ll}0, & x_{ij}<0.5\\ 1, & \text{otherwise}\end{array}\right. \quad (8)$$

where $x_{ij}^{B}$ represents the binary value of the j-th dimension of the i-th individual $x_{i}$ after discretization. That is to say, when $x_{ij}<0.5$ and $x_{ij}^{B}=0$, the feature corresponding to the j-th dimension of the feature vector $F$ is not selected; otherwise it is selected. The green squares with values in Figure 6 represent the indices of the selected features, while the red squares represent the indices of the unselected features at the discretized position of the i-th individual $x_{i}^{B}$. Finally, the Selected Features $SF=[SF_{1},SF_{2},SF_{4},\ldots,SF_{j-2},SF_{j}]$ can be determined from the feature vector $F$ by $x_{i}^{B}=[x_{i1}^{B},x_{i2}^{B},\ldots,x_{ij}^{B}]$, $j$=1, 2, …, $d$, where $d$ is the total number of dimensions of $F$.
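A minimal sketch of Equation (8) and the resulting feature subsetting is given below; the matrix shapes are placeholders.

```python
# Sketch of Equation (8): threshold a firework's position into a binary mask and
# pick the corresponding columns of the feature matrix.
import numpy as np

def select_features(x_i, F):
    """x_i: IFA position in [0, 1]^d; F: data matrix of shape (samples, d)."""
    mask = (x_i >= 0.5).astype(int)          # x_ij^B = 1 selects the j-th feature
    return F[:, mask == 1], mask

F = np.random.rand(227, 300)                 # placeholder feature matrix (samples x d)
x_i = np.random.rand(300)                    # one IFA individual
SF, mask = select_features(x_i, F)           # SF holds only the selected columns
```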

3.4 Anxiety Inference

Refer to caption
Figure 7: Anxiety Inference.

The parameter learning for anxiety inference models can be viewed as a 0-1 programming problem. If the feature vector $F$ in Figure 6 has d-dimensional features, there are $2^{d}$ possible feature combinations. The feature subset is searched by the IFA from the solution space of $2^{d}$ feature combinations to find the one that achieves the best overall classification evaluation metrics for the model. In previous studies [19], the model's performance is usually measured by one or more classification evaluation metrics, such as accuracy, precision, sensitivity, specificity, or the F1-score. However, when the samples in the dataset are unbalanced, these metrics can hardly distinguish the model's performance [63]. In addition, too low a sensitivity or specificity may cause adverse consequences. For instance, a test with low sensitivity may fail to detect true cases, while low specificity might lead to many false positive results, which is quite stressful for patients. Therefore, it is necessary to measure the performance of the model by multiple classification evaluation metrics, such as the Area Under the Curve (AUC), accuracy (Acc), precision (Pre), sensitivity (Sen), specificity (Spe), and F1-score (F1), and to use Equation (9) as the loss function for model optimization. A penalty factor $\lambda=0.2\times d$ is introduced to constrain the number of selected feature dimensions.

$$\min f(x_{i}^{B})=-(AUC+Acc+Pre+Sen+F1+Spe),\quad \text{s.t.}\ \sum\limits_{j=1}^{d}{x_{ij}^{B}}\geq\lambda \quad (9)$$
$$Acc=\frac{TP+TN}{TP+TN+FP+FN} \quad (10)$$
$$Pre=\frac{TP}{TP+FP} \quad (11)$$
$$Sen=\frac{TP}{TP+FN} \quad (12)$$
$$F1=\frac{2TP}{2TP+FP+FN} \quad (13)$$
$$Spe=\frac{TN}{TN+FP} \quad (14)$$

where True-Positive (TP) and True-Negative (TN) denote correctly classified anxiety and anxiety-free samples, respectively, and False-Negative (FN) and False-Positive (FP) denote anxiety and anxiety-free samples that are wrongly classified, respectively. In addition, AdaBoost [64] is an ensemble learning technique that uses an iterative process to improve weak classifiers by learning from their mistakes [10], [13]. Due to its effective classification performance and the interpretability of its inference process, AdaBoost is utilized as the classifier for anxiety screening.
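A sketch of the fitness evaluation in Equation (9) is shown below: an AdaBoost classifier is trained on the selected columns and scored with the six metrics. How the dimension constraint is enforced is an assumption (here a simple penalty), and the train/test split is taken as given.

```python
# Sketch of the fitness in Equation (9) with an AdaBoost classifier.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def fitness(mask, X_train, y_train, X_test, y_test, lam):
    if mask.sum() < lam:                               # constraint: at least lambda features
        return 0.0                                     # penalized (worse than any feasible -total), assumed handling
    clf = AdaBoostClassifier().fit(X_train[:, mask == 1], y_train)
    y_pred = clf.predict(X_test[:, mask == 1])
    y_prob = clf.predict_proba(X_test[:, mask == 1])[:, 1]
    tn = np.sum((y_test == 0) & (y_pred == 0))
    fp = np.sum((y_test == 0) & (y_pred == 1))
    spe = tn / (tn + fp + 1e-12)                       # specificity, Eq. (14)
    total = (roc_auc_score(y_test, y_prob) + accuracy_score(y_test, y_pred)
             + precision_score(y_test, y_pred, zero_division=0)
             + recall_score(y_test, y_pred) + f1_score(y_test, y_pred) + spe)
    return -total                                      # minimized by the IFA, Eq. (9)
```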

Figure 7 depicts the anxiety inference process. The feature subset produced by the feature selection method determines the selected features. The data corresponding to these features is divided into training and test sets. The training set is used to train the classifier, and the trained classifier assesses the feature subset based on the predictions on the test set. Each time a feature subset is evaluated, the number of evaluations is incremented, i.e., $Eve=Eve+1$. If the total number of evaluations exceeds the predefined maximum number of evaluations, i.e., $Eve>MaxEve$, the selected features and the trained classifier are used for anxiety inference. Otherwise, the feature selection process searches for a new feature subset.

4 Performance Evaluation

To gather information about seafarers' physical and mental health for the experiments, we collaborated with West China Hospital of Sichuan University on the performance evaluation. These experiments mainly focus on seafarers' health perception and intervention. On August 12, 2020, the Biomedical Ethics Committee of Hefei University of Technology gave the study its full approval, with experiment registration number W2020JSFW0388. All participating seafarers gave their consent to the experiments.

4.1 Experiments

Refer to caption
Figure 8: The interface of seafarers’ physical and mental health assessment.

We have designed a system that can be accessed from mobile devices based on the WeChat applet platform. With their smartphones, seafarers can then check their physical and mental well-being. First, seafarers need to fill in a questionnaire. As shown in Table 2, the content of the questionnaire includes personality traits [54], sleep quality [45], emotional state, attitude to life [58], family relationships, social support [60], and so on.

Next, the seafarers are asked additional questions on their work status and their relationships with family and friends after completing the questionnaire. The seafarers' responses are captured on camera by their phones. Each video capture lasts for 30 seconds, and the smartphone concurrently records the audio. The sampling rates for video and audio are 25 FPS and 22.05 kHz, respectively. The video has a 480 × 480 pixel resolution. The average, maximum, and minimum audio durations are 29.31 seconds, 30.44 seconds, and 28.96 seconds, respectively. When responding to the questions, the seafarer's face is kept as visible as possible on the recorded screen, as shown on the right side of the interface in Figure 8. One frame per second is sampled to assess the acquired video's quality before the video is transferred to the server. The video is uploaded to the server if the rate of faces detected in it exceeds 90%; otherwise, it needs to be captured again.
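The quality check described above could be sketched as follows, sampling one frame per second and accepting the clip only if faces are detected in more than 90% of the sampled frames; the use of MediaPipe face detection and the file name are illustrative assumptions.

```python
# Sketch of the pre-upload quality check: one frame per second, >90% face-detection rate.
import cv2
import mediapipe as mp

def face_detection_rate(video_path, fps=25):
    cap = cv2.VideoCapture(video_path)
    detector = mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5)
    sampled, detected, idx = 0, 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % fps == 0:                             # one frame per second
            sampled += 1
            result = detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            detected += int(result.detections is not None)
        idx += 1
    cap.release()
    return detected / max(sampled, 1)

accept = face_detection_rate("answer_clip.mp4") > 0.90   # upload only above 90%
```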

In the experiments, all seafarers taking part in the physical and mental health evaluation were male, with ages ranging from 19 to 58. Two seafarers were between 18 and 20 years old, 167 were between 20 and 40, and 20 were between 40 and 60; the average age was 31. The age information of the remaining seafarers was not recorded due to errors. Data collection ran from June 2020 to June 2021. Data from a total of 227 seafarers were recorded: 189 of them had no anxiety, 33 had mild anxiety, and 5 had moderate anxiety. In other words, 189 people were labeled "anxiety-free" and the remaining 38 people were labeled "anxiety". The GAD-7 [6] was used to measure the seafarers' anxiety levels. Additionally, psychiatrists were invited to validate the outcomes of the GAD-7 scales that the seafarers had completed.

4.2 Performance Results

TABLE IV: Performance comparison (%) of different methods.
Methods Different components in the methods Avg(%) ΔAvg(%)
Dimension Reduction Feature Selection Anxiety Inference
MMD-AS (ours) 1DCNN+GRU+CNNText IFA AdaBoost 97.55 -
M1 1DCNN+LSTM+CNNText IFA AdaBoost 96.71 -0.84
M2 LSTM+CNNText IFA AdaBoost 96.27 -1.28
M3 1DCNN+CNNText IFA AdaBoost 97.01 -0.54
M4 PCA IFA AdaBoost 58.27 -39.28
M5 1DCNN+GRU+CNNText FA AdaBoost 97.10 -0.45
M6 1DCNN+GRU+CNNText BA AdaBoost 97.36 -0.19
M7 1DCNN+GRU+CNNText PSO AdaBoost 92.23 -5.32
M8 1DCNN+GRU+CNNText SKB AdaBoost 95.30 -2.25
M9 1DCNN+GRU+CNNText IFA DT 94.83 -2.72
M10 1DCNN+GRU+CNNText IFA RF 96.75 -0.80
M11 1DCNN+GRU+CNNText IFA LR 94.08 -3.47
M12 1DCNN+GRU+CNNText IFA KNN 80.03 -17.52
M13 1DCNN+GRU+CNNText IFA SVM 92.43 -5.12
M14 1DCNN+GRU+CNNText IFA MLP 95.05 -2.50

Our proposed MMD-AS framework consists of the dimension reduction component "1DCNN+GRU+CNNText", the feature selection component "IFA", and the anxiety inference component "AdaBoost". Table 4 shows the performance results of different methods for the different components. The performance is computed as Average (Avg) = (Acc + AUC + Pre + Sen + F1 + Spe)/6. ΔAvg shows the average performance difference of each method compared to the proposed MMD-AS framework. Overall, the proposed MMD-AS has achieved the best performance with Avg = 97.55%.

4.2.1 Performance on Dimension Reduction Methods

The dimension reduction method of the proposed MMD-AS framework is "1DCNN+GRU+CNNText". The methods used for comparison with the framework's dimension reduction method include "1DCNN+LSTM+CNNText" (M1), "LSTM+CNNText" (M2) [36], "1DCNN+CNNText" (M3), and Principal Component Analysis (PCA) [35] (M4). The experiments are conducted by using the same feature selection component "IFA" and anxiety inference component "AdaBoost" [10], [13] of the proposed MMD-AS framework with the different dimension reduction methods. MMD-AS with the dimension reduction method "1DCNN+GRU+CNNText" has achieved the best performance, with improvements of 0.84%, 1.28%, 0.54%, and 39.28%, respectively, when compared with M1 to M4. In particular, M4 with the dimension reduction method PCA performs worse than the deep learning-based methods such as "1DCNN+GRU+CNNText" (MMD-AS), "LSTM+CNNText" (M2), and "1DCNN+CNNText" (M3). The reason for these performance differences is that the dimension reduction methods in MMD-AS and M1 to M3 are deep learning networks, which can extract time-series features from high-dimensional data and effectively reduce the original data's dimensionality to improve performance.

4.2.2 Performance on Feature Selection Methods

The feature selection method of the proposed MMD-AS framework is IFA. The methods, which are used for comparison with the framework’s feature selection method, include Fireworks Algorithm (FA) [62] (M5), Bat Algorithm (BA) [65] (M6), Particle Swarm Optimization (PSO) [66] (M7), and Selecting K-Best (SKB)[67] (M8). The experiments are conducted by using the same dimension reduction component ”1DCNN+GRU+CNNText”, and the anxiety inference component ”AdaBoost” of the proposed MMD-AS framework with different feature selection methods. The proposed MMD-AS framework with the feature selection method ”IFA” has achieved the best performance. The proposed MMD-AS framework has achieved performance improvements of 0.45%, 0.19%, 5.32%, and 2.25%, respectively when compared with M5 to M8.

The proposed MMD-AS framework with the feature selection method IFA has an improved explosion radius, which offers better local search capability to guide the fireworks population to find a better feature subset and reduce the noise in the features. Therefore, the MMD-AS's IFA algorithm outperforms the Fireworks Algorithm (M5) by 0.45%. However, PSO (M7) and SKB (M8) perform quite poorly when compared with the other feature selection methods based on swarm intelligence algorithms, such as IFA, FA, and BA. The main reason for the poor performance of PSO (M7) is that PSO's search capability is not as good as that of the other swarm intelligence algorithms, such as IFA, FA, and BA. In addition, as features with small variance may contain important information that distinguishes samples, the SKB feature selection method may filter out these features, which may be the reason for the poor performance of SKB (M8).

4.2.3 Performance on Anxiety Inference Methods

The anxiety inference component of the proposed MMD-AS framework is AdaBoost. The methods, which are used for comparison with the framework’s anxiety inference method, include Decision Tree (DT) [12] (M9), Random Forest (RF) [39] (M10), Logistic Regression (LR) [10] (M11), K-Nearest Neighbors (KNN) [27] (M12), Support Vector Machines (SVM)[39] (M13), and Multilayer Perceptron (MLP) [68] (M14).

The experiments use the same dimension reduction component ("1DCNN+GRU+CNNText") and feature selection component (IFA) of the proposed MMD-AS framework, varying only the anxiety inference method. MMD-AS with the anxiety inference method AdaBoost achieves the best performance, with improvements of 2.72%, 0.80%, 3.47%, 17.52%, 5.12%, and 2.50% over M9 to M14, respectively. Since AdaBoost is an ensemble learning technique that iteratively improves weak classifiers [10], [13] by learning from their mistakes, it can learn features that are conducive to anxiety inference.
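
As a hedged illustration of this inference step, the sketch below trains scikit-learn's AdaBoostClassifier (whose default weak learner is a depth-1 decision tree) on IFA-selected features; the synthetic data, the number of estimators, and the learning rate are placeholder assumptions rather than the tuned values used in the paper.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the IFA-selected features and GAD-7-derived labels
rng = np.random.default_rng(0)
X_sel = rng.normal(size=(200, 20))     # 200 subjects, 20 retained feature dimensions
y = rng.integers(0, 2, size=200)       # 0: non-anxious, 1: anxious (illustrative labels)

# AdaBoost iteratively reweights samples so later weak learners focus on earlier mistakes
clf = AdaBoostClassifier(n_estimators=200, learning_rate=0.5)
scores = cross_val_score(clf, X_sel, y, cv=5, scoring="f1_macro")
print(f"5-fold macro-F1: {scores.mean():.3f}")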

4.3 Ablation Study

TABLE V: Experimental results of the ablation study of the MMD-AS framework.
Methods       | Dimension Reduction | Feature Selection | Anxiety Inference | Avg(%) | ΔAvg(%)
MMD-AS (ours) | 1DCNN+GRU+CNNText   | IFA               | AdaBoost          | 97.55  | -
M15           | 1DCNN+GRU+CNNText   | -                 | AdaBoost          | 91.90  | -5.65
M16           | 1DCNN+GRU+CNNText   | -                 | -                 | 93.99  | -3.56
M17           | -                   | -                 | AdaBoost          | 89.30  | -8.25

We have conducted an ablation study to evaluate the effectiveness of each component in our proposed MMD-AS framework: (1) M15 is MMD-AS without feature selection. (2) M16 is MMD-AS without feature selection and anxiety inference. As M16 only has the dimension reduction component "1DCNN+GRU+CNNText", it cannot perform classification by itself, so we add a fully connected layer after this component to give M16 a classification function (a minimal sketch of this head is given below). (3) M17 is MMD-AS without dimension reduction and feature selection.
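
For illustration, the classification head added to M16 can be as simple as a single fully connected layer on top of the dimension reduction component, as in the following sketch (our own assumption about the representation size; feat_dim and n_classes are placeholders).

import torch.nn as nn

class M16Classifier(nn.Module):
    """Ablation variant M16: dimension reduction followed by a fully connected layer."""
    def __init__(self, backbone, feat_dim=128, n_classes=2):
        super().__init__()
        self.backbone = backbone                    # the 1DCNN+GRU+CNNText component
        self.fc = nn.Linear(feat_dim, n_classes)    # added classification layer

    def forward(self, ts, text):
        return self.fc(self.backbone(ts, text))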

Table V shows the results of the ablation experiments, from which we make the following observations. First, MMD-AS outperforms M15 by 5.65%, which demonstrates the capability of the feature selection method IFA in terms of feature selection and feature denoising. Second, MMD-AS outperforms M16 by 3.56%, which shows that the IFA and AdaBoost components improve the model's performance. Third, MMD-AS outperforms M17 by 8.25%, which shows that the "1DCNN+GRU+CNNText" and IFA components improve the model's performance. Overall, every component of the proposed MMD-AS framework is important for achieving the best performance.

4.4 Analysis on Feature Selection

Figure 9: The importance score ranking of the different features in the proposed MMD-AS framework

Feature importance scores indicate which features are useful for anxiety screening. Each time the feature selection algorithm selects a given dimension of the feature vector F, the importance score of that feature is increased by one. Figures 9(a) to 9(d) show the ranking of the importance scores of the different types of features used by the proposed MMD-AS framework, namely iPPG features (including HR and RR features), BF, AF, TF, and QF, for anxiety screening.
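
In other words, the importance score of a feature dimension is simply a count of how many times the selection algorithm includes it, as in the following sketch (selected_masks is a hypothetical placeholder for the binary feature masks recorded over repeated IFA runs or iterations).

import numpy as np

def importance_scores(selected_masks):
    """Count how often each feature dimension is selected across runs/iterations."""
    masks = np.asarray(selected_masks, dtype=int)   # shape: (n_runs, n_features)
    return masks.sum(axis=0)                        # +1 per selection of each feature

# Example: three runs over five feature dimensions
masks = [[1, 0, 1, 1, 0],
         [1, 1, 0, 1, 0],
         [1, 0, 0, 1, 1]]
print(importance_scores(masks))   # [3 1 1 3 1]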

In Figure 9(a), the QF and BF features rank the highest and second highest in importance among the five categories of features, respectively. Figure 9(b) shows the feature importance scores of the physiological representations for anxiety screening, such as audio features and iPPG features containing heart rate and respiration rate information. Audio features such as Prosody, the fundamental frequency features (including F0 and F01), Pitch Perturbation Quotient (PPQ), Jitter, and Amplitude Perturbation Quotient (APQ) play a more important role in anxiety screening; their feature importance scores are all greater than 150. Among the iPPG features, those from the frequency domain signals, iPPG_Fore^FD and iPPG_Nose^FD, are more important for anxiety screening. Compared with the iPPG features from the nose area, the iPPG features from the forehead are more important, since the forehead is dense with blood vessels [42]. Figure 9(c) shows that anxiety screening is more heavily influenced by characteristics of the chin (e.g., AU17), eyes (e.g., AU02, AU05, AU01, AU45, AU04), and lips (e.g., AU10, AU23, AU20); the scores of these features are mostly distributed in the range [160, 206]. Figure 9(d) shows the contribution of the text and questionnaire features to anxiety screening. The feature importance scores of the SSRS, MFI, PF, HPLP, and PSQI features rank among the top for anxiety screening and are all greater than 1400.

Due to the complex etiology and long development cycle of anxiety, diagnosis usually requires a combination of multidisciplinary knowledge from biomedicine, psychology, and social medicine [17]. In clinical practice, multimodal data are essential for anxiety screening [18], [19]. Therefore, based on the results of the feature analysis, we suggest that anxiety screening should combine multimodal information. In addition, behavioral features (such as those of the chin, eye, and lip areas), physiological signals from the frequency domain (such as heart rate and respiration rate), audio characteristics (such as Prosody, the fundamental frequency features, PPQ, Jitter, and APQ), social support, fatigue status, sleep quality [45], lifestyle, and other important modal characteristics can be used as indicators in clinical practice.

5 Limitations

There are still some shortcomings with our proposed framework.

First, our dataset has some limitations, such as class imbalance and a small sample size. Due to the inherent characteristics of the seafaring profession, our dataset lacks health data from female seafarers. In addition, based on the results of health examinations, shipping companies restrict seafarers with severe mental illness from boarding ships for work, which may explain the lack of severe anxiety samples in our dataset. Our framework can only learn from the existing samples and cannot capture the characteristics of the missing ones. Since our framework is driven by multimodal data, it may therefore fail to effectively screen out people suffering from anxiety when faced with certain new samples, which in turn limits its generalization and application.

Second, in the standard clinical diagnostic process, physicians are required to conduct a structured interview with the patient to further determine the patient's mental health status. Although the GAD-7 is used to label seafarers' anxiety levels, the lack of structured interviews may lead to misdiagnosis or underdiagnosis of a small number of seafarers. Nevertheless, multimodal data provide physicians with more objective evidence when screening seafarers for anxiety, and psychologists re-examine the psychological scales completed by the seafarers, which reduces the probability of misdiagnosis or underdiagnosis.

Third, the important features can serve as objective evidence for anxiety inference, but this needs more cohort experiments for verification. Since the factors leading to the formation of anxiety are multifaceted, involving biological, medical, and sociological aspects, it is necessary to further validate this conclusion by increasing the data sample size and conducting cohort experiments. Nonetheless, these important characteristics are helpful for designing cohort research experiments to investigate the mechanisms of anxiety development.

To address the above issues, we will focus on the following three areas in our future research.

First, knowledge from a range of fields, such as biomedicine, psychology, and social medicine, needs to be integrated into the proposed framework to enhance its interpretability. By utilizing wearable and noncontact technologies, we will focus on extracting features from different dimensions, such as physiological features (heart rate variability, changes in facial temperature, distribution of facial temperature, audio features), behavioral features, family and social support, sleep quality, and fatigue status. Combining these characteristics with clinical expert knowledge can provide more scientific evidence for anxiety inference. In addition, further objective evidence and expert domain knowledge will be integrated into our framework to assist primary care physicians in anxiety screening.

Second, an increased data sample size allows for a more comprehensive data distribution, which can enhance the robustness of the anxiety screening framework. The important features identified by the anxiety inference analysis will be used to design cohort research experiments, which in turn can assist physicians in studying how anxiety develops.

Finally, our framework offers the advantages of low cost, ease of use, noncontact operation, interpretability, and high accuracy. In addition, it enables anxiety screening of seafarers by simply analyzing multimodal health data collected via smartphones, which is invaluable in future telemedicine scenarios. Therefore, our framework can be extended and applied to anxiety detection for large populations in scenarios where medical resources are limited, such as health coverage for seafarers on long voyages or in remote areas.

6 Conclusion

Existing methods for anxiety screening have some drawbacks, such as the inability to solve the non-differentiable problem of feature combination and the inability to meet the requirements of scenarios with limited medical resources. To overcome these drawbacks, we have proposed a multimodal data-driven framework, MMD-AS, for seafarers' anxiety screening. The results of the comparative experiments and ablation studies of the different components show that our proposed framework achieves the best performance among the comparison methods, and that each component of the framework is important for the performance improvement. In addition, due to its low cost, noncontact operation, and convenient use, the proposed framework and the suggested indicators for anxiety screening have guiding significance and application value for scenarios with limited medical resources, such as the health protection of seafarers on long-distance voyages and anxiety screening in remote areas. In future work, we will collect relevant health data from more people and apply the proposed framework to anxiety screening in clinical practice, which can provide more detailed and scientific evidence to help study the process of anxiety development.

Acknowledgments

The authors would like to thank Professors Wei Zhang and Yuchen Li from West China Hospital of Sichuan University for their guidance on experimental design and data collection. This work was supported in part by the 2020 Science and Technology Project of the Maritime Safety Administration of the Ministry of Transport of China (No. 0745-2041CCIEC016) and the National Natural Science Foundation of China (No. 91846107, 72293581, 72293580).

References

  • [1] GBD 2019 Mental Disorders Collaborators, “Global, regional, and national burden of 12 mental disorders in 204 countries and territories, 1990–2019: a systematic analysis for the global burden of disease study 2019,” The Lancet Psychiatry, vol. 9, no. 2, pp. 137–150, 2022.
  • [2] R. Hou, C. Holmes, M. Garner, C. Osmond, L. Lau, and D. Baldwin, “How is the immune balance affected in patients with generalised anxiety disorder?” Brain, Behavior, and Immunity, vol. 57, p. e43, 2016.
  • [3] E. Dias-Ferreira, J. C. Sousa, I. Melo, P. Morgado, A. R. Mesquita, J. J. Cerqueira, R. M. Costa, and N. Sousa, “Chronic stress causes frontostriatal reorganization and affects decision-making,” Science, vol. 325, no. 5940, pp. 621–625, 2009.
  • [4] P. M. Enock, S. G. Hofmann, and R. J. McNally, “Attention bias modification training via smartphone to reduce social anxiety: A randomized, controlled multi-session experiment,” Cognitive therapy and research, vol. 38, no. 2, pp. 200–216, 2014.
  • [5] F. Baygi, N. Mohammadian Khonsari, A. Agoushi, S. Hassani Gelsefid, A. Mahdavi Gorabi, and M. Qorbani, “Prevalence and associated factors of psychosocial distress among seafarers during covid-19 pandemic,” BMC psychiatry, vol. 21, no. 1, pp. 1–9, 2021.
  • [6] R. Stocker, T. Tran, K. Hammarberg, H. Nguyen, H. Rowe, and J. Fisher, “Patient health questionnaire 9 (phq-9) and general anxiety disorder 7 (gad-7) data contributed by 13,829 respondents to a national survey about covid-19 restrictions in australia,” Psychiatry Research, vol. 298, p. 113792, 2021.
  • [7] L. Efinger, S. Thuillard, and E. Dan-Glauser, “Distraction and reappraisal efficiency on immediate negative emotional responses: role of trait anxiety,” Anxiety, Stress, & Coping, vol. 32, no. 4, pp. 412–427, 2019.
  • [8] T. Pham, Z. J. Lau, S. A. Chen, and D. Makowski, “Heart rate variability in psychology: A review of hrv indices and an analysis tutorial,” Sensors, vol. 21, no. 12, p. 3998, 2021.
  • [9] L. Petrescu, C. Petrescu, O. Mitruț, G. Moise, A. Moldoveanu, F. Moldoveanu, and M. Leordeanu, “Integrating biosignals measurement in virtual reality environments for anxiety detection,” Sensors, vol. 20, no. 24, p. 7088, 2020.
  • [10] G. Giannakakis, M. Pediaditis, D. Manousos, E. Kazantzaki, F. Chiarugi, P. G. Simos, K. Marias, and M. Tsiknakis, “Stress and anxiety detection using facial cues from videos,” Biomedical Signal Processing and Control, vol. 31, pp. 89–101, 2017.
  • [11] G. Pfurtscheller, A. Schwerdtfeger, B. Rassler, A. Andrade, and G. Schwarz, “Mri-related anxiety can induce slow bold oscillations coupled with cardiac oscillations,” Clinical Neurophysiology, vol. 132, no. 9, pp. 2083–2090, 2021.
  • [12] F. R. Ihmig, F. Neurohr-Parakenings, S. K. Schäfer, J. Lass-Hennemann, and T. Michael, “On-line anxiety level detection from biosignals: Machine learning based on a randomized controlled trial with spider-fearful individuals,” Plos one, vol. 15, no. 6, p. e0231517, 2020.
  • [13] A. Puli and A. Kushki, “Toward automatic anxiety detection in autism: A real-time algorithm for detecting physiological arousal in the presence of motion,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 3, pp. 646–657, 2019.
  • [14] G. Giannakakis, D. Grigoriadis, and M. Tsiknakis, “Detection of stress/anxiety state from eeg features during video watching,” in 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).   IEEE, 2015, pp. 6034–6037.
  • [15] N. Zhao, Z. Zhang, Y. Wang, J. Wang, B. Li, T. Zhu, and Y. Xiang, “See your mental state from your walk: Recognizing anxiety and depression through kinect-recorded gait data,” PLoS one, vol. 14, no. 5, p. e0216591, 2019.
  • [16] R. Favilla, V. C. Zuccala, and G. Coppini, “Heart rate and heart rate variability from single-channel video and ica integration of multiple signals,” IEEE journal of biomedical and health informatics, vol. 23, no. 6, pp. 2398–2408, 2018.
  • [17] A. Farre and T. Rapley, “The new old (and old new) medical model: four decades navigating the biomedical and psychosocial understandings of health and illness,” in Healthcare, vol. 5, no. 4.   MDPI, 2017, p. 88.
  • [18] S. Lee, T. Lee, T. Yang, C. Yoon, and S.-P. Kim, “Detection of drivers’ anxiety invoked by driving situations using multimodal biosignals,” Processes, vol. 8, no. 2, p. 155, 2020.
  • [19] X. Zhang, J. Pan, J. Shen, Z. U. Din, J. Li, D. Lu, M. Wu, and B. Hu, “Fusing of electroencephalogram and eye movement with group sparse canonical correlation analysis for anxiety detection,” IEEE Transactions on Affective Computing, 2020.
  • [20] G. Stiglic, P. Kocbek, N. Fijacko, M. Zitnik, K. Verbert, and L. Cilar, “Interpretability of machine learning-based prediction models in healthcare,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 10, no. 5, p. e1379, 2020.
  • [21] L. Ancillon, M. Elgendi, and C. Menon, “Machine learning for anxiety detection using biosignals: a review,” Diagnostics, vol. 12, no. 8, p. 1794, 2022.
  • [22] J. Šalkevicius, R. Damaševičius, R. Maskeliunas, and I. Laukienė, “Anxiety level recognition for virtual reality therapy system using physiological signals,” Electronics, vol. 8, no. 9, p. 1039, 2019.
  • [23] B. Bandelow and S. Michaelis, “Epidemiology of anxiety disorders in the 21st century,” Dialogues in clinical neuroscience, 2022.
  • [24] L. M. Shin and I. Liberzon, “The neurocircuitry of fear, stress, and anxiety disorders,” Neuropsychopharmacology, vol. 35, no. 1, pp. 169–191, 2010.
  • [25] M. Gavrilescu and N. Vizireanu, “Predicting depression, anxiety, and stress levels from videos using the facial action coding system,” Sensors, vol. 19, no. 17, p. 3693, 2019.
  • [26] M. Elgendi and C. Menon, “Assessing anxiety disorders using wearable devices: Challenges and future directions,” Brain sciences, vol. 9, no. 3, p. 50, 2019.
  • [27] Z. Li, X. Wu, X. Xu, H. Wang, Z. Guo, Z. Zhan, and L. Yao, “The recognition of multiple anxiety levels based on electroencephalograph,” IEEE Transactions on Affective Computing, 2019.
  • [28] T. Özseven, M. Düğenci, A. Doruk, and H. I. Kahraman, “Voice traces of anxiety: acoustic parameters affected by anxiety disorder,” Archives of Acoustics, pp. 625–636, 2018.
  • [29] E. I. Martin, K. J. Ressler, E. Binder, and C. B. Nemeroff, “The neurobiology of anxiety disorders: brain imaging, genetics, and psychoneuroendocrinology,” Clinics in laboratory medicine, vol. 30, no. 4, pp. 865–891, 2010.
  • [30] L. Albuquerque, A. R. S. Valente, A. Teixeira, D. Figueiredo, P. Sa-Couto, and C. Oliveira, “Association between acoustic speech features and non-severe levels of anxiety and depression symptoms across lifespan,” PloS one, vol. 16, no. 4, p. e0248842, 2021.
  • [31] K. Mogg, M. Garner, and B. P. Bradley, “Anxiety and orienting of gaze to angry and fearful faces,” Biological psychology, vol. 76, no. 3, pp. 163–169, 2007.
  • [32] A. Adams, M. Mahmoud, T. Baltrušaitis, and P. Robinson, “Decoupling facial expressions and head motions in complex emotions,” in 2015 International conference on affective computing and intelligent interaction (ACII).   IEEE, 2015, pp. 274–280.
  • [33] M. M. Bradley, L. Miccoli, M. A. Escrig, and P. J. Lang, “The pupil as a measure of emotional arousal and autonomic activation,” Psychophysiology, vol. 45, no. 4, pp. 602–607, 2008.
  • [34] Y. Guo, X. Wang, Q. Xu, F. Liu, Y. Liu, and Y. Xia, “Change-point analysis of eye movement characteristics for female drivers in anxiety,” International journal of environmental research and public health, vol. 16, no. 7, p. 1236, 2019.
  • [35] H. Haritha, S. Negi, R. S. Menon, A. A. Kumar, and C. S. Kumar, “Automating anxiety detection using respiratory signal analysis,” in 2017 IEEE Region 10 Symposium (TENSYMP).   IEEE, 2017, pp. 1–5.
  • [36] H. J. Richards, J. A. Hadwin, V. Benson, M. J. Wenger, and N. Donnelly, “The influence of anxiety on processing capacity for threat detection,” Psychonomic bulletin & review, vol. 18, no. 5, pp. 883–889, 2011.
  • [37] S. Lisk, A. Vaswani, M. Linetzky, Y. Bar-Haim, and J. Y. Lau, “Systematic review and meta-analysis: Eye-tracking of attention to threat in child and adolescent anxiety,” Journal of the American Academy of Child & Adolescent Psychiatry, vol. 59, no. 1, pp. 88–99, 2020.
  • [38] A. Sau and I. Bhakta, “Screening of anxiety and depression among the seafarers using machine learning technology,” Informatics in Medicine Unlocked, vol. 16, p. 100149, 2019.
  • [39] Y. Tyshchenko, “Depression and anxiety detection from blog posts data,” Nature Precis. Sci., Inst. Comput. Sci., Univ. Tartu, Tartu, Estonia, 2018.
  • [40] S. Saeb, E. G. Lattie, K. P. Kording, D. C. Mohr et al., “Mobile phone detection of semantic location and its relationship to depression and anxiety,” JMIR mHealth and uHealth, vol. 5, no. 8, p. e7297, 2017.
  • [41] A. T. Umrani and P. Harshavardhanan, “Hybrid feature-based anxiety detection in autism using hybrid optimization tuned artificial neural network,” Biomedical Signal Processing and Control, vol. 76, p. 103699, 2022.
  • [42] H. Mo, S. Ding, S. Yang, A. V. Vasilakos, and X. Zheng, “Collaborative three-tier architecture noncontact respiratory rate monitoring using target tracking and false peaks eliminating algorithms,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–13, 2022.
  • [43] T. J. Doty, S. Japee, M. Ingvar, and L. G. Ungerleider, “Fearful face detection sensitivity in healthy adults correlates with anxiety-related traits.” Emotion, vol. 13, no. 2, p. 183, 2013.
  • [44] R. Hou, M. Garner, C. Holmes, C. Osmond, J. Teeling, L. Lau, and D. S. Baldwin, “Peripheral inflammatory cytokines and immune balance in generalised anxiety disorder: Case-controlled study,” Brain, behavior, and immunity, vol. 62, pp. 212–218, 2017.
  • [45] L. Tang, S. Abila, M. Kitada, S. Malecosio Jr, and K. K. Montes, “Seafarers’ mental health during the covid-19 pandemic: an examination of current supportive measures and their perceived effectiveness,” Marine Policy, vol. 145, p. 105276, 2022.
  • [46] T. Baltrusaitis, A. Zadeh, Y. C. Lim, and L.-P. Morency, “Openface 2.0: Facial behavior analysis toolkit,” in 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018).   IEEE, 2018, pp. 59–66.
  • [47] O. G. Sani, Y. Yang, M. B. Lee, H. E. Dawes, E. F. Chang, and M. M. Shanechi, “Mood variations decoded from multi-site intracranial human brain activity,” Nature biotechnology, vol. 36, no. 10, pp. 954–961, 2018.
  • [48] G. Pfurtscheller, K. J. Blinowska, M. Kaminski, B. Rassler, and W. Klimesch, “Processing of fmri-related anxiety and information flow between brain and body revealed a preponderance of oscillations at 0.15/0.16 hz,” Scientific Reports, vol. 12, no. 1, pp. 1–12, 2022.
  • [49] S. Ding, Z. Ke, Z. Yue, C. Song, and L. Lu, “Noncontact multiphysiological signals estimation via visible and infrared facial features fusion,” IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–13, 2022.
  • [50] Y. Liu, X. Zhang, Y. Lin, and H. Wang, “Facial expression recognition via deep action units graph network based on psychological mechanism,” IEEE Transactions on Cognitive and Developmental Systems, vol. 12, no. 2, pp. 311–322, 2019.
  • [51] M. Pariente, S. Cornell, J. Cosentino, S. Sivasankaran, E. Tzinis, J. Heitkaemper, M. Olvera, F.-R. Stöter, M. Hu, J. M. Martín-Doñas et al., “Asteroid: the pytorch-based audio source separation toolkit for researchers,” arXiv preprint arXiv:2005.04132, 2020.
  • [52] N. Liu and Z. Yuan, “Spontaneous language analysis in alzheimer’s disease: Evaluation of natural language processing technique for analyzing lexical performance,” Journal of Shanghai Jiaotong University (Science), vol. 27, no. 2, pp. 160–167, 2022.
  • [53] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “Bert: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint arXiv:1810.04805, 2018.
  • [54] A. V. Nikčević, C. Marino, D. C. Kolubinski, D. Leach, and M. M. Spada, “Modelling the contribution of the big five personality traits, health anxiety, and covid-19 psychological distress to generalised anxiety and depressive symptoms during the covid-19 pandemic,” Journal of affective disorders, vol. 279, pp. 578–584, 2021.
  • [55] T. Mollayeva, P. Thurairajah, K. Burton, S. Mollayeva, C. M. Shapiro, and A. Colantonio, “The pittsburgh sleep quality index as a screening tool for sleep dysfunction in clinical and non-clinical samples: A systematic review and meta-analysis,” Sleep medicine reviews, vol. 25, pp. 52–73, 2016.
  • [56] H.-L. Teng, M. Yen, and S. Fetzer, “Health promotion lifestyle profile-ii: Chinese version short form,” Journal of advanced nursing, vol. 66, no. 8, pp. 1864–1873, 2010.
  • [57] K. A. Donovan, K. D. Stein, M. Lee, C. R. Leach, O. Ilozumba, and P. B. Jacobsen, “Systematic review of the multidimensional fatigue symptom inventory-short form,” Supportive Care in Cancer, vol. 23, no. 1, pp. 191–212, 2015.
  • [58] J. M. Y. Huen, P. S. F. Yip, A. Osman, and A. N. M. Leung, “The suicidal behaviors questionnaire-revised (sbq-r) and its chinese version (c-sbq-r): Further validity testing using the culture, comprehension, and translation bias procedure.” Psychological Assessment, 2022.
  • [59] E. W. McGinnis, S. P. Anderau, J. Hruschak, R. D. Gurchiek, N. L. Lopez-Duran, K. Fitzgerald, K. L. Rosenblum, M. Muzik, and R. S. McGinnis, “Giving voice to vulnerable children: machine learning analysis of speech detects anxiety and depression in early childhood,” IEEE journal of biomedical and health informatics, vol. 23, no. 6, pp. 2294–2301, 2019.
  • [60] S. K. Brooks and N. Greenberg, “Mental health and psychological wellbeing of maritime personnel: a systematic review,” BMC psychology, vol. 10, no. 1, pp. 1–26, 2022.
  • [61] S. Kanai, Y. Fujiwara, and S. Iwamura, “Preventing gradient explosions in gated recurrent units,” Advances in neural information processing systems, vol. 30, 2017.
  • [62] J. Li and Y. Tan, “A comprehensive review of the fireworks algorithm,” ACM Computing Surveys (CSUR), vol. 52, no. 6, pp. 1–28, 2019.
  • [63] S. S. Mullick, S. Datta, S. G. Dhekane, and S. Das, “Appropriateness of performance indices for imbalanced data classification: An analysis,” Pattern Recognition, vol. 102, p. 107197, 2020.
  • [64] Y. Zhao, X. Chen, and J. Yin, “Adaptive boosting-based computational model for predicting potential mirna-disease associations,” Bioinformatics, vol. 35, no. 22, pp. 4730–4738, 2019.
  • [65] X.-S. Yang and A. H. Gandomi, “Bat algorithm: a novel approach for global engineering optimization,” Engineering computations, 2012.
  • [66] S. Praveen, N. Tyagi, B. Singh, G. R. Karetla, M. A. Thalor, K. Joshi, and M. Tsegaye, “Pso-based evolutionary approach to optimize head and neck biomedical image to detect mesothelioma cancer,” BioMed Research International, vol. 2022, 2022.
  • [67] D. V. Akman, M. Malekipirbazari, Z. D. Yenice, A. Yeo, N. Adhikari, Y. K. Wong, B. Abbasi, and A. T. Gumus, “k-best feature selection and ranking via stochastic approximation,” Expert Systems with Applications, vol. 213, p. 118864, 2023.
  • [68] A. A. Alnuaim, M. Zakariah, P. K. Shukla, A. Alhadlaq, W. A. Hatamleh, H. Tarazi, R. Sureshbabu, and R. Ratna, “Human-computer interaction for recognizing speech emotions using multilayer perceptron classifier,” Journal of Healthcare Engineering, vol. 2022, 2022.