Multi-Subset Approach to Early Sepsis Prediction

Kevin Ewig2, Xiangwen Lin2, Tucker Stewart2, Katherine Stern4, Grant O’Keefe4, Ankur Teredesai2,3, and Juhua Hu*2 2School of Engineering and Technology, University of Washington, Tacoma, WA 98402, USA 4Department of Surgery, University of Washington, Seattle, WA 98195, USA 3CueZen Inc., Seattle, WA 98101, USA {ewigkl, xlin8, trstew, kstern, gokeefe, ankurt, juhuah}@uw.edu

Abstract

Sepsis is a life-threatening organ malfunction caused by the host’s inability to fight infection, which can lead to death without proper and immediate treatment. Therefore, early diagnosis and medical treatment of sepsis in critically ill populations at high risk for sepsis and sepsis-associated mortality are vital to providing the patient with rapid therapy. Studies show that advancing sepsis detection by 6 hours leads to earlier administration of antibiotics, which is associated with improved mortality. However, clinical scores like Sequential Organ Failure Assessment (SOFA) are not applicable for early prediction, while machine learning algorithms can help capture the progressing pattern for early prediction. Therefore, we aim to develop a machine learning algorithm that predicts sepsis onset 6 hours before it is suspected clinically. Although some machine learning algorithms have been applied to sepsis prediction, many of them did not consider the fact that six hours is not a small gap. To overcome this big gap challenge, we explore a multi-subset approach in which the likelihood of sepsis occurring earlier than 6 hours is output from a previous subset and feed to the target subset as additional features. Moreover, we use the hourly sampled data like vital signs in an observation window to derive a temporal change trend to further assist, which however is often ignored by previous studies. Our empirical study shows that both the multi-subset approach to alleviating the 6-hour gap and the added temporal trend features can help improve the performance of sepsis-related early prediction.

Index Terms:

Sepsis, Septic Shock, Early Prediction, Machine Learning, Multi-Subset Approach, Temporal Change Trend

I Introduction

Sepsis is a major public health concern. It is a life-threatening disease caused by a host’s failed response to an infection [1]. The immune system of a sepsis patient becomes aggressive in its protection against infection in the body, which causes organ dysfunction and potential organ damage. Sepsis is still a common problem in modern medical settings, particularly Intensive Care Units (ICUs). According to a global survey conducted in 2018 [2], roughly 13.6% to 39.3% of patients admitted to ICU are impacted by sepsis. This share is 29.5% worldwide. Patients with sepsis also experience longer and more expensive hospital stays. For patients that survive, sepsis can cause an increased risk of permanent organ damage, and physical disability [3].

Early treatment before the formation of sepsis in patients has been shown to improve the chances of successfully treating and preventing the disease [4]. Early care can prevent 80% of sepsis-related deaths, and the chances of survival drop by 8% every hour if action is not taken [5]. In particular, studies have shown that treating sepsis 6 hours earlier before the onset significantly improves the patient’s chance of recovering [6]. Therefore, this work focuses on the problem of early sepsis prediction, that is, 6 hours in advance. Several clinical scores such as SOFA have been developed to indicate the onset of sepsis with clinical-based data [7, 1]. These measures are helpful when sepsis is onset already but have been limited for early prediction. For early sepsis onset prediction, predictions derived using supervised machine learning models such as Random Forest or Long Short-Term Memory (LSTM) models have vastly outperformed clinical scores [8].

Because of this, there has been ongoing research in developing machine learning predictive models to detect the onset of sepsis early before it is suspected clinically. Some studies have used traditional machine learning models such as Logistic Regression and Random Forest [9, 10]. However, these studies did not capture the progressing temporal pattern that can be useful for early sepsis prediction. Recently, deep learning models have also been applied in sepsis prediction, such as Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) [11]. Compared with traditional machine learning methods, deep learning often provides better prediction performance. However, training deep models can be tedious to obtain the optimal subsets and parameters. In addition, none of them has considered the fact that 6 hours is not a small gap. Without the progressing information for the next 6 hours before onset, it is challenging to make an accurate prediction, which is explored in this work.

First, it should be noted that training successful deep neural networks (DNNs) is expensive in terms of time, computational resources, and data. We aim to avoid using DNNs but still capture the progressing temporal change pattern for early sepsis prediction. We accomplish this by computing the hourly changes in feature values like heart rate within a given observation window. Doing this will allow us to consider the temporal changes in the feature even in a traditional machine learning model, where features are often considered independently. Second, we propose a multi-subset approach to close the 6-hour gap. The contributions of this paper can be summarized as follows.

•

We propose to generate a series of delta values to capture the temporal change trend, which is incorporated into the feature set for training and prediction.
•

We develop a multi-subset approach in which the likelihood of sepsis arising earlier than 6 hours are generated from the previous subset and provided to the target subset as additional features to close the big 6-hour gap.
•

Based on the proposed method, we apply an economical machine learning algorithm, that is XGBoost [12], to the trauma patients from the year 2012 to 2019 at ICUs of UW Harborview Center, and obtain an AUROC of 0.7906 for early sepsis prediction and that of 0.9199 for early septic shock prediction, better than a deep learning model due to limited amount of sepsis cases.

II Related Work

Various machine learning models have been studied for sepsis prediction as follows.

II-A Traditional Models

Some of the research done in sepsis prediction uses simple traditional machine learning models. For example, Zabihiet et al. [13] used a wrapper feature selection algorithm based on XGBoost to extract five different sets of features from clinical data to predict sepsis 6 hours before onset. Both valid and missing clinical data are used to derive the relevant attributes. Afterward, an ensemble model comprised of five XGBoost models is utilized to predict sepsis. In addition, Firoozabadi et al. [14] created models to predict sepsis as part of the PhysioNet/CinC Challenge 2019. In this study, the authors took hourly samples of 40 features from each patient’s data from three different ICUs. They processed the data to remove outliers, fill in missing data, and replace the remainder with the population average. They discovered similarly that an ensemble of bagged decision trees is effective for early sepsis prediction 6 hours before onset. A similar observation is shown by Fu et al. [15].

However, these models did not consider that the temporal change trend can be helpful for early sepsis onset prediction, and missing 6 hours of progressing information is challenging to make an accurate prediction. In this work, we address both by adding temporal change trend features and proposing a multi-subset approach.

II-B Deep Learning Models

Deep learning models have also been applied to address specific challenges of sepsis prediction. For example, Gilbertson et al. [16] used the Principal Component Analysis (PCA) to reduce the dimensionality of patient data before applying a simple RNN. The PCA method reduced the original 40 patient features to 10 main components. In addition to this, a fast Sequential Organ Failure Assessment (qSOFA) score generates one additional characteristic. These 11 characteristics are then loaded into a DNN classifier. However, the characteristics connected with each hour are examined independently. Later, Tsang et al. [17] applied LSTM which allows consideration of past time-step data embeddings within the prediction of the current time-step to capture the temporal pattern. Most recently, Shah et al. [18] proposed a DNN model to predict sepsis 6 hours in advance. The DNN model is used because of its feature learning capacity and its property of approximating functions.

Better sepsis onset prediction performances were observed using these DNNs. However, none of them has considered addressing the 6-hour gap challenge, which is explored in this paper. Moreover, training DNNs is often time, resource, and data expensive, which is avoided in this work.

II-C Temporal Change Trend

In 2019 and 2021, Brekke et al. [19] and Mahta et al. [20] respectively proved that the temporal trend of vital signs and white blood cells could improve the performance of deterioration prediction by studying the retrospective cohort. In the machine learning area, Orphanou et al. [21] employed Temporal Association Rules (TAR) combined with Naïve Bayes classifiers for coronary heart disease diagnosis. They showed that higher levels of temporal abstraction improved the prediction performance when long sequences and distant events are critical.

In the sepsis-related prediction area, Ghosh et al. [22] adapted temporal patterns resulting from the contrast pattern mining approach in conjunction with Coupled Hidden Markov Model (CHMM) to predict septic shock 30-60 minutes prior to onset. This paper demonstrated that this performance is significantly higher than applying SVM directly on data or using CHMM with continuous multivariate data. Nonetheless, the 30-60 minutes prediction window is too narrow to let the doctor react.

III The Proposed Method

We assume that the Electronic Health Records (EHRs) for a single patient can be represented as a $t\times f$ matrix $X$ . There are $t$ number of rows, where each row represents an hourly observation of a patient’s features and where the $t$ -th observation is the most recent. There is also $f$ number of columns, where each column represents a physiological feature. Let $x_{t,f}$ denote the numerical entry in $X$ for time $t$ and feature $f$ .

III-A Multi-Subset Approach

We denote the ground truth^§ for a sepsis incident as $\hat{y}$ that has an indication of onset hour $t$ . More than often, the hourly physiological patient data contain missing values, which are imputed and filled in. In order to train the model to predict the onset of a sepsis incident $h$ hours into the future, for each patient, we first shift the values in $\hat{y}$ earlier by $h$ hours and call this $\hat{y_{h}}$ . This step creates a new target feature that will occur in $h$ hours. The leftover values on the table with no ground truth values are dropped from the data. This step is illustrated in Table I when we set $h=3$ . From this example, we can observe that if we want to predict sepsis event onset in 3 hours, using observed patient data up to $t=1$ , the onset prediction for $t=4$ should be 0, and using data up to $t=2$ , the prediction of the onset of a sepsis incident for $t=5$ should be 1.

^§^§footnotetext: The onset of a sepsis incident label has been given using the CDC’s adult sepsis surveillance criteria with prior modifications utilizing readily obtainable EMR data to improve specificity for the trauma population. It is required that all of the following be present: 1) an order for a new IV or qualifying oral antibiotic, not administered within the previous 48 hours and excluding antibiotics used for surgical prophylaxis; 2) a body tissue culture was ordered within 48 hours of antibiotic initiation; 3) a qualifying antibiotic was sustained for at least 4 consecutive days, or until death or discharge; and 4) a 2-point increase in the maximum daily sequential organ failure assessment (SOFA) score occurred within 3 days before and 3 days after the qualifying culture. The criteria are restricted to hospital-acquired infections, which are defined as cultures obtained on or after the third hospital day. Two subgroups were independently adjudicated before final sepsis assignments were made: culture-negative sepsis and patients meeting partial but not full criteria.

TABLE I: In this example, the table on the left contains the ground truth for a sepsis incident at time

t

. In order to produce new ground truth values for an incident at time

t+h

, the

\hat{y}

values are shifted by

h=3

hours. The leftover rows are dropped.

$t$	$\hat{y}$
1	0
2	0
3	0
4	0
5	1
6	1

$\rightarrow$ $t$ $\hat{y_{3}}$ 1 0 2 1 3 1 4 - 5 - 6 - $\rightarrow$ $t$ $\hat{y_{3}}$ 1 0 2 1 3 1

After shifting the label for $h$ hours to predict the onset of an incident in $h$ hours, for a certain hour $t$ , we can use only the $x_{t,f}$ values across all of the features, and the new ground truth label $\hat{y_{h}}$ to train a classification model for the onset of a sepsis incident that will be able to make a prediction on new patients and provide the probability of the event in $h$ hours as $y_{h}$ . This is illustrated as an example in Table II. Specifically, given the hard label of an incident, the trained model can provide the probability of the onset of such an event at each time stamp, which inspires our multi-subset approach.

TABLE II: Probability of a sepsis incident in

h=3

hours. Features

x_{1}

x_{2}

, …,

x_{f}

are used to train a model with the ground truth label

\hat{y_{h}}

, and the model is then used to predict the probability of onset in

h

hours as

y_{h}

$t$	$x_{1}$	$x_{2}$	…	$x_{f}$
1	98	102	…	30
2	95	107	…	32
3	98	111	…	35
4	100	109	…	40
5	97	109	…	41

+ $\hat{y_{3}}$ 0 0 0 0 1 $\rightarrow$ $y_{3}$ 0.16 0.24 0.32 0.49 0.61

To close the 6-hour gap of missing progressing information, we can include the probability of a sepsis incident in 3 hours as a new feature (i.e., one subset) to predict the incident in 6 hours in the target subset as shown in Table III. This procedure is also illustrated in Fig. 1. In a previous subset (e.g., Subset 1 in Fig. 1), we use the observed data up to the prediction time to determine the probabilities that a patient will experience a sepsis incident between the current time $t$ and the sepsis incident onset time $t+6$ . For example, we can use the observed data to determine the probability of the onset of a sepsis incident at $t+3$ . Then, this probability will be treated as an additional input feature in the target subset (e.g., Subset 2 in Fig. 1) of our approach. After that, the large 6-hour gap can be shrunk using an in-between measure with a smaller gap of 3. We can further shrink the gap by providing sepsis incident probabilities of each hour in-between to feed more features in the target subset, which is using multiple subsets to close the 6-hour gap.

TABLE III: A toy example of calculating the probability of onset in 6 hours with an additional feature of

y_{3}

, that is, the probability of onset at

h=3

output from a previous subset.

$t$	$x_{1}$	$x_{2}$	…	$x_{n}$	$y_{3}$
1	98	102	…	30	0.16
2	95	107	…	32	0.24

+ $\hat{y_{6}}$ 0 1 $\rightarrow$ $y_{6}$ 0.46 0.64

Refer to caption — Figure 1: Timeline of subsets used to close the prediction gap

III-B Delta Values for Temporal Change Trend

It should be noted that the same absolute value for different patients can have totally different meanings, while changes over time can share the same indication. Therefore, we use delta values to capture the changes in feature values over time, which can provide valuable temporal patterns for sepsis prediction. The observation window size as shown in Fig. 1, denoted by $w$ , is an important parameter that determines the duration of time over which the delta values are computed.

To compute the delta values, we create a new feature, denoted by $f^{\prime}$ , which is computed as the difference between the value of $f$ at timestamp $t$ and the value of $f$ at the previous timestamp $t-1$ within the observation window $w$ . In other words, the value of $f^{\prime}$ at timestamp $t$ is calculated as $x_{t,f^{\prime}}=x_{t,f}-x_{t-1,f}$ . This computation is repeated for each feature and each timestamp within the observation window. For example, if the heart rate at timestamp 11 is 70 beats per minute (bpm) and the heart rate at timestamp 10 is 75 bpm, the delta value for the heart rate at timestamp 11 would be -5 bpm.

III-C Statistical Values for Temporal Change Trend

We leverage delta values to capture the temporal change trend within the observation window $w$ . We can also incorporate statistical information computed from a feature $f$ within the same time window to describe the feature distribution. Specifically, for each feature $f$ within the observation window $w$ , we can calculate six statistical values based on the sequence $S=[x_{t-w,f},x_{t-w+1,f},x_{t-w+2,f},...,x_{t,f}]$ : the mean, minimum, maximum, standard deviation, skewness, and kurtosis. Variance, which is the expectation of the squared deviation, measures how spread out the data is, while skewness is a measure of the asymmetry of the variable’s probability distribution, and kurtosis measures the tailedness of the probability distribution. The formulas for calculating these statistical values are as follows:

•

Mean value: $\overline{x}=\frac{1}{N}\sum_{i=1}^{N}x_{i}$
•

Standard deviation: $\sigma=\sqrt{\frac{1}{N}\sum_{n=1}^{N}(x_{i}-\overline{x})^{2}}$
•

Kurtosis: $\text{K}=\frac{1}{N}\sum_{i=1}^{N}\frac{(x_{i}-\overline{x})^{4}}{\sigma^{4}}$
•

Skewness: $\text{Sk}=\frac{1}{N}\sum_{i=1}^{N}\frac{(x_{i}-\overline{x})^{3}}{\sigma^{3}}$

By incorporating both delta values and statistical information, our model is better equipped to capture the temporal change trend for early sepsis prediction. The overall process is illustrated in Fig. 2. First, delta and/or statistic values are generated. Then, a 3-hour prediction is generated to address the prediction gap and added as a previous subset for the target 6-hour prediction.

IV Experiments

To sufficiently demonstrate the proposed method, we apply it to two important sepsis-related tasks, that is, early sepsis onset prediction and early septic shock prediction. we conduct extensive experiments using the state-of-art traditional machine learning model, i.e., XGBoost [12], to justify the superiority of our proposal and answer the following questions:

•

What are the benefits of using multi-subsets in the development of sepsis and septic shock prediction models?
•

How do delta and statistic values incorporate the temporal change trend affect the effectiveness of the prediction models for sepsis and septic shock?
•

Which features are more important for early sepsis or septic shock prediction?

IV-A Data Description

To evaluate the proposed method, we use trauma patients’ data from UW Harborview Medical Center. It contains prospectively collected, de-identified data from injured adults ages 16 years and older, who were admitted to the ICU between 2012 to 2019 and required at least three days of invasive mechanical ventilation. This data includes physiological data on 2,802 patients, 486 (17%) were identified to have sepsis during the first 14 days of admission. The patient data is sampled at different time intervals. Vital signs, for example, were sampled hourly, whereas laboratory tests were sampled daily or less frequently. As such, there are some features with many missing values. We address missing values using carry-forward imputation.

To find which features would be more useful for early sepsis prediction, we set groups of features that share common attributes. They are vital signs, static profile, cumulative exposures, and laboratory results, where static profile includes initial physiology from the first 48 hours and patient factors, and cumulative exposures are the summary of events. These 4 groups are denoted as G1, G2, G3, and G4. The detailed list is shown in the Appendix.

IV-B Imbalanced-Class Problem in Data

One of the primary problems with the sepsis patient data we received is predicting an outcome that occurs in a minority of the study population using an hour-by-hour framework. This can be a problem because it leads to imbalanced classes in the data. In this case, the outcome of interest, which is a sepsis event, occurs in only 17% of the patients, making it a minority class. When there is a class imbalance in the data, it can lead to biased model performance. Approximately 1.84% of the hour-by-hour records in the data have a sepsis indicator, while the rest does not have it. Most of the patients admitted to the ICU never develop sepsis. Some patients develop sepsis within an hour or two, and some develop sepsis after a more extended period.

To account for the imbalanced nature of the data, we use the following methods. First, we only include hourly patient records from day 2 until day 14 of the patient’s stay in the ICU. This is because the scope of this study is patients developing sepsis after being in the ICU. Limiting the data will reduce the number of hourly records with no sepsis that does not need to be included. Second, if a patient has an hourly record with sepsis, we remove all the subsequent hourly records for that patient. For example, if we see a patient record with sepsis at 4:00, all observations after that time are removed. It is because we are only interested in the early sepsis onset prediction, that is, the first onset. In addition, to further account for the imbalance of data, we use a random oversampling of 0.8. To illustrate this further, suppose there are 20 minority samples and 1000 majority samples. Random oversampling duplicates the 20 minority samples without replacing them until there are 800 samples. As a result, there are 800 minority samples and 1000 majority samples. Finally, we use random under-sampling to make the number of minority class data the same as the number of majority class data in the sample.

IV-C Evaluation Metrics

To evaluate the performance of the proposed approach, we use metrics of Area Under the Receiver Operator Curve (AUROC), Sensitivity, and Specificity, where the average performance under 5-fold cross-validation is reported. Specifically, $Sensitivity=\frac{TP}{TP+FN}$ , where $TP$ is the number of true positives and $FN$ is the number of false negatives, is defined as a measure of how well the model can recognize positive examples.

At the same time, $Specificity=\frac{TN}{FP+TN}$ is the fraction of real negatives properly detected by the model. Finally, AUROC is a useful metric for evaluating the performance of a binary classifier because it provides a single number that summarizes the classifier’s ability to discriminate between positive and negative samples over all possible decision thresholds.

IV-D Effect of Multi-Subset Approach

Considering the feature selection property of XGBoost that can benefit from more features, we use the largest set of features (i.e., Group 1+2+3+4 with delta and statistic values in the observation window, which is set to $w=6$ ) to demonstrate the effect of our multi-subset approach. In the comparison, we use ‘1 Subset’ to denote the method using only the target subset, ‘2 Subsets’ to denote the method adding the probability of sepsis in 3 hours to the target subset, and ‘6 Subsets’ to denote the method adding probabilities of sepsis in 1 hour, 2 hours, 3 hours, 4 hours, and 5 hours to the target subset.

TABLE IV: Performance Comparison of Multi-subsets with Group 1+2+3+4 delta and stats for Sepsis and Septic shock prediction.

Name	AUROC	Sensitivity	Specificity
Sepsis
1 Subset	0.7720	0.7090	0.6960
2 Subsets	0.7790	0.6990	0.7220
6 Subsets	0.6180	0.6210	0.5490
Septic Shock
1 Subset	0.7906	0.6000	0.6188
2 Subsets	0.8219	0.6971	0.7587
6 Subsets	0.8781	0.7943	0.7829

Table IV compares the performance for both the tasks of early sepsis prediction and early septic shock prediction. We can observe that our multi-subset approach is definitely helpful for the task of early septic shock prediction, where adding the probability of shock in 3 hours improves the performance compared to using only the target subset and adding more probabilities can further improve. However, this is not the case for early sepsis prediction. In the experiment, we observed that more features does not mean more choices of features for XGBoost that can help improve the performance. For the task of early sepsis prediction, we find that using only Group 1 (i.e., vital signs) and Group 2 (i.e., static profile) features are more helpful as shown in Table V. More importantly, we can observe the similar phenomenon that adding the probability of sepsis in 3 hours improves the performance compared to using only the target subset and adding more probabilities can further improve.

TABLE V: Performance Comparison of Multi-subsets with Group 1+2 delta and stats for Sepsis prediction.

Name	AUROC	Sensitivity	Specificity
1 Subset	0.6072	0.5741	0.5917
2 Subsets	0.7111	0.6420	0.6695
6 Subsets	0.7906	0.7222	0.7251

Fig. 3 further demonstrates the observation that for early sepsis prediction, certain features are not helpful and even harmful. In general, the model with more subsets performs better than the model that has only 1 subset. However, we find that the group combinations with G3 and G4 may affect the result lower than fewer subsets. For example, Fig. 3 shows the result of different group combinations with delta and stats using different subsets. Group 1+2 and Group 1+3 are getting better when adding subsets, but Group 1+3+4 is getting lower, and Group 1+2+3+4 is getting higher first, then getting lower. This is because that G4 has much more missing values other groups, which may mislead the model learning.

IV-E Effect of Delta and Stats

In this sub-section, we evaluate the effect of using delta, stats, or a combination of delta and stats in the prediction for sepsis and septic shock. We take the best feature group combination for each task with ‘6 Subsets’ to compare, that is, Group 1+2 for sepsis and Group 1+2+3+4 for septic shock, respectively.

TABLE VI: Performance Comparison with Group 1+2 for Sepsis and Group 1+2+3+4 for Septic Shock using ‘6 Subsets’.

Name	AUROC	Sensitivity	Specificity
Sepsis
Baseline	0.7495	0.6667	0.7277
Delta	0.7629	0.6728	0.7192
Stats	0.7904	0.7119	0.7170
Delta and Stats	0.7906	0.7222	0.7251
DNN	0.6869	0.4845	0.7824
Septic Shock
Baseline	0.9199	0.8457	0.8037
Delta	0.7929	0.7486	0.7021
Stats	0.7943	0.7314	0.7319
Delta and Stats	0.8781	0.7943	0.7829
DNN	0.6098	0.3143	0.7734

Table VI shows the results. First, for sepsis prediction, we can observe that both AUROC and Sensitivity are increasing when incorporating delta and stats. This indicates that using delta and stats can help improve the detection performance of sepsis patients, while the baseline can be biased to non-sepsis patients with a way higher Specificity compared to the Sensitivity. This demonstrates that using temporal change trend is helpful for early sepsis prediction.

For the task of septic shock, we have a different observation. Although the combination of Delta and Stats provides better performance compared to using only one of them, the baseline without using any additional temporal trend features gives the best performance. In fact, this is a reasonable observation, since shock as a sudden event is more related to the most recent conditions of the patient, while sepsis can be more observable using the progressing status of infections. In this case, using multiple subsets to shrink the 6-hour gap is more effective.

Moreover, considering the out-performance of DNNs, we apply RNN to the temporal features and DNN to static features. Then, we concatenate them for sepsis and septic prediction using the same setup. However, DNN performs worse. The potential reason is that DNNs often require sufficient training data to be successful, while we have very limited amount of sepsis cases, resulting in models significantly biased to Specificity.

IV-F Model Explanation

To further verify if these probabilities added from the previous subsets are helpful for making predictions, we apply LIME [23], a tool that attempts to explain the weights in a predictive model, and check the importance of features for sepsis onset prediction. Specifically, we take a random sample of the data and apply LIME to get the importance of each feature for both models of ‘2 Subsets’ and ‘6 Subsets’ shown in Table VII. The top-5 important features of each model are listed.

TABLE VII: Top features from LIME

3 Hour Probability Included
Feature	Value
sepsis-subset-3hr	0.026
creatinine-6	$-0.021$
surgSum-8	$-0.020$
neutrophils-1	$0.020$
neutrophils-9	$0.019$

1-5 Hour Probabilities Included
Feature	Value
creatinine-3	$-0.023$
neutrophils-2	$-0.021$
neutrophils-5	$0.019$
sepsis-subset-5hr	-0.019
creatinine-6	$0.014$

The results show that the probability of sepsis occurring in 3 hours contributed significantly to the prediction of sepsis onset for the method using 2 subsets. Similarly, the likelihood of sepsis occurring in 5 hours contributed significantly to the prediction for the method using 6 subsets. Moreover, the rest top features are delta values. This further demonstrates our proposal of shrinking the gap and using the temporal change trend.

V Conclusion

In this work, we present an economical machine learning approach for sepsis related early prediction, which is way less expensive in terms of data, resources, and training time compared to deep learning models. First, to do early prediction in 6 hours, we propose a multi-subset approach to close the gap of missing 6 hours of patient data. Moreover, to capture the temporal change trend that can share similar meaning for different patients, we construct a series of delta and statistic values. Our extensive experiments demonstrate the effectiveness of these two components considering different scenarios, which is also shown in the model explanation. In the future, we would like to enhance the explainability of the model so that it can provide more valuable data, which in turn can identify discriminatory features to monitor and screen for sepsis related events.

VI Acknowledgment

Stewart and Hu’s research is supported in part by NSF (IIS-2104270). Teredesai’s research is supported in part by CueZen Inc. We would like to thank Microsoft Inc.,and University of Washington eScience Institute- Azure Cloud Computing Credits for Research and Teaching grant for their support by providing us with the Cloud Compute Resources on Microsoft Azure. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

References

[1] M. Singer, C. S. Deutschman, C. W. Seymour, M. Shankar-Hari, D. Annane, M. Bauer, R. Bellomo, G. R. Bernard, J.-D. Chiche, C. M. Coopersmith et al., “The third international consensus definitions for sepsis and septic shock (sepsis-3),” Jama, vol. 315, no. 8, pp. 801–810, 2016.
[2] Y. Sakr, U. Jaschinski, X. Wittebole, T. Szakmany, J. Lipman, S. A. Ñamendys-Silva, I. Martin-Loeches, M. Leone, M.-N. Lupu, J.-L. Vincent et al., “Sepsis in intensive care unit patients: worldwide data from the intensive care over nations audit,” in Open forum infectious diseases, vol. 5, no. 12. Oxford University Press US, 2018, p. ofy313.
[3] C. Jones and R. Griffiths, “Mental and physical disability after sepsis.” Minerva anestesiologica, vol. 79, no. 11, pp. 1306–1312, 2013.
[4] H. I. Kim and S. Park, “Sepsis: Early recognition and optimized treatment,” Tuberculosis and respiratory diseases, vol. 82, no. 1, pp. 6–14, 2019.
[5] A. Kumar, D. Roberts, K. E. Wood, B. Light, J. E. Parrillo, S. Sharma, R. Suppes, D. Feinstein, S. Zanotti, L. Taiberg et al., “Duration of hypotension before initiation of effective antimicrobial therapy is the critical determinant of survival in human septic shock,” Critical care medicine, vol. 34, no. 6, pp. 1589–1596, 2006.
[6] R. L. Gauer, “Early recognition and management of sepsis in adults: the first six hours,” American family physician, vol. 88, no. 1, pp. 44–53, 2013.
[7] R. A. Balk, “Systemic inflammatory response syndrome (sirs) where did it come from and is it still relevant today?” Virulence, vol. 5, no. 1, pp. 20–26, 2014.
[8] M. Vollmer, C. F. Luz, P. Sodmann, B. Sinha, and S.-O. Kuhn, “Time-specific metalearners for the early prediction of sepsis,” in 2019 Computing in Cardiology (CinC). IEEE, 2019, pp. 1–4.
[9] F. Mahmud, N. S. Pathan, and M. Quamruzzaman, “Early detection of sepsis in icu patients using logistic regression,” in 2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE). IEEE, 2019, pp. 173–176.
[10] ——, “Early detection of sepsis in critical patients using random forest classifier,” in 2020 IEEE Region 10 Symposium (TENSYMP). IEEE, 2020, pp. 130–133.
[11] X. Li, G. A. Ng, and F. S. Schlindwein, “Convolutional and recurrent neural networks for early detection of sepsis using hourly physiological data from patients in intensive care unit,” in 2019 Computing in Cardiology (CinC). IEEE, 2019, pp. Page–1.
[12] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 2016, pp. 785–794.
[13] M. Zabihi, S. Kiranyaz, and M. Gabbouj, “Sepsis prediction in intensive care unit using ensemble of xgboost models,” in 2019 Computing in Cardiology (CinC). IEEE, 2019, pp. Page–1.
[14] R. Firoozabadi and S. Babaeizadeh, “An ensemble of bagged decision trees for early prediction of sepsis,” in 2019 Computing in Cardiology (CinC). IEEE, 2019, pp. Page–1.
[15] M. Fu, J. Yuan, and C. Bei, “Early sepsis prediction in icu trauma patients with using an improved cascade deep forest model,” in 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). IEEE, 2019, pp. 634–637.
[16] E. H. Gilbertson, K. M. Jones, A. M. Stroh, and B. M. Whitaker, “Early detection of sepsis using feature selection, feature extraction, and neural network classification,” in 2019 Computing in Cardiology (CinC). IEEE, 2019, pp. Page–1.
[17] G. Tsang and X. Xie, “Deep learning based sepsis intervention: the modelling and prediction of severe sepsis onset,” in 2020 25th international conference on pattern recognition (ICPR). IEEE, 2021, pp. 8671–8678.
[18] N. Shah, J. Bhatia, N. Vasavat, R. Desai, and P. Sonawane, “Early sepsis detection using machine learning and neural networks,” in 2021 2nd Global Conference for Advancement in Technology (GCAT). IEEE, 2021, pp. 1–6.
[19] I. J. Brekke, L. H. Puntervoll, P. B. Pedersen, J. Kellett, and M. Brabrand, “The value of vital sign trends in predicting and monitoring clinical deterioration: A systematic review,” PloS one, vol. 14, no. 1, p. e0210875, 2019.
[20] A. Mahta, M. E. Reznik, B. B. Thompson, L. C. Wendell, and K. L. Furie, “In reply to the letter to the editor regarding” association of early white blood cell trend with outcomes in aneurysmal subarachnoid hemorrhage”,” World neurosurgery, vol. 154, p. 205, 2021.
[21] K. Orphanou, A. Dagliati, L. Sacchi, A. Stassopoulou, E. Keravnou, and R. Bellazzi, “Incorporating repeating temporal association rules in naïve bayes classifiers for coronary heart disease diagnosis,” Journal of biomedical informatics, vol. 81, pp. 74–82, 2018.
[22] S. Ghosh, J. Li, L. Cao, and K. Ramamohanarao, “Septic shock prediction for icu patients via coupled hmm walking on sequential contrast patterns,” Journal of biomedical informatics, vol. 66, pp. 19–31, 2017.
[23] M. T. Ribeiro, S. Singh, and C. Guestrin, “” why should i trust you?” explaining the predictions of any classifier,” in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 2016, pp. 1135–1144.

VII Appendix

VII-A Feature Groups

¹¹1* means this feature is used for generating delta and stats values

G1 (Vital Signs)*:

•

Heart Rate (hr)
•

Diastolic Blood Pressure (dbp)
•

Mean Arterial Pressure (map)
•

Respiratory Rate (rr)
•

Temperature (temp)
•

FIO2 (fio2)

G2 (Static Profile):

•

Age
•

Sex (male or female)
•

Transfer from another hospital (transfer)
•

Mechanism (Injury Mechanism)
•

Head Injury
•

ED Systolic Blood Pressure (Initial.ED.SBPCat)
•

Reverse Shock Index (rSICat)
•

Max Base Deficit Cat (baseDef48Cat)
•

Lactate Category (lactate48Cat)
•

Red blood cell count (rbc48)
•

Crystalloids count (crys48Cat)
•

Apache
•

Antibiotics 48 (abx48)
•

Surgery 48 (surg48)
•

ER Disposition (er dispCat)

G3 (Cumulative Exposures):

•

IV Fluid BOLUS (bolusSum) *
•

Red Blood Cell Count (RBCsum)
•

Days in Ventilator (ventDaySum)
•

Days in Surgery (surgSum)
•

Hours in Surgery (surgHours)

G4 (Laboratory Result):

•

Bicarbonate (bicarb) *
•

Acidosis (acidosisCat)
•

Strong Ion (StrongIon) *
•

Blood Urea Nitrogen (bun) *
•

Creatinine (creatinine) *
•

White blood cell count (wbc)
•

Urine Output (uop)