
1 School of Computer Science, University of Sydney, NSW, Australia
2 Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, China

Automatic Tumor Segmentation via False Positive Reduction Network for Whole-Body Multi-Modality PET/CT Images

Yige Peng 1, Jinman Kim 1, Dagan Feng 1,2, Lei Bi 1
Abstract

Multi-modality Fluorodeoxyglucose (FDG) positron emission tomography / computed tomography (PET/CT) has been routinely used in the assessment of common cancers, such as lung cancer, lymphoma, and melanoma. This is mainly attributed to the fact that PET/CT combines the high sensitivity of PET for tumor detection with the anatomical information of CT. In PET/CT image assessment, automatic tumor segmentation is an important step, and in recent years, deep learning based methods have become the state-of-the-art. Unfortunately, existing methods tend to over-segment the tumor regions and include regions such as the normal high-uptake organs, inflammation, and other infections. In this study, we introduce a false positive reduction network to overcome this limitation. We first introduce a global segmentation module that uses a self-supervised pre-trained encoder to coarsely delineate the candidate tumor regions. The candidate tumor regions are then refined by removing false positives via a local refinement module. Our experiments with the MICCAI 2022 Automated Lesion Segmentation in Whole-Body FDG-PET/CT (AutoPET) challenge dataset showed that our method achieved a Dice score of 0.9324 on the preliminary testing data and was ranked 1st in Dice on the leaderboard. Our method was also ranked among the top 7 methods on the final testing data; the final ranking will be announced at the 2022 MICCAI AutoPET workshop. Our code is available at: https://github.com/YigePeng/AutoPET_False_Positive_Reduction.

Keywords:
Automatic Tumor Segmentation · PET/CT · Deep Learning

1 Introduction

Multi-modality Fluorodeoxyglucose (FDG) positron emission tomography / computed tomography (PET/CT) is regarded as the imaging modality of choice for the diagnosis, staging, and treatment response monitoring of many cancers, such as lung cancer, lymphoma, and melanoma [6]. This is attributed to the fact that PET/CT combines the high sensitivity of PET in detecting regions of abnormal function with the specificity of CT in depicting the underlying anatomy where the abnormal functions occur [14]. Automatic tumor segmentation is an important prerequisite for quantitative PET/CT image analysis, which enables tumor characterization, oncologic staging, and image-based therapy response assessment. Deep learning based technologies have made great progress in automatic medical image analysis [13] and are regarded as the state-of-the-art in PET/CT tumor segmentation [3]. However, it remains challenging to obtain accurate tumor segmentation results. This is mainly because tumors across different patients can vary greatly in spatial location, texture, shape, and appearance. In addition, multiple tumors may lie next to normal high-uptake regions, such that existing automatic segmentation methods tend to over-segment the tumor regions and include regions such as the normal high-uptake organs, inflammation, and other infections as part of the tumor regions. Therefore, a tumor segmentation model that can precisely delineate the tumor area and avoid false positive annotations is highly desirable for PET/CT image analysis.

The MICCAI 2022 Automated Lesion Segmentation in Whole-Body FDG-PET/CT (AutoPET) challenge provides a large training dataset to promote research on machine learning-based automatic tumor lesion segmentation [5]. There are two specific requirements for the tumor lesion segmentation task: (1) accurate and fast lesion segmentation; (2) avoidance of false positive segmentations (e.g., brain, bladder, etc.).

To address the tasks of the AutoPET challenge, we propose a false positive reduction network that accurately delineates the tumor regions in whole-body PET/CT images. We first introduce a global segmentation module with a self-supervised pre-trained encoder to coarsely delineate the candidate tumor regions; the candidate regions are then refined by removing false positives via a local refinement module.

2 Method

2.1 Materials

AutoPET provides a training dataset of 1,014 PET/CT scans from 900 patients acquired at the University Hospital Tübingen, Germany [4]. All images are in NIfTI format. There are 513 scans without lesions, while 188, 168, and 145 scans have histologically proven malignant melanoma, lung cancer, and lymphoma, respectively. In addition, all patients have clinical reports including cancer diagnosis, sex, and age. A separate test dataset was not released to the public and was only used for evaluation. It comprises a preliminary test set of 5 studies for self-evaluation and a final test set of 200 studies for the final ranking. The preliminary test set is a subset of the final test set, in which 100 studies were acquired at the same hospital as the training data (University Hospital Tübingen) and the other 100 at the University Hospital of the LMU in Munich with a similar acquisition protocol. The tumor regions in all training and testing data were annotated by two radiologists with more than 5 years of experience in hybrid imaging and experience in machine learning research. In this study, we did not use any external dataset to build and train our model.

2.2 Data Pre-processing

Multiple pre-processing steps were applied. First, to reduce GPU memory usage, all PET/CT image volumes were cropped to a patch size of 224×224 in the axial plane. The images were then clipped to the SUV range of [0, 14.25] for PET and the HU range of [-800, 400] for CT, and mapped to [0, 1] via min-max normalization. Finally, the PET slices were normalized with the mean and standard deviation of the entire training dataset, so as to bring all regions of interest (ROIs) to a notionally common scale based on the metabolic intensity of tumor regions, while the CT slices were normalized with the mean and standard deviation of the individual patient.
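For illustration, a minimal NumPy sketch of these pre-processing steps is shown below. The crop size, SUV/HU ranges, and normalization follow the description above; the function names, the use of a center crop, and the small epsilon are our own assumptions rather than the released code.

```python
import numpy as np

def center_crop_axial(volume, size=224):
    """Crop a (slices, H, W) volume to size x size in the axial plane (center crop assumed)."""
    h, w = volume.shape[1], volume.shape[2]
    top, left = (h - size) // 2, (w - size) // 2
    return volume[:, top:top + size, left:left + size]

def preprocess_pet(pet, dataset_mean, dataset_std, suv_max=14.25):
    """Clip SUV to [0, 14.25], min-max scale to [0, 1], then standardize
    with the mean/std computed over the entire training dataset."""
    pet = center_crop_axial(pet)
    pet = np.clip(pet, 0.0, suv_max) / suv_max
    return (pet - dataset_mean) / dataset_std

def preprocess_ct(ct, hu_min=-800.0, hu_max=400.0):
    """Clip HU to [-800, 400], min-max scale to [0, 1], then standardize
    with the mean/std of the individual patient."""
    ct = center_crop_axial(ct)
    ct = (np.clip(ct, hu_min, hu_max) - hu_min) / (hu_max - hu_min)
    return (ct - ct.mean()) / (ct.std() + 1e-8)  # epsilon added for numerical stability
```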

Figure 1: The overview of our false positive reduction network. The arrows in different colors indicate different steps which are taken sequentially. The self-supervised pre-training improves the representation ability of tumor regions in whole-body PET/CT images; this is followed by the global segmentation which uses the pre-trained ResNet50 encoder to coarsely delineate the candidate regions. Afterward, the local refinement module removes the false positive regions using the output of the global segmentation module that is concatenated with the paired PET/CT images as input.

2.3 False Positive Reduction Network

Our false positive reduction network consists of two main modules, as shown in Fig. 1: a global segmentation module and a local refinement module.

Specifically, a ResNet50 [8] encoder is first pre-trained via contrastive learning [2] on the training data in which tumors are present, using concatenated 3-channel PET/CT images as input: two channels are assigned to PET and the remaining channel to CT. A U-Net based decoder combined with the pre-trained encoder then forms the global segmentation module, which is trained with a combined loss $\operatorname{Loss}_{GSM}$ consisting of a Dice loss [11] and a cross-entropy loss [1], defined as:

$$\operatorname{Loss}_{GSM}=-\frac{2\sum_{i}^{N}p_{i}g_{i}}{\sum_{i}^{N}p_{i}^{2}+\sum_{i}^{N}g_{i}^{2}}-\frac{1}{N}\sum_{i=1}^{N}\left[g_{i}\log p_{i}+\left(1-g_{i}\right)\log\left(1-p_{i}\right)\right] \qquad (1)$$

where $p_{i}\in[0,1]$ denotes a pixel of the predicted tumor probability map, $g_{i}\in[0,1]$ denotes the corresponding pixel of the ground-truth tumor mask (label), and the sums run over all $N$ pixels of the segmentation.
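A minimal PyTorch sketch of Eq. (1) under this definition is given below; it is our own reading of the loss, not the authors' released implementation, and the epsilon term is added only for numerical stability.

```python
import torch
import torch.nn.functional as F

def loss_gsm(pred, target, eps=1e-6):
    """Eq. (1): negative soft Dice term plus binary cross-entropy.
    `pred` holds per-pixel tumor probabilities, `target` the ground-truth mask (float)."""
    p, g = pred.flatten(), target.flatten()
    dice = 2.0 * (p * g).sum() / (p.pow(2).sum() + g.pow(2).sum() + eps)
    bce = F.binary_cross_entropy(p, g)  # = -(1/N) * sum[g log p + (1-g) log(1-p)]
    return -dice + bce
```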

The global segmentation module coarsely annotates the tumor lesion regions with a probability map, which is thresholded at 0.5 to obtain a binary segmentation prediction. Both the probability map and the binary prediction are then concatenated with the corresponding input PET/CT images and fed into the local refinement module, as sketched below.
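The wiring between the two modules could look like the following sketch. The exact channel ordering of the 5-channel input is not specified in the text, so the layout (reusing the 2×PET + 1×CT convention of the global module) and all module/variable names are our assumptions.

```python
import torch

# pet, ct: (B, 1, H, W) pre-processed slices (names are illustrative)
prob_map = global_segmentation_module(torch.cat([pet, pet, ct], dim=1))   # (B, 1, H, W) probabilities
binary_mask = (prob_map > 0.5).float()                                    # coarse mask at threshold 0.5
refine_input = torch.cat([pet, pet, ct, prob_map, binary_mask], dim=1)    # assumed 5-channel layout
refined_prob = local_refinement_module(refine_input)
```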

The backbone of the local refinement module is a 2D U-Net [15] with 5-channel imaging data as input (i.e., the paired PET/CT images, the global tumor probability map, and the global binary segmentation prediction). A pixel-wise mean squared error (MSE) loss and a cross-entropy loss [1] are combined as $\operatorname{Loss}_{LRM}$ and used to compare the predicted segmentation outputs with the tumor ground-truth mask, such that false positive regions can be removed from the global segmentation:

$$\operatorname{Loss}_{LRM}=\frac{1}{N}\sum_{i=1}^{N}\left\{\left(g_{i}-p_{i}\right)^{2}-\left[g_{i}\log p_{i}+\left(1-g_{i}\right)\log\left(1-p_{i}\right)\right]\right\} \qquad (2)$$
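A corresponding PyTorch sketch of Eq. (2) is shown below, again as our own reading rather than the released code.

```python
import torch.nn.functional as F

def loss_lrm(pred, target):
    """Eq. (2): pixel-wise mean squared error plus binary cross-entropy,
    both averaged over the N pixels."""
    p, g = pred.flatten(), target.flatten()
    return F.mse_loss(p, g) + F.binary_cross_entropy(p, g)
```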

2.4 Implementation Details

Our method was implemented in PyTorch [12] using one NVIDIA GeForce RTX 2080 Ti GPU. Our model was initialized using the approach presented by He et al. [7], and adaptive-moment estimation with decoupled weight decay (AdamW) [10] was used for network optimization. During the training phase, the batch size was set to 8 and the learning rate was set to 0.0001 with a cosine annealing schedule. Data augmentation was applied in real time to avoid overfitting; the augmentations were random rotation (90°, 180°, or 270°) in the axial plane and random flipping along one of the other two axes (sagittal or coronal).
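A sketch of this training setup is shown below. Only the optimizer, learning rate, schedule type, and augmentation operations are stated above; the scheduler horizon, the 50% flip probability, and the variable names are our assumptions.

```python
import random
import torch

# model: the segmentation network to be trained (illustrative name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=160)  # horizon assumed

def augment(batch):
    """Random 90/180/270 degree rotation in the axial plane and a random
    flip along one of the two remaining axes."""
    batch = torch.rot90(batch, k=random.choice([1, 2, 3]), dims=(-2, -1))
    if random.random() < 0.5:                                  # flip probability assumed
        batch = torch.flip(batch, dims=(random.choice([-2, -1]),))
    return batch
```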

Furthermore, we only used PET/CT slices with tumors for training the global segmentation module, while an equal number of PET/CT slices without tumors were sampled and added to the training of the local refinement module. Training was terminated when the total loss no longer changed; in our experiments, the total loss generally stabilized after 160 epochs. Our results on the testing set were obtained with an ensemble of four models trained on different splits of the training set: three models were built using 3-fold cross-validation, and the fourth was a 3D full-resolution nnU-Net [9]. For the segmentation output, the testing prediction was a weighted sum of the nnU-Net output and the average of the 3 cross-validation models, with the weights empirically set to 0.35 and 0.65, respectively.
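The ensemble step then reduces to a weighted average of probability maps; a minimal sketch under the stated weights is given below. The variable names and the final 0.5 binarization threshold are assumptions.

```python
import torch

# cv_models: the three cross-validation models; nnunet_prob: nnU-Net probability map for input x
cv_average = torch.stack([m(x) for m in cv_models], dim=0).mean(dim=0)
final_prob = 0.35 * nnunet_prob + 0.65 * cv_average   # weights reported in the text
final_mask = (final_prob > 0.5).float()               # binarization threshold assumed
```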

3 Results

Three metrics are used for tumor segmentation evaluation: (1) foreground Dice score of the segmented tumors; (2) volume of false positive predictions that do not overlap with the positives (false positive volume, FPV); and (3) volume of positive connected components in the ground truth that do not overlap with the estimated segmentation (false negative volume, FNV).

For the test set evaluation (inaccessible to the participants), all three metrics are considered for non-healthy patients (tumors present), whereas only FPV is considered for healthy cases (no tumors present). The final leaderboard position is based on the ranking across all three metrics with weights of (0.5, 0.25, 0.25).
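For reference, the three metrics can be computed along the lines of the sketch below, following their verbal definitions above; the official challenge evaluation script may differ in details such as how voxel counts are converted to millilitres.

```python
import numpy as np
from scipy import ndimage

def evaluate(pred, gt, voxel_vol_ml=1.0):
    """Dice, false positive volume (FPV), and false negative volume (FNV)
    for binary prediction `pred` and ground truth `gt` (boolean arrays)."""
    dice = 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + 1e-8)

    pred_cc, n_pred = ndimage.label(pred)   # connected components of the prediction
    fpv = sum(voxel_vol_ml * (pred_cc == i).sum()
              for i in range(1, n_pred + 1)
              if not np.logical_and(pred_cc == i, gt).any())

    gt_cc, n_gt = ndimage.label(gt)          # connected components of the ground truth
    fnv = sum(voxel_vol_ml * (gt_cc == i).sum()
              for i in range(1, n_gt + 1)
              if not np.logical_and(gt_cc == i, pred).any())

    return dice, fpv, fnv
```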

Table 1: The segmentation results for the preliminary testing dataset
Methods    Dice ↑    FPV ↓    FNV ↓
Global Segmentation Module 0.9228 0.9555 1.7865
Local Refinement Module 0.9271 0.8360 1.7865
Ensemble with nnU-Net 0.9324 0.7763 1.5676
  • The bold numbers represent the best results.

  • The arrows next to the evaluation metrics indicate the direction of better performance: ↑ means larger is better, ↓ means smaller is better.

Our results on the preliminary testing dataset are presented in Table 1. With the global segmentation module only, our model achieved a Dice score of 0.9228, an FPV of 0.9555, and an FNV of 1.7865. The inclusion of the local refinement module then effectively removed false positive regions, improving the FPV by a large margin of 12.5% while keeping the FNV unchanged and slightly improving the Dice score. The false positive reduction network ensembled with nnU-Net obtained the best overall performance, with a Dice score, FPV, and FNV of 0.9324, 0.7763, and 1.5676, respectively, and was ranked 1st in Dice and 2nd across all metrics on the leaderboard.

4 Conclusion

We introduced a false positive reduction network for the MICCAI 2022 AutoPET challenge. Our proposed method consists of two modules: a global segmentation module and a local refinement module. Within the global segmentation module, a self-supervised pre-trained encoder coupled with a U-Net decoder coarsely delineates the candidate tumor regions; the candidate regions are then refined by removing false positive segmentations via the local refinement module. Our method achieved the highest Dice score of 0.9324 on the preliminary test data leaderboard.

References

  • [1] Bishop, C.M., Nasrabadi, N.M.: Pattern recognition and machine learning, vol. 4. Springer (2006)
  • [2] Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9650–9660 (2021)
  • [3] Fu, X., Bi, L., Kumar, A., Fulham, M., Kim, J.: Multimodal spatial attention module for targeting multimodal pet-ct lung tumor segmentation. IEEE Journal of Biomedical and Health Informatics 25(9), 3507–3516 (2021)
  • [4] Gatidis, S., Kuestner, T.: A whole-body FDG-PET/CT dataset with manually annotated tumor lesions (2022). https://doi.org/10.7937/GKR0-XV29, https://wiki.cancerimagingarchive.net/x/LwKPBQ
  • [5] Gatidis, S., Küstner, T., Ingrisch, M., Fabritius, M., Cyran, C.: Automated lesion segmentation in whole-body FDG-PET/CT (2022). https://doi.org/10.5281/ZENODO.6362493, https://zenodo.org/record/6362493
  • [6] Hatt, M., Tixier, F., Pierce, L., Kinahan, P.E., Le Rest, C.C., Visvikis, D.: Characterization of PET/CT images using texture analysis: the past, the present… any future? European Journal of Nuclear Medicine and Molecular Imaging 44(1), 151–165 (2017)
  • [7] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision. pp. 1026–1034 (2015)
  • [8] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [9] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18(2), 203–211 (2021)
  • [10] Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. In: International Conference on Learning Representations (2018)
  • [11] Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volumetric medical image segmentation. In: 2016 fourth international conference on 3D vision (3DV). pp. 565–571. IEEE (2016)
  • [12] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
  • [13] Peng, Y., Bi, L., Fulham, M., Feng, D., Kim, J.: Multi-modality information fusion for radiomics-based neural architecture search. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 763–771. Springer (2020)
  • [14] Peng, Y., Bi, L., Kumar, A., Fulham, M., Feng, D., Kim, J.: Predicting distant metastases in soft-tissue sarcomas from pet-ct scans using constrained hierarchical multi-modality feature learning. Physics in Medicine & Biology 66(24), 245004 (2021)
  • [15] Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. pp. 234–241. Springer (2015)