¹¹institutetext: ¹School of Software and Microelectronics, Peking University, Beijing, China
²Tencent Cloud Media, Shenzhen, China
Our Code: https://github.com/xiely-123/SHISRCNet

SHISRCNet: Super-resolution And Classification Network For Low-resolution Breast Cancer Histopathology Image

Luyuan Xie 11 Cong Li 11 Zirui Wang 22 Xin Zhang 11 Boyan Chen 11 Qingni Shen Corresponding author: qingnishen@ss.pku.edu.cn11 Zhonghai Wu Corresponding author: wuzh@pku.edu.cn11

Abstract

The rapid identification and accurate diagnosis of breast cancer, known as the killer of women, have become greatly significant for those patients. Numerous breast cancer histopathological image classification methods have been proposed. But they still suffer from two problems. (1) These methods can only hand high-resolution (HR) images. However, the low-resolution (LR) images are often collected by the digital slide scanner with limited hardware conditions. Compared with HR images, LR images often lose some key features like texture, which deeply affects the accuracy of diagnosis. (2) The existing methods have fixed receptive fields, so they can not extract and fuse multi-scale features well for images with different magnification factors. To fill these gaps, we present a Single Histopathological Image Super-Resolution Classification network (SHISRCNet), which consists of two modules: Super-Resolution (SR) and Classification (CF) modules. SR module reconstructs LR images into SR ones. CF module extracts and fuses the multi-scale features of SR images for classification. In the training stage, we introduce HR images into the CF module to enhance SHISRCNet’s performance. Finally, through the joint training of these two modules, super-resolution and classified of LR images are integrated into our model. The experimental results demonstrate that the effects of our method are close to the SOTA methods with taking HR images as inputs.

Keywords:

breast cancer histopathological image super-resolution classification joint training.

1 Introduction

Breast cancer is one of the high-mortality cancers among women in the 21st century. Every year, 1.2 million women around the world suffer from breast cancer and about 0.5 million die of it [3]. Accurate identification of cancer types will make a correct assessment of the patient’s risk and improve the chances of survival. However, the traditional analysis method is time-consuming, as it mainly depends on the experience and skills of the doctors. Therefore, it is essential to develop computer-aided diagnosis (CADx) for assisting doctors to realize the rapid detection and classification.

Due to being collected by various devices, the resolution of histopathological images extracted may not always be high. Low-resolution (LR) images lack of lots of details, which will have an important impact on doctors’ diagnosis. Considering the improvement of histopathological images’ acquisition equipment will cost lots of money while significantly increasing patients’ expense of detection. The super-resolution (SR) algorithms that improve the resolution of LR images at a small cost can be a practical solution to assist doctors in diagnosis. At present, most single super-resolution methods only have fixed receptive fields [7, 10, 18, 11]. These models cannot capture multi-scale features and do not solve the problems caused by LR in various magnification factors well. MRC-Net [6] adopted LSTM [9] and Multi-scale Refined Context to improve the effect of reconstructing histopathological images. It considered the problem of multi-scale, but only fused two scales features. This limits its performance in the scenarios with various magnification factors. Therefore, designing an appropriate feature extraction block for SR of the histopathological images is still a challenging task.

In recent years, a series of deep learning methods have been proposed to solve the breast cancer histopathological image classification issue by the high-resolution (HR) histopathological images. [21, 22, 12] improved the specific model structure to classify breast histopathology images, which showed a significant improvement in recognition accuracy compared with the previous works [20, 1]. SSCA [24] considered the problem of multi-scale feature extraction which utilized feature pyramid network (FPN) [15] and attention mechanism to extract discriminative features from complex backgrounds. However, it only concatenates multi-scale features and does not consider the problem of feature fusion. So it is still worth to explore the potential of extraction and fusion of multi-scale features for breast images classification.

To tackle the problem of LR breast cancer histopathological images reconstruction and diagnosis, we propose the Single Histopathological Image Super-Resolution Classification network (SHISRCNet) integrating Super-Resolution (SR) and Classification (CF) modules. The main contributions of this paper can be described as follows:

(1) In the SR module, we design a new block called Multi-Features Extraction block (MFEblock) as the backbone. MFEblock adopts multi-scale receptive fields to obtain multi-scale features. In order to better fuse multi-scale features, a new fusion method named multi-scale selective fusion (MSF) is used for multi-scale features. These make MFEblock reconstruct LR images into SR images well.

Refer to caption — Figure 1: The structure of the SHISRCNet.

(2) The CF module completes the task of image classification by utilizing the SR images. Like SR module, it also needs to extract multi-scale features. The difference is that the CF module can use the method of downsampling to capture multi-scale features. So we combine the multi-scale receptive fields (SKNet) [13] with the feature pyramid network (FPN) to achieve the feature extraction of this module. In FPN, we design a cross-scale selective fusion block (CSFblock) to fuse features of different scales.

(3) Through the joint training of these two designed modules, the super-resolution and classification of low-resolution histopathological images are integrated into our model. For improving the performance of CF module and reducing the error caused by the reconstructed SR images, we introduce HR images to CF module in the training stage. The experimental results demonstrate that the effects of our method are close to those of SOTA methods that take HR breast cancer histopathological images as inputs.

2 Methods

This section describes the proposed SHISRCNet. The overall pipeline of the proposed network is shown in Fig. 1(a). It composes two modules: SR module and CF module. The SR module reconstructs the LR image into the SR image. The CF module utilize the reconstructed SR images to diagnose histopathological images. In the training stage, we introduce HR images to improve the performance of CF module and alleviate the error caused by SR images.

2.1 Super-Resolution module

To better extract and fuse multi-scale features for super-resolution, we propose a new SR network, called SRMFENet. Like SRResNet [11], SRMFENet takes a single low-resolution image as input and uses the pixelshuffle layer to get the restructured image. The difference between SRMFENet and SRResNet is that a Multi-Features Extraction block (MFEblock) is proposed to extract and fuse multi-scale histopathological images’ features. The structure of the MFEblock is shown in Fig. 1(b). The input features $X$ capture multi-scale features $Y_{i}$ through four 3 $\times$ 3 atrous convolutions [4] with different rates:

{Y_{i}}=\left\{{\begin{array}[]{*{20}{c}}{Cov3\times{3_{rate=1}}(X)}\\ {Cov3\times{3_{rate=2(i-1)}}(X+{Y_{i-1}})}\end{array}\begin{array}[]{*{20}{c}}{i=1}\\ {\quad 1<i<=n}\end{array}}\right.{\rm{}}

where $n$ is the number of atrous convolutions and is set to 4 by the experiments. This design not only preserves the depth of the network, but also increases the width of the network. It is beneficial for the network to extract shallow local texture information and global semantic information. After the feature extraction phase, a new fusion method named MSF fuses all of different scale features $Y_{i}$ . In the end, the input features $X$ are added with the fused features. The details of MSF show in the Fig. 1(c). Firstly, we conduct Global Average Pooling (GAP) [14] on the multi-scale features to obtain their average channel-wise weights. Then using Sigmoid activation function to map weight to 0 to 1. Next softmax operation normalizes the same position of the obtained multi-scale average channel-wise weights. Finally, the features are multiplied by the corresponding normalized weights and the processed features are added together to generate the new multi-scale features. MFEblock is very applicable to process histopathological images of different magnification factors, as it employs convolution and attention operations to capture local and global image context information and fuse them well.

2.2 Classification module

The task of the CF module is to classify the reconstructed SR images. It can use the method of downsampling to capture multi-scale features. So we combine the multi-scale receptive fields (SKNet as backbone network) with the FPN (a downsampling method) to achieve the feature extraction of this module. In Fig. 1(a), the multi-sacle features extracted by SKNet are the input of FPN. We propose a new fusion method, called cross-scale selective fusion block (CSFblock) to effectively fuse high-resolution and low-resolution features in FPN. After the fused features are processed by GAP, they are aggregated into a new multi-scale feature by concatenate operation. Finally, the aggregated multi-scale features are classified through the fully connected (FC) layer. The structure of CSFblock is shown in Fig. 1(d). The inputs of CSFblock are two-way inputs which are the high-resolution features $X_{h}\in W$ × $H$ × $C$ and the low-resolution features $X_{l}\in W_{1}$ × $H_{1}$ × $C$ . In CSFblock, the upsampling operations are performed on the low-resolution features $X_{l}$ to realize consistency with $X_{h}$ dimension. $X_{h}$ and restructured $X_{l}$ are fused via an element-wise summation:

U{\rm{}}={\rm{}}{X_{h}}{\rm{}}+{\rm{}}Up({X_{l}})

Then, using GAP along the channel dimension to get the global information $S$ . A FC layer generates a compact feature vector $Z$ which guides the feature selection procedure. And $Z$ is reconstructed into two weight vectors $a$ , $b$ of the same dimension as $S$ through two FC layers, which can be defined as:

Z{\rm{}}={\rm{}}\delta({W_{c}}S),\quad a{\rm{}}={\rm{}}{W_{a}}Z,\quad b{\rm{}}={\rm{}}{W_{b}}Z

Where $\delta$ denotes ReLU and $W_{a}$ , $W_{b}$ , $W_{c}$ , means the weight of the FC layers. Specifically, a softmax operator is applied on $a$ and $b$ ’s channel-wise digits:

a[i]{\rm{}}={\rm{}}\frac{{{e^{a[i]}}}}{{{e^{a[i]}}+{e^{b[i]}}}},\quad{\rm{}}b[i]{\rm{}}={\rm{}}\frac{{{e^{b[i]}}}}{{{e^{a[i]}}+{e^{b[i]}}}},\quad i\in C

The fused feature map $F$ is obtained through the attention weights on multi-scale features:

F[i]{\rm{}}={\rm{}}a[i]\times{X_{h}}[i]+b[i]\times{Up({X_{l}})}[i],\quad i\in C

2.3 Loss Function

The SR module and the CF module exploit different loss functions for training. In the SR module, $L1$ Loss is used for super-resolution. In the CF module, we introduce HR images to CF module in the training stage for improving the performance of CF module and reducing the error caused by the reconstructed SR images. We use $Focal$ Loss [16] to alleviate the class imbalanced data problem of the HR and SR images’ classification. Inspired by the contrastive learning algorithm SimCLR [5], the HR and SR of the same images are similar to two different views. So the $NT-Xent$ loss [19] is adopted to calculate similarity between SR multi-scale features and HR multi-scale features for CF module’s robustness. The total loss function can be expressed as:

{L_{total}}{\rm{}}={\rm{}}{\lambda_{1}}{L_{{L1}}}+{\rm{}}{\lambda_{2}}{L_{FL}}+{\rm{}}{\lambda_{3}}{L_{NT-Xent}}{\rm{,\quad}}{\lambda_{1}}+{\lambda_{2}}+{\lambda_{3}}=1

where $\lambda_{1}$ , $\lambda_{2}$ and $\lambda_{3}$ are the weights of $L1$ Loss, $Focal$ Loss and $NT-Xent$ Loss, respectively. In the inference stage, only SR images are taken as inputs by CF module. In our experiment, $\lambda_{1}$ , $\lambda_{2}$ and $\lambda_{3}$ are set to 0.6, 0.3 and 0.1, respectively. And the temperature parameter in $NT-Xent$ Loss is set to 0.5.

3 Experiment

Dataset: This work uses the breast cancer histopathological image database (BreaKHis) ¹¹1https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/ [20]. The images in the dataset have four magnification factors (40x, 100x, 200x, 400x) and eight breast cancer classes. This dataset includes four distinct histological types of benign breast tumors: adenosis (A), fibroadenoma (F), phyllodes tumor (PT), and tubular adenona (TA); and four malignant tumors (breast cancer): carcinoma (DC), lobular carcinoma (LC), mucinous carcinoma (MC) and papillary carcinoma (PC). The original dataset is randomly divided into training set and testing set for each magnification at a ratio of 7: 3 following previous work.

Implementation Details: For all experiments, we conduct 5-fold cross validation, and report the mean. We use LR histopathological images with size 48x48, 96x96, 192x192 as input for different single image SR tasks (x8, x4, x2) and set batch size to 8. For the corresponding LR and HR images in the training dataset, the same data augmentation is adopted, such as rotation, color jitter. The model is trained using the ADAM optimizer [25] with the learning rate set to 1x $10^{-3}$ . The learning rate is multiplied by 0.9 for every two epochs. We use SKNet-26 [13] as the backbone network in the CF module. The total training epochs are 100.

4 Results and Discussion

4.1 The results of Super-Resolution and Classification

Table 1 shows the results of the super-resolution phase. We adopt Peak Signal to Noise Ratio (PSNR) and structural similarity index (SSIM) [6] to evaluate the performance of the SR model. MRC-Net and our proposed SRMFENet (SR module) achieves better metrics than the other algorithms. This proves the effectiveness of multi-scale features extraction. Compared with MRC-Net, our MFEblocks can extract and fuse multi-scale features well. And the joint training of SRMFENet and CF module improves the performance of super-resolution. Fig. 2 demonstrates that our model can recover more details with less blurring.

Table 1: Average PSNR/SSIM for x8, x4, x2 SR.

Methods	x8		x4		x2
Methods	PSNR(db)	SSIM	PSNR(db)	SSIM	PSNR(db)	SSIM
Bicubic	20.75	0.4394	23.21	0.6305	26.60	0.9151
SRCNN	21.12	0.4872	24.04	0.6634	28.36	0.8631
WA-SRGAN	21.76	0.5141	26.20	0.7930	30.93	0.9351
EDSR	22.13	0.6063	26.22	0.8005	30.79	0.9325
MRC-Net	22.72	0.6213	26.86	0.8222	31.73	0.9433
SRMFENet (ours)	23.44	0.6325	27.21	0.8370	33.96	0.9566
SHISRCNet (ours)	24.21	0.6814	27.97	0.8413	35.61	0.9680

We compare our introduced CF module with five state-of-the-art breast cancer histopathological image models and Diagnosis Network with MRC-Net [6], as shown in Table 2. The results illustrate that the CF module reaches the best performance in four different magnification factors. This indicates the effectiveness of our proposed combination of two multi-scale feature extraction methods. SHISRCNet, which uses down sample to half resolution (x2↓) from HR images, outperforms the SSCA at 40x, 200x and 400x. And it gets results close to the CF module at all magnification factors. Meanwhile, compared with the Diagnosis Network that also uses LR images as input, SHISCNet has remarked performance advantages. Table 3 compares our results with the CF module using different resolution images. The performance of the CF module decreases significantly with the reduction of resolution. In contrast, SHISRCNet greatly improves the CF module performance of different scale low-resolution images.

Table 2: Compare results with state-of-the-art on image level (* means that inputs are HR images, ^# means that inputs are down sample to half resolution from HR images.).

Methods	Years	ACC(%)
Methods	Years	40x	100x	200x	400x
AlexNet variant* [21]	2016	85.6	83.5	82.7	80.7
Inception V3* [2]	2018	90.2	85.6	86.1	82.5
DSoPN* [12]	2020	96	96.16	98.01	95.97
FE-BkCapsNet* [22]	2021	92.71	94.52	94.03	93.54
SSCA* [24]	2022	96.93	97.32	95.31	96.24
CF module only* (ours)	—	97.82	97.78	98.28	98.15
Diagnosis Network with MRC-Net^# [6]	2021	94.43	94.45	94.73	93.92
SHISRCNet (ours)^#	—	97.49	96.19	97.60	97.04

Table 3: Comparison of accuracy under different scales on the image level.

resolution	Model	ACC(%)
resolution	Model	40x	100x	200x	400x
HR	CF module only	97.82	97.78	98.28	98.15
x2↓ LR	CF module only	94.47	89.92	92.64	91.30
x2↓ LR	SHISRCNet	97.49	96.19	97.60	97.04
x4↓ LR	CF module only	90.32	87.71	88.61	86.89
x4↓ LR	SHISRCNet	94.15	94.22	95.14	95.26
x8↓ LR	CF module only	84.11	82.32	84.62	82.23
x8↓ LR	SHISRCNet	91.47	92.43	92.98	92.78

4.2 Ablation study of the SHISRCNet

To verify the effectiveness of the proposed components in SHISRCNet, a comparison between SHISRCNet and its five components on x2↓ images is given in Table 3. (1) w/o MSF repalces MSF with concatenate operation and 1 $\times$ 1 convolution. (2) w/o FPN + CSFblock means that only SKNet is used for feature extraction in the CF module. (3) w/o CSFblock, w/o HR images and w/o NT-Xent loss remove the corresponding operation, respectively. As shown in table 3, firstly, the performance of super-resolution in the SHISRCNet is significantly reduced when we remove MSF. It indicates the importance of MSF for multi-scale feature fusion in the SR module. Secondly, only SKNet is used to extract multi-scale features in the CF module, and the accuracy decreased significantly. This again proves the effectiveness of our proposed combination of two multi-scale feature extraction methods. Thirdly, compared with using FPN alone, the performance of SHISRCNet is further improved by adding CSFblock to FPN. Finally, the introduction of HR images further promotes the performance of SHISRCNet. Because the training method of HR and SR images proposed by us helps to improve the generalization of the SHISRCNet.

Table 4: Ablation study of SHISRCNet on x2↓ images.

	PSNR(db)	SSIM	40x	100x	200x	400x
	PSNR(db)	SSIM	ACC(%)
SHISRCNet	35.61	0.9680	97.49	96.19	97.60	97.04
w/o MSF	33.13	0.9554	96.02	95.73	95.98	95.78
w/o FPN+CSFblock	34.41	0.9609	93.98	92.11	91.35	92.15
w/o CSFblock	34.57	0.9619	94.37	93.21	93.99	94.71
w/o HR images	34.43	0.9611	93.13	94.28	93.86	94.17
w/o NT-Xent loss	34.54	0.9623	95.53	95.20	95.01	95.36

5 CONCLUSION

This paper proposes SHISRCNet for the low-resolution breast cancer histopathological images’ super-resolution and classification problem. The SR module employs MFEblock to extract and fuse multi-scale features for reconstructing low-resolution histopathological images into high-resolution ones. The CF module adopts two different multi-scale features extraction methods to capture features for the breast cancer diagnosis. We introduce high-resolution images into the CF module in the training stage to improve SHISRCNet’s robustness. Through the joint training of the two modules, the super-resolution and classification of the low-resolution histopathological images are integrated in one model. Our method’s results are close to the SOTA methods, which require using high-resolution breast cancer histopathological images instead of low-resolution ones.

References

[1] Akay, M.F.: Support vector machines combined with feature selection for breast cancer diagnosis. Expert Syst. Appl. 36, 3240–3247 (2009)
[2] Benhammou, Y., Tabik, S., Achchab, B., Herrera, F.: A first study exploring the performance of the state-of-the art cnn model in the problem of breast cancer. In: proceedings of the international conference on learning and optimization algorithms: theory and applications. pp. 1–6 (2018)
[3] Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R.L., Torre, L.A., Jemal, A.: Global cancer statistics 2018: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians 68(6), 394–424 (2018)
[4] Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40(4), 834–848 (2017)
[5] Chen, T., Kornblith, S., Mohammad, N., Hinton, G.: A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709, (2020)
[6] Chen, Z., Guo, X., Woo, P.Y., Yuan, Y.: Super-resolution enhanced medical image diagnosis with sample affinity interaction. IEEE Transactions on Medical Imaging 40(5), 1377–1389 (2021)
[7] Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: European conference on computer vision. pp. 184–199. Springer (2014)
[8] Gandomkar, Z., Brennan, P.C., Mello-Thoms, C.: Mudern: Multi-category classification of breast histopathological image using deep residual networks. Artificial intelligence in medicine 88, 14–24 (2018)
[9] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
[10] Lai, W.S., Huang, J.B., Ahuja, N., Yang, M.H.: Deep laplacian pyramid networks for fast and accurate super-resolution. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 624–632 (2017)
[11] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4681–4690 (2017)
[12] Li, J., Zhang, J., Sun, Q., Zhang, H., Dong, J., Che, C., Zhang, Q.: Breast cancer histopathological image classification based on deep second-order pooling network. In: 2020 International Joint Conference on Neural Networks (IJCNN). pp. 1–7. IEEE (2020)
[13] Li, X., Wang, W., Hu, X., Yang, J.: Selective kernel networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 510–519 (2019)
[14] Lin, M., Chen, Q., Yan, S.: Network in network. arXiv preprint arXiv:1312.4400 (2013)
[15] Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2117–2125 (2017)
[16] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision. pp. 2980–2988 (2017)
[17] Lim, B., Son, S., Kim, H., Nah, S., Lee, K.: Enhanced Deep Residual Networks for Single Image Super-Resolution. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2017,pp. 1132–1140. Ieee (2017)
[18] Shahidi, F.: Breast cancer histopathology image super-resolution using wide-attention gan with improved wasserstein gradient penalty and perceptual loss. IEEE Access 9, 32795–32809 (2021)
[19] Sohn, K.: Improved deep metric learning with multi-class n-pair loss objective. In: Advances in neural information processing systems. pp. 1857–-1865 (2016)
[20] Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L.: A dataset for breast cancer histopathological image classification. IEEE Transactions on Biomedical Engineering 63(7), 1455–1462 (2016). \doi10.1109/TBME.2015.2496264
[21] Spanhol, F.A., Oliveira, L.S., Petitjean, C., Heutte, L.: Breast cancer histopathological image classification using convolutional neural networks. In: 2016 International Joint Conference on Neural Networks (IJCNN). pp. 2560–2567 (2016). \doi10.1109/IJCNN.2016.7727519
[22] Wang, P., Wang, J., Li, Y., Li, P., Li, L., Jiang, M.: Automatic classification of breast cancer histopathological images based on deep feature fusion and enhanced routing. Biomedical Signal Processing and Control 65, 102341 (2021)
[23] Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: Cbam: Convolutional block attention module. In: Proceedings of the European conference on computer vision (ECCV). pp. 3–19 (2018)
[24] Xu, B., Zhang, W.: Selective scale cascade attention network for breast cancer histopathology image classification. In: ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1396–1400. IEEE (2022)
[25] Zhang, Z.: Improved adam optimizer for deep neural networks. In: 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS). pp. 1–2. Ieee (2018)