
1 LaTIM UMR 1101, Inserm, Brest, France
2 Univ Bretagne Occidentale, Brest, France
3 Evolucare Technologies, Villers-Bretonneux, France
4 University of Electronic Science and Technology of China, Chengdu, China
5 College of Computer Science, Sichuan University, Chengdu, China
6 IMT Atlantique, Brest, France
Correspondence: mostafa.elhabibdaho@univ-brest.fr

Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent Prediction

Yihao Li 1,2    Philippe Zhang 1,2,3    Yubo Tan 4    Jing Zhang 1,2    Zhihan Wang 1,2    Weili Jiang 5    Pierre-Henri Conze 1,6    Mathieu Lamard 1,2    Gwenolé Quellec 1    Mostafa El Habib Daho (🖂) 1,2
Abstract

Myopic macular degeneration is the most common complication of myopia and the primary cause of vision loss in individuals with pathological myopia. Early detection and prompt treatment are crucial in preventing vision impairment due to myopic maculopathy. This was the focus of the Myopic Maculopathy Analysis Challenge (MMAC), in which we participated. In Task 1, classification of myopic maculopathy, we employed the contrastive learning framework SimCLR to enhance classification accuracy by effectively capturing enriched features from unlabeled data. This approach not only improved the intrinsic understanding of the data but also elevated the performance of our classification model. For Task 2 (segmentation of myopic maculopathy plus lesions), we developed independent segmentation models tailored to the different lesion types and implemented a test-time augmentation strategy to further enhance model performance. For Task 3 (prediction of spherical equivalent), we designed a deep regression model based on the data distribution of the dataset and employed an ensemble strategy to enhance prediction accuracy. The results we obtained are promising and placed us in the Top 6 of the classification task, the Top 2 of the segmentation task, and the Top 1 of the prediction task. The code is available at https://github.com/liyihao76/MMAC_LaTIM_Solution.

Keywords:
Contrastive Loss · Test-time Augmentation · Data Distribution · Ensemble Learning

1 Introduction

Myopia is a common eye disorder that affects millions of people worldwide [11]. It can develop into high myopia, leading to visual impairment, including blindness, due to the development of different types of myopic maculopathy [13, 26]. Myopic maculopathy is especially prevalent in countries such as Japan, China, Denmark, and the United States [23, 30]. The severity of myopic maculopathy can be classified into five categories [23]: no macular lesions, tessellated fundus, diffuse chorioretinal atrophy, patchy chorioretinal atrophy, and macular atrophy. Three additional "Plus" lesions are also defined and added to these categories: Lacquer Cracks (LC), Choroidal Neovascularization (CNV), and Fuchs Spot (FS). Early detection and treatment are essential for preventing vision loss in people with myopic maculopathy. However, the diagnosis of myopic maculopathy is limited by the time-consuming and labor-intensive process of manually inspecting images individually. Therefore, developing an effective computer-aided system for diagnosing myopic maculopathy is a promising area of research.

Deep learning (DL) methods have emerged as powerful tools in tackling challenges related to classification, segmentation, and prediction, demonstrating particular efficacy in medical imaging [20, 21, 17]. In ophthalmology, DL has catalyzed advancements in diagnosing eye diseases, including Diabetic Retinopathy (DR) [4, 8, 6] and myopia [19, 31]. These recent studies have highlighted improvements in both the precision and efficiency of these diagnoses, underscoring the potential of DL in clinical applications.

Moreover, the field has seen innovative uses of abundant unlabeled data. Recognizing the cost and effort required for data labeling, researchers have pivoted towards self-supervised learning (SSL) approaches. These methods exploit unlabeled data for model pretraining in pretext tasks, subsequently applying them to target downstream tasks. This approach has proven effective in enhancing model performance and adaptability in various contexts [33, 32].

Building upon these advancements in deep learning and the growing need for efficient, automated solutions in ophthalmology, the field has seen the inception of targeted initiatives like the Myopic Maculopathy Analysis Challenge (MMAC). The MMAC was organized to galvanize researchers worldwide to apply these innovative techniques in a focused setting. This challenge comprises three distinct tasks: (1) classification of myopic maculopathy, (2) segmentation of myopic maculopathy plus lesions, and (3) prediction of spherical equivalent, all utilizing a specially curated dataset of fundus images tailored to these tasks.

In alignment with these emerging trends and leveraging our expertise in deep learning, our team enthusiastically participated in all the tasks of this challenge. Our contributions were marked by notable achievements in each category:

  • In the classification task (6th place), we employed SimCLR, a contrastive learning method that allowed the model to learn richer representations from the data. The integration of ensemble strategies, particularly when paired with SimCLR, further enhanced the model's robustness.

  • In the segmentation task (2nd place), we designed and tested independent models for the different lesion segmentation tasks. In addition, the Test Time Augmentation strategy we used boosted the performance of the models.

  • In the regression task (1st place), we analyzed the distribution characteristics of the dataset and designed the experimental protocol according to that distribution, so that the deep regression model could learn and infer in a targeted manner. Furthermore, incorporating a model ensemble strategy increased the prediction accuracy.

2 Materials and methods

2.1 Datasets

The MMAC dataset is an extensive collection of color fundus images dedicated to research on myopic maculopathy. It comprises fundus images gathered from various patients, with and without myopic maculopathy.
The sizes of each split of each task are summarized in Table 1. To maintain the challenge's integrity and fairness, the validation and test datasets are securely held and not released to participants; the validation and testing phases are executed on the organizers' side to ensure unbiased evaluation.

The first task (Classification of Myopic Maculopathy) focuses on five-category image classification. The categories are as follows:

  • Category 0: No macular lesions

  • Category 1: Tessellated fundus

  • Category 2: Diffuse chorioretinal atrophy

  • Category 3: Patchy chorioretinal atrophy

  • Category 4: Macular atrophy

The annotations were meticulously generated and reviewed manually by professional ophthalmologists. Two ophthalmologists annotated each image independently. In cases of discrepancies in labeling, a third senior ophthalmologist provided the final label. This rigorous process ensures the high quality and reliability of the dataset annotations.

The second Task (Segmentation of Myopic Maculopathy Plus Lesions) aims to segment three types of lesions:

  • Lacquer Cracks (LC)

  • Choroidal Neovascularization (CNV)

  • Fuchs Spot (FS)

An ophthalmologist first performed the lesion annotations. A second ophthalmologist refined these annotations in consultation with the first. Both these ophthalmologists have over five years of experience. A senior ophthalmologist with a decade of experience in ophthalmology reviewed and finalized the annotations.

For the third task (Prediction of Spherical Equivalent), the ground truth of the spherical equivalent (SE) was ascertained using the corneal curvature computer refractometer TOPCON KR-8900. The SE was computed as:

SE = S + (1/2)·C    (1)

where S and C are the spherical and cylindrical diopters, respectively; both values were acquired through the computer refractometer.
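For example, a refraction of S = -5.00 D sphere with C = -1.00 D cylinder gives SE = -5.00 + (-1.00)/2 = -5.50 D.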

Table 1: Distribution of color fundus images for different tasks.

| Task | Training Set | Validation Set | Test Set |
|---|---|---|---|
| Task 1 - Classification of myopic maculopathy | 1143 | 248 | 915 |
| Task 2 - Segmentation of myopic maculopathy plus lesions (Lacquer Cracks) | 63 | 12 | 46 |
| Task 2 - Segmentation of myopic maculopathy plus lesions (Choroidal Neovascularization) | 32 | 7 | 22 |
| Task 2 - Segmentation of myopic maculopathy plus lesions (Fuchs Spot) | 54 | 13 | 45 |
| Task 3 - Prediction of spherical equivalent | 992 | 205 | 806 |

2.2 Task 1: Classification of Myopic Maculopathy

Several models were trialed for the classification task, including ResNet (18 and 50) [9], ViT [5], and Swin [22], among others. However, optimal results were achieved with a pipeline based on contrastive learning, an approach that has recently gained traction for its ability to learn expressive representations from unlabeled data [33, 32]. Our implementation is detailed below.

2.2.1 Pretext Task: Contrastive Learning Framework (SimCLR)

As depicted in Fig. 1, our contrastive learning framework is rooted in the SimCLR architecture [3]. The essence of SimCLR is to maximize the agreement between differently augmented views of the same data instance through a contrastive loss in the latent space. A detailed breakdown of the augmentations used is provided in Table 2. The architecture uses ResNet50 [9] as its backbone, known for its depth and strong performance in image-related tasks.

Table 2: Data augmentations for Task 1 and Task 3.

| Operator | Parameters | Probability |
|---|---|---|
| Flip | horizontal, vertical | 0.5 |
| ShiftScaleRotate | shift_limit=0.2, scale_limit=0.1, rotate_limit=45 | 0.5 |
| RandomBrightnessContrast | brightness_limit=0.2, contrast_limit=0.2 | 1.0 |
| RandomGamma | gamma_limit=(80, 120) | 1.0 |
| CoarseDropout | max_height=5, min_height=1, max_width=512, min_width=51, max_holes=5 | 0.2 |
| Sharpen | alpha=(0.2, 0.5), lightness=(0.5, 1.0) | 1.0 |
| Blur | blur_limit=3 | 1.0 |
| Downscale | scale_min=0.7, scale_max=0.9 | 1.0 |
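The operator names and parameters above follow the albumentations API; the paper does not state the augmentation library explicitly, so the following is a minimal sketch under that assumption (a flat Compose with the listed per-operator probabilities; the original pipeline may have grouped some operators differently):

```python
import albumentations as A

# Sketch of the Task 1 / Task 3 augmentation pipeline from Table 2,
# assuming the albumentations library (operator names match its API).
train_transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ShiftScaleRotate(shift_limit=0.2, scale_limit=0.1, rotate_limit=45, p=0.5),
    A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=1.0),
    A.RandomGamma(gamma_limit=(80, 120), p=1.0),
    A.CoarseDropout(max_holes=5, max_height=5, min_height=1,
                    max_width=512, min_width=51, p=0.2),
    A.Sharpen(alpha=(0.2, 0.5), lightness=(0.5, 1.0), p=1.0),
    A.Blur(blur_limit=3, p=1.0),
    A.Downscale(scale_min=0.7, scale_max=0.9, p=1.0),
])

# usage: augmented = train_transform(image=fundus_image_np)["image"]
```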
Dataset Utilization:

To make the most of the available data for the unsupervised pretext task, we amalgamated the datasets from Task 1 and Task 3. This not only broadened our data pool but also enabled the SimCLR model to capture a diverse range of features and representations.

Training Parameters:

For training the SimCLR model, we standardized the image size to 256×256 pixels and used a batch size of 256. The temperature parameter was set to 0.07. Optimization used the AdamW optimizer coupled with the OneCycleLR scheduler, with a learning rate of 0.001 and a weight decay of 2.3e-05. Training ran for a total of 2,000 epochs, ensuring ample time for convergence and representation learning.
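A minimal sketch of this pretext stage, using the Lightly library mentioned in Section 2.5 (the dataloader yielding two augmented views per image is assumed, e.g., built with Lightly's SimCLR transform/collate utilities):

```python
import torch
import torch.nn as nn
import torchvision
from lightly.loss import NTXentLoss
from lightly.models.modules import SimCLRProjectionHead

class SimCLRModel(nn.Module):
    """ResNet50 trunk + SimCLR projection head."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50()
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop fc
        self.projection_head = SimCLRProjectionHead(2048, 2048, 128)

    def forward(self, x):
        h = self.backbone(x).flatten(start_dim=1)  # 2048-d representation
        return self.projection_head(h)             # 128-d embedding for the loss

# Assumed: a DataLoader yielding two augmented 256x256 views of each fundus image.
loader = ...

model = SimCLRModel().cuda()
criterion = NTXentLoss(temperature=0.07)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=2.3e-5)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=2000 * len(loader))

for epoch in range(2000):
    for x0, x1 in loader:
        z0, z1 = model(x0.cuda()), model(x1.cuda())
        loss = criterion(z0, z1)  # maximize agreement between the two views
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```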

2.2.2 Downstream Task: Fine-tuning

After the pretext task, we leveraged the learned representations for our primary objective, the classification of myopic maculopathy. For this supervised task, we fine-tuned the ResNet50 backbone extracted from the SimCLR architecture; the SimCLR projection head was discarded during this phase. The Task 1 dataset, with the provided labels, was used for fine-tuning.
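Schematically, the downstream stage can look like this (a sketch; the 5-way linear head and hyperparameters follow Tables 1 and 4, and `model` is the pretext network from the previous sketch):

```python
# Reuse the SimCLR-pretrained trunk; the projection head is simply not used.
classifier = nn.Sequential(
    model.backbone,            # ResNet50 trunk from the pretext task
    nn.Flatten(start_dim=1),
    nn.Linear(2048, 5),        # one logit per myopic maculopathy category
).cuda()

ce_loss = nn.CrossEntropyLoss()
ft_optimizer = torch.optim.AdamW(classifier.parameters(),
                                 lr=1e-3, weight_decay=2.3e-5)
```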

Checkpoint Strategy:

To optimize our model's generalization, we employed a strategic checkpoint-saving approach: checkpoints were saved based on several validation metrics (Quadratic-Weighted Kappa, Macro F1, and Macro Specificity), along with a checkpoint capturing the best average performance across these metrics. This strategy facilitated the eventual ensemble method, allowing for a harmonized prediction rooted in diverse evaluation criteria.
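A hypothetical sketch of this bookkeeping (the file names and helper are illustrative, not from the paper):

```python
best = {"qwk": -1.0, "macro_f1": -1.0, "macro_spec": -1.0, "avg": -1.0}

def save_best_checkpoints(model, qwk, macro_f1, macro_spec):
    """Keep one checkpoint per validation metric, plus one for their mean."""
    scores = {"qwk": qwk, "macro_f1": macro_f1, "macro_spec": macro_spec}
    scores["avg"] = sum(scores.values()) / 3
    for name, value in scores.items():
        if value > best[name]:
            best[name] = value
            torch.save(model.state_dict(), f"best_{name}.pt")
```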

Test Stage:

During the testing phase, we used Test Time Augmentation (TTA) to enhance the robustness of our predictions; TTA has been shown to improve the generalization of models on unseen data. In conjunction with TTA, we employed an ensemble method drawing predictions from all saved checkpoints, which diversified the prediction sources and increased the reliability and accuracy of the final results.

Figure 1: Proposed pipeline for Task 1.

2.3 Task 2: Segmentation of myopic maculopathy plus lesions

Segmentation of myopic maculopathy plus lesions in MMAC 2023 is intended to detect pixel-level lesions, including LC, CNV, and FS. As Table 1 shows, each lesion type comes with its own dataset, which makes it difficult to obtain a unified segmentation model through multi-task learning. We therefore trained three independent segmentation models. To achieve optimal lesion segmentation, we proposed a data augmentation strategy for training on these small datasets while performing backbone selection, and we incorporated the TTA strategy to enhance the models' robustness.

2.3.1 Data Split & Augmentation

As with the Diabetic Retinopathy Analysis Challenge (DRAC) [24], the segmentation tasks target pixel-level lesions in 2D images, and the dataset contains a limited number of patients. Following the best segmentation implementation in the DRAC challenge [16], we used all of the challenge training data to train our models. The data augmentation strategy outlined in Table 3 was employed to avoid overfitting: thanks to the geometric and pixel-wise transformations, which generate diverse input representations, the model never encounters an unmodified training sample [16].

Table 3: Data augmentations for Task 2.

| Operator | Parameters | Probability |
|---|---|---|
| Flip | horizontal, vertical | 0.5 |
| ShiftScaleRotate | shift_limit=0.2, scale_limit=0.1, rotate_limit=90 | 0.5 |
| RandomBrightnessContrast | brightness_limit=0.2, contrast_limit=0.2 | 1.0 |
| RandomGamma | gamma_limit=(80, 120) | 1.0 |
| Sharpen | alpha=(0.2, 0.5), lightness=(0.5, 1.0) | 1.0 |
| Blur | blur_limit=3 | 1.0 |
| Downscale | scale_min=0.7, scale_max=0.9 | 1.0 |
| GridDistortion | num_steps=5, distort_limit=0.3 | 0.2 |
| CoarseDropout | max_height=128, min_height=32, max_width=128, min_width=32, max_holes=3 | 0.2 |

2.3.2 Backbone Selection

During the validation phase of the challenge, we extensively tested different segmentation backbones to determine the optimal one: UNet++ [35], MAnet [7], LinkNet [1], FPN [15], PSPNet [34], DeepLabV3+ [2], and U²-Net [25], with ResNet [10] or EfficientNet [28] encoders.

2.3.3 TTA

We improved the robustness of the model by using TTA during inference. Each color fundus image was rotated by 90°, 180°, and 270° and fed, together with the original image, to the model. The final prediction was obtained by restoring (rotating back) the four outputs and averaging them.
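A sketch of this rotation-based TTA (assuming a model that outputs per-pixel lesion logits):

```python
import torch

@torch.no_grad()
def rotation_tta(model, image):
    """Average predictions over 0/90/180/270-degree rotations.

    image: (1, C, H, W) tensor; each logit map is rotated back ("restored")
    before averaging, and the mean probability map can then be thresholded.
    """
    prob_sum = 0.0
    for k in range(4):
        rotated = torch.rot90(image, k, dims=(2, 3))
        logits = model(rotated)
        prob_sum = prob_sum + torch.rot90(logits.sigmoid(), -k, dims=(2, 3))
    return prob_sum / 4.0
```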

2.4 Task 3: Prediction of spherical equivalent

Figure 2: Proposed workflow for Task 3. Gray folds represent the internal training set, green folds the internal validation set, and yellow folds the internal test set.

The prediction of spherical equivalent can assist in diagnosing the risk of myopic maculopathy associated with increased degrees of myopia [30]. Given the limited number of images in the training set and the limited number of submissions allowed in the validation phase, we proposed the workflow shown in Figure 2. The following steps were used for the prediction of spherical equivalent:

  (1) Backbone Selection

    To determine which backbone is most effective for the prediction task, we used five-fold cross-validation on the training set to assess the overall performance of various backbones. To ensure a balanced data distribution, we sorted the data by spherical equivalent value from smallest to largest before assigning folds (the fold construction is sketched after this list). The training set of the challenge was thus split into an internal training set (3 folds), an internal validation set (1 fold), and an internal test set (1 fold).

    Testing was performed on VGG [27], ResNet [10], DenseNet [12], EfficientNet [28], and EfficientNetV2 [29] architectures. The internal training set was used to train each model, checkpoints were selected based on the R-Squared value on the internal validation set, and the mean R-Squared over the internal test sets of Splits 0-4 was taken as the overall performance. The data augmentation strategies are listed in Table 2.

  (2) Model Re-training & Ensemble

    To maximize the use of the training set, we re-split it after the backbone testing: the fold previously used as the internal test set became the internal validation set, and the remaining four folds formed the new internal training set. Each well-performing backbone was retrained on this internal training set, and checkpoints selected on the internal validation set were evaluated on the challenge validation set. Training on different splits yielded different models, whose outputs we averaged via model ensembling [18] to further boost performance (see the sketch below). The effect of TTA during inference was also tested.
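The following sketch illustrates the distribution-aware split and the output averaging described above; the round-robin fold assignment is our reading of "sorted the data according to spherical equivalent values" and should be treated as an assumption:

```python
import numpy as np

def sorted_five_fold(se_values, n_folds=5):
    """Sort samples by SE, then deal them into folds round-robin so that
    every fold spans the full range of spherical equivalents."""
    order = np.argsort(se_values)
    return [order[start::n_folds] for start in range(n_folds)]

# Ensemble: average the SE predictions of models trained on different splits.
def ensemble_se(models, x):
    return sum(m(x) for m in models) / len(models)
```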

2.5 Implementation Details

The challenge comprised three diverse tasks, each with unique requirements and evaluation parameters. Given the inherent differences between these tasks, specialized evaluation metrics were employed for each. For Task 1, the adopted indicators are Quadratic-Weighted Kappa (QWK), F1 score, and Specificity. For Task 2, the Dice Similarity Coefficient (DSC) quantifies the overlap between predicted and reference lesion segmentations. Finally, for Task 3, the coefficient of determination (R-squared) and the Mean Absolute Error (MAE) evaluate the correlation and the error of the regression, respectively.
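For reference, these metrics can be computed as follows (a sketch using scikit-learn where possible; y_true/y_pred are Task 1 class labels, pred/gt are Task 2 binary masks, se_true/se_pred are Task 3 values, and macro specificity is derived from the confusion matrix since scikit-learn has no direct helper for it):

```python
import numpy as np
from sklearn.metrics import (cohen_kappa_score, confusion_matrix, f1_score,
                             mean_absolute_error, r2_score)

# Task 1: QWK, macro F1, macro specificity
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
macro_f1 = f1_score(y_true, y_pred, average="macro")
cm = confusion_matrix(y_true, y_pred)
per_class_spec = [
    (cm.sum() - cm[c].sum() - cm[:, c].sum() + cm[c, c])  # TN
    / (cm.sum() - cm[c].sum())                             # TN + FP
    for c in range(cm.shape[0])
]
macro_specificity = float(np.mean(per_class_spec))

# Task 2: Dice similarity coefficient on binary masks
def dsc(pred, gt, eps=1e-7):
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

# Task 3: R-squared and MAE on predicted spherical equivalents
r_squared = r2_score(se_true, se_pred)
mae = mean_absolute_error(se_true, se_pred)
```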

Table 4: Implementation details used in experiments.

| Implementation | Task 1 | Task 2 | Task 3 |
|---|---|---|---|
| Preprocessing | None | Normalize(mean=(0,0,0), std=(1,1,1)) | Normalize(mean=(0,0,0), std=(1,1,1)) |
| Input size | 512 × 512 pixels | 800 × 800 pixels | 800 × 800 pixels |
| Backbone | ResNet50 | MAnet (encoder: ResNet34) | EfficientNet-v2 (tf_efficientnetv2_l) |
| Library | timm | SMP | timm |
| Pretrained weights | Pretext task | ImageNet | ImageNet |
| Loss | CrossEntropyLoss | DiceLoss + CrossEntropyLoss | SmoothL1Loss |
| Optimizer | AdamW | AdamW | AdamW |
| Learning rate | 1e-3 (OneCycleLR scheduler) | 1e-4 (w/o scheduler) | 2e-4 (w/o scheduler) |
| Weight decay | 2.3e-05 | 1e-2 | 1e-2 |
| Augmentation | see Tab. 2 | see Tab. 3 | see Tab. 2 |
| Batch size | 8 | 5 | 6 |
| Epochs | 200 | 1000 | 800 |
| Train/Val split | 0.8:0.2 | 1:0 | 0.8:0.2 |
| Metric | QWK, Macro F1, Macro Specificity | Dice | R-squared |

In terms of computational infrastructure, the models and algorithms were implemented on a high-performance machine with 196 GB of RAM. An NVIDIA Tesla V100S GPU with 32 GB of memory and an NVIDIA A6000 with 48 GB of memory were employed for the heavy computations demanded by deep learning. The software ecosystem was primarily built around PyTorch, a leading deep learning framework. Additional libraries, such as Timm (known for its efficient training routines and pre-trained models), Segmentation Models PyTorch (SMP) for the segmentation task, and Lightly for contrastive learning, were incorporated to provide a robust and efficient system for the challenge's requirements.

Table 4 provides a brief description of our operators and detailed parameters for training. Unless otherwise specified, all experiments are conducted using reported configurations and parameters.

3 Results

3.1 Task 1: Classification of myopic maculopathy

Among the backbones we tested for classifying myopic maculopathy, ResNet50 delivered the most promising results, as presented in Table 5. This superior performance prompted us to optimize ResNet50 further by integrating SimCLR as a pretext task. The added value of SimCLR to our pipeline was evident in the improved results compared to ResNet50 without a pretext task, suggesting that SimCLR effectively captures enhanced representations from unlabeled data, enriching the model's features and elevating its performance. The model was then fine-tuned on the Task 1 dataset.
To further extend the model's capabilities, we experimented with several ensemble strategies (the 'All' and 'Majority' variants are sketched after this list):

  • Mean: selects the single checkpoint that showed the best mean performance across the three validation metrics during training.

  • All: averages the logits of all models before the Argmax operation, capturing a holistic insight from all checkpoints.

  • Majority: a majority-voting approach that collates predictions based on the predominant class predicted by the individual classifiers.
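The 'All' and 'Majority' strategies can be sketched as follows (assuming a list of checkpointed classifiers producing logits; 'Mean' needs no combination at inference time since it selects a single checkpoint):

```python
import torch

@torch.no_grad()
def ensemble_all(models, x):
    """'All': average the logits of every checkpoint, then take the argmax."""
    logits = torch.stack([m(x) for m in models])  # (n_models, batch, classes)
    return logits.mean(dim=0).argmax(dim=1)

@torch.no_grad()
def ensemble_majority(models, x):
    """'Majority': each checkpoint votes; the most frequent class wins."""
    votes = torch.stack([m(x).argmax(dim=1) for m in models])  # (n_models, batch)
    return votes.mode(dim=0).values
```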

Upon integrating ensemble strategies with the SimCLR-optimized ResNet50, we observed significant improvements in performance. The 'Majority' ensemble strategy combined with SimCLR achieved the highest Macro_F1 score of 0.8176 and an overall score of 0.8881. Interestingly, the 'All' strategy demonstrated the peak QWK (0.9080) and Specificity (0.9427), emphasizing its capacity to capture comprehensive insights from the models. The 'Mean' ensemble also displayed commendable results, with an overall score of 0.8781 when combined with SimCLR. It is evident from the data that ensemble strategies, particularly in tandem with SimCLR, significantly boost the model's performance across metrics.

Table 5: Task 1 validation results using different strategies.

| Backbone | Ensemble | TTA | Pretext | QWK | Macro_F1 | Specificity | Score |
|---|---|---|---|---|---|---|---|
| RexNet200 | - | - | - | 0.7524 | 0.5873 | 0.9126 | 0.7508 |
| Swin | - | - | - | 0.8159 | 0.6449 | 0.9155 | 0.7921 |
| ResNet18 | - | - | - | 0.8721 | 0.6857 | 0.9236 | 0.8271 |
| ResNet50 | - | - | - | 0.8845 | 0.7491 | 0.9315 | 0.8550 |
| ResNet50 | Mean | - | SimCLR | 0.9030 | 0.7926 | 0.9388 | 0.8781 |
| ResNet50 | Majority | - | SimCLR | 0.9067 | 0.8176 | 0.9400 | 0.8881 |
| ResNet50 | All | - | SimCLR | 0.9080 | 0.7954 | 0.9427 | 0.8821 |
| ResNet50 | Majority | ✓ | SimCLR | 0.9028 | 0.8049 | 0.9385 | 0.8821 |

As requested by the organizers, we submitted the four best-performing versions of our solution for the final test, where they were evaluated on a new, unseen dataset. As Table 6 shows, scores declined slightly from the validation to the test phase, a common phenomenon due to nuances of real-world data that may not be entirely captured in the validation set. The highest test scores were achieved using ResNet50 with the Majority ensemble method and TTA. This consistency from validation to test suggests that our methods are robust and not merely overfitting to the validation set.

Table 6: Performance of the different solution versions in Task 1 during the validation and testing phases.

| Phase | Ver. | Backbone | Ensemble | TTA | Pretext | QWK | Macro_F1 | Specificity | Score |
|---|---|---|---|---|---|---|---|---|---|
| Validation | (1) | ResNet50 | Majority | - | SimCLR | 0.9067 | 0.8176 | 0.9400 | 0.8881 |
| Validation | (2) | ResNet50 | Majority | ✓ | SimCLR | 0.9028 | 0.8049 | 0.9385 | 0.8821 |
| Validation | (3) | ResNet50 | All | - | SimCLR | 0.9080 | 0.7954 | 0.9427 | 0.8821 |
| Validation | (4) | ResNet50 | Mean | - | SimCLR | 0.9030 | 0.7926 | 0.9388 | 0.8781 |
| Test | (1) | ResNet50 | Majority | - | SimCLR | 0.8811 | 0.7071 | 0.9373 | 0.8419 |
| Test | (2) | ResNet50 | Majority | ✓ | SimCLR | 0.8858 | 0.7081 | 0.9396 | 0.8445 |
| Test | (3) | ResNet50 | All | - | SimCLR | 0.8856 | 0.7044 | 0.9409 | 0.8437 |
| Test | (4) | ResNet50 | Mean | - | SimCLR | 0.8677 | 0.6942 | 0.9370 | 0.8330 |

3.2 Task 2: Segmentation of myopic maculopathy plus lesions

Table 7: Results of Task 2 backbone selection on the validation set.

| Backbone | Encoder | LC DSC | CNV DSC | FS DSC | Avg DSC |
|---|---|---|---|---|---|
| UNet++ | EfficientNet-b0 | 0.7030 | 0.5458 | 0.7741 | 0.6743 |
| UNet++ | EfficientNet-b1 | 0.6748 | 0.5913 | 0.7866 | 0.6842 |
| UNet++ | EfficientNet-b2 | 0.7081 | 0.5990 | 0.7881 | 0.6984 |
| UNet++ | EfficientNet-b3 | 0.7158 | 0.5516 | 0.7393 | 0.6689 |
| UNet++ | EfficientNet-b4 | 0.7087 | 0.6257 | 0.7123 | 0.6823 |
| UNet++ | EfficientNet-b5 | 0.7051 | 0.5890 | 0.7956 | 0.6966 |
| UNet++ | EfficientNet-b6 | 0.7203 | 0.5148 | 0.7940 | 0.6764 |
| UNet++ | EfficientNet-b7 | 0.6829 | 0.5895 | 0.8068 | 0.6931 |
| UNet++ | ResNet34 | 0.7303 | 0.6339 | 0.8167 | 0.7270 |
| UNet++ | ResNet50 | 0.7216 | 0.4165 | 0.8068 | 0.6483 |
| UNet++ | ResNet101 | 0.7046 | 0.6064 | 0.7685 | 0.6932 |
| UNet++ | ResNet152 | 0.7055 | 0.5331 | 0.8306 | 0.6897 |
| DeepLabV3+ | ResNet34 | 0.6986 | 0.6351 | 0.8347 | 0.7228 |
| FPN | ResNet34 | 0.6969 | 0.5891 | 0.7769 | 0.6877 |
| LinkNet | ResNet34 | 0.7304 | 0.5385 | 0.7974 | 0.6888 |
| PSPNet | ResNet34 | 0.7065 | 0.3974 | 0.7908 | 0.6316 |
| MAnet | ResNet34 | 0.7573 | 0.6885 | 0.8498 | 0.7652 |
| U²-Net | - | 0.7651 | 0.3877 | 0.7717 | 0.6415 |

During the validation phase of the MMAC challenge, we tested the performance of different backbones and encoders on the validation set, as shown in Table 7. Based on the UNet++ structure, we compared ResNet and EfficientNet encoders. The ResNet34 encoder performed best among them, and we used it to test the other architectures. U²-Net proved the most accurate for segmenting LC lesions, followed by MAnet, while MAnet demonstrated the best performance for segmenting both CNV and FS lesions.

Table 8: Performance of the different solution versions in Task 2 during the validation and testing phases.

| Phase | Ver. | Model LC | Model CNV | Model FS | TTA | LC DSC | CNV DSC | FS DSC | Avg DSC |
|---|---|---|---|---|---|---|---|---|---|
| Validation | (1) | U²-Net | MAnet | MAnet | - | 0.7651 | 0.6885 | 0.8498 | 0.7678 |
| Validation | (2) | U²-Net | MAnet | MAnet | ✓ | 0.7367 | 0.6563 | 0.8024 | 0.7318 |
| Validation | (3) | MAnet | MAnet | MAnet | - | 0.7573 | 0.6885 | 0.8498 | 0.7652 |
| Validation | (4) | MAnet | MAnet | MAnet | ✓ | 0.7563 | 0.6563 | 0.8024 | 0.7383 |
| Test | (1) | U²-Net | MAnet | MAnet | - | 0.6403 | 0.6250 | 0.8215 | 0.6956 |
| Test | (2) | U²-Net | MAnet | MAnet | ✓ | 0.6682 | 0.6557 | 0.8348 | 0.7196 |
| Test | (3) | MAnet | MAnet | MAnet | - | 0.6658 | 0.6250 | 0.8215 | 0.7041 |
| Test | (4) | MAnet | MAnet | MAnet | ✓ | 0.6838 | 0.6557 | 0.8348 | 0.7248 |

Based on the backbone tests, we selected the four best-performing versions as our submissions for the testing phase of the challenge, as shown in Table 8. Version (1) combines the strong performance of U²-Net and MAnet on different segmentation tasks and thus performed well during the validation phase. Unfortunately, U²-Net suffered from overfitting due to the limited number of patients in the datasets and therefore performed poorly in the testing phase. The tests also indicate that the TTA approach significantly improves the robustness of the model and boosts the segmentation performance of the different submission versions in the testing phase. Ultimately, the MAnet-based submission with the TTA strategy (Ver. (4)) performed best, with DSC values of 0.6838 for LC, 0.6557 for CNV, and 0.8348 for FS segmentation on the test set. Figure 3 illustrates the performance of our MAnet-based model on the different lesion segmentation tasks. The model segments smaller lesions proficiently, as depicted in Figure 3(a), but shows limitations on larger or more complex lesions, as observed in Figure 3(b).

Figure 3: Segmentation performance of MAnet on the validation set of Task 2.

3.3 Task 3: Prediction of spherical equivalent

Table 9: R-Squared results of Task 3 backbone selection on the internal test set.

| Backbone (timm) | Split0 | Split1 | Split2 | Split3 | Split4 | Avg. |
|---|---|---|---|---|---|---|
| vgg11 | 0.6773 | 0.7179 | 0.6808 | 0.6663 | 0.6241 | 0.6733 |
| vgg16 | 0.5915 | 0.6507 | 0.6821 | 0.7699 | 0.6753 | 0.6739 |
| resnet50 | 0.7061 | 0.7469 | 0.7328 | 0.6976 | 0.6843 | 0.7135 |
| resnet152 | 0.7079 | 0.6972 | 0.6768 | 0.7112 | 0.7048 | 0.6996 |
| resnet200d | 0.7761 | 0.7675 | 0.7300 | 0.7494 | 0.7274 | 0.7501 |
| densenet121 | 0.7077 | 0.7289 | 0.6821 | 0.7303 | 0.7057 | 0.7109 |
| densenet161 | 0.7543 | 0.6865 | 0.7094 | 0.7449 | 0.7337 | 0.7258 |
| densenet169 | 0.7416 | 0.7367 | 0.7166 | 0.7285 | 0.7410 | 0.7329 |
| densenet201 | 0.7405 | 0.7347 | 0.7094 | 0.7172 | 0.7620 | 0.7328 |
| efficientnet_b0 | 0.7318 | 0.7446 | 0.7129 | 0.7384 | 0.7472 | 0.7350 |
| efficientnet_b1 | 0.7108 | 0.7136 | 0.7084 | 0.7098 | 0.7535 | 0.7192 |
| efficientnet_b2 | 0.7282 | 0.6952 | 0.7027 | 0.7554 | 0.7486 | 0.7260 |
| tf_efficientnet_b6 | 0.7443 | 0.7918 | 0.7098 | 0.8264 | 0.7272 | 0.7599 |
| tf_efficientnet_b7 | 0.7728 | 0.7954 | 0.7453 | 0.7868 | 0.7787 | 0.7758 |
| tf_efficientnet_b8 | 0.7980 | 0.8043 | 0.8170 | 0.8581 | 0.7686 | 0.8092 |
| tf_efficientnetv2_s | 0.7765 | 0.7746 | 0.7206 | 0.8069 | 0.7520 | 0.7661 |
| tf_efficientnetv2_l | 0.8147 | 0.8374 | 0.7801 | 0.8226 | 0.8336 | 0.8177 |
| tf_efficientnetv2_xl | 0.8115 | 0.8354 | 0.8005 | 0.8006 | 0.8166 | 0.8129 |

To evaluate the overall performance of different backbones on the internal test set, we first applied our proposed five-fold cross-validation method, as shown in Table 9. Based on the mean R-Squared across the internal test sets of the different splits, we selected the three best-performing backbones: tf_efficientnetv2_l, tf_efficientnetv2_xl, and tf_efficientnet_b8. Because the Python packages provided by the organizers did not support tf_efficientnetv2_xl, we retained tf_efficientnetv2_l and tf_efficientnet_b8 as backbones.

The dataset was then re-split, and tf_efficientnetv2_l and tf_efficientnet_b8 were retrained. As shown in Table 10, the models that performed well on the new internal validation set were evaluated on the challenge validation set. The results show that the tf_efficientnet_b8 model trained on Split' 1 and the tf_efficientnetv2_l models trained on Split' 2 / Split' 3 / Split' 4 perform well on the validation set. These models were then ensembled. Our model ensembles performed well in both the validation and testing phases, improving prediction accuracy without overfitting. By ensembling tf_efficientnetv2_l (Split' 3) and tf_efficientnetv2_l (Split' 4), solution Ver. (3) obtained an R-Squared of 0.8735 and an MAE of 0.7080 on the test set.

Table 10: Performance of the different solution versions in Task 3 during the validation and testing phases.

| Phase | Ver. | Ensemble Model (Split) | TTA | R-Squared | MAE |
|---|---|---|---|---|---|
| Validation | - | tf_efficientnet_b8 (Split' 1) | | 0.8230 | 0.6818 |
| Validation | - | tf_efficientnetv2_l (Split' 2) | | 0.8526 | 0.6723 |
| Validation | (1) | tf_efficientnetv2_l (Split' 3) | | 0.8622 | 0.6299 |
| Validation | - | tf_efficientnetv2_l (Split' 3) | | 0.8617 | 0.6307 |
| Validation | - | tf_efficientnetv2_l (Split' 4) | | 0.8539 | 0.6570 |
| Validation | (2) | tf_efficientnet_b8 (Split' 1) + tf_efficientnetv2_l (Split' 3) | | 0.8669 | 0.6254 |
| Validation | (3) | tf_efficientnetv2_l (Split' 3) + tf_efficientnetv2_l (Split' 4) | | 0.8734 | 0.6073 |
| Validation | - | tf_efficientnetv2_l (Split' 3) + tf_efficientnetv2_l (Split' 4) | | 0.8705 | 0.6109 |
| Validation | (4) | tf_efficientnetv2_l (Split' 2) + tf_efficientnetv2_l (Split' 3) + tf_efficientnetv2_l (Split' 4) | | 0.8745 | 0.6075 |
| Test | (1) | tf_efficientnetv2_l (Split' 3) | | 0.8507 | 0.7627 |
| Test | (2) | tf_efficientnet_b8 (Split' 1) + tf_efficientnetv2_l (Split' 3) | | 0.8714 | 0.7258 |
| Test | (3) | tf_efficientnetv2_l (Split' 3) + tf_efficientnetv2_l (Split' 4) | | 0.8735 | 0.7080 |
| Test | (4) | tf_efficientnetv2_l (Split' 2) + tf_efficientnetv2_l (Split' 3) + tf_efficientnetv2_l (Split' 4) | | 0.8732 | 0.7041 |

4 Discussion and conclusions

In this work, we presented our solutions for the three tasks of the MMAC challenge. For Task 1, ResNet50 emerged as the backbone of choice for classification. Its performance was substantially amplified by incorporating SimCLR, a pretext task that effectively harnessed unlabeled data to enrich model representations. A deeper exploration of ensemble strategies, particularly the 'Majority' and 'All' methods, revealed significant performance boosts. These findings were validated on an unseen dataset, where our models demonstrated robustness, with ResNet50 combined with the Majority ensemble method and TTA showing impressive consistency. In the final test ranking of this task, our model secured the 8th position.

Segmentation of myopic maculopathy plus lesions (Task 2) posed its own set of challenges. The UNet++ structure fortified with a ResNet34 encoder showed promising results. A key finding, however, was the susceptibility of the U²-Net model to overfitting, especially with limited datasets. Despite this setback, the MAnet-based model, augmented with the TTA strategy, achieved strong Dice scores across the lesion segmentation tasks. This performance ranked our model 2nd in the challenge.

The prediction of spherical equivalent (Task 3) pivoted on our use of the tf_efficientnetv2_l and tf_efficientnet_b8 backbones, with tf_efficientnetv2_l performing especially well in the validation phase. Model ensembling further elevated our performance metrics. Notably, the ensemble of tf_efficientnetv2_l models derived from specific data splits yielded excellent results on the test dataset, earning us the top position for this task.

It should be noted that more models deserve further testing. For Task 3, we identified additional backbone architectures that performed well, but we were unable to complete their evaluation due to limitations in the Python packages provided by the organizers. Furthermore, nnU-Net [14] is a common segmentation solution in medical challenges; in light of its positive results in the DRAC challenge [20], nnU-Net might also perform well in the MMAC challenge.

Acknowledgements

The work was conducted in the framework of the ANR RHU project Evired. This work benefited from state aid managed by the French National Research Agency under the “Investissement d’Avenir” program, reference ANR-18-RHUS-0008.

References

  • [1] Chaurasia, A., Culurciello, E.: Linknet: Exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE visual communications and image processing (VCIP). pp. 1–4. IEEE (2017)
  • [2] Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV). pp. 801–818 (2018)
  • [3] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning. ICML’20, JMLR.org (2020)
  • [4] Dai, L., Wu, L., Li, H., Cai, C., Wu, Q., Kong, H., Liu, R., Wang, X., Hou, X., Liu, Y., et al.: A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nature communications 12(1),  3242 (2021)
  • [5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., Houlsby, N.: An image is worth 16x16 words: Transformers for image recognition at scale. CoRR abs/2010.11929 (2020), https://arxiv.org/abs/2010.11929
  • [6] El Habib Daho, M., Li, Y., Zeghlache, R., Atse, Y.C., Le Boité, H., Bonnin, S., Cosette, D., Deman, P., Borderie, L., Lepicard, C., Tadayoni, R., Cochener, B., Conze, P.H., Lamard, M., Quellec, G.: Improved automatic diabetic retinopathy severity classification using deep multimodal fusion of uwf-cfp and octa images. In: Antony, B., Chen, H., Fang, H., Fu, H., Lee, C.S., Zheng, Y. (eds.) Ophthalmic Medical Image Analysis. pp. 11–20. Springer Nature Switzerland, Cham (2023)
  • [7] Fan, T., Wang, G., Li, Y., Wang, H.: Ma-net: A multi-scale attention network for liver and tumor segmentation. IEEE Access 8, 179656–179665 (2020)
  • [8] Gwenolé, Q., Hassan, A.H., Mathieu, L., Pierre-Henri, C., Pascale, M., Béatrice, C.: Explain: Explanatory artificial intelligence for diabetic retinopathy diagnosis. Medical Image Analysis 72 (2021), https://doi.org/10.1016/j.media.2021.102118
  • [9] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385 (2015), http://arxiv.org/abs/1512.03385
  • [10] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [11] Holden, B.A., Fricke, T.R., Wilson, D.A., Jong, M., Naidoo, K.S., Sankaridurg, P., Wong, T.Y., Naduvilath, T.J., Resnikoff, S.: Global prevalence of myopia and high myopia and temporal trends from 2000 through 2050. Ophthalmology 123(5), 1036–1042 (2016)
  • [12] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 4700–4708 (2017)
  • [13] Ikuno, Y.: Overview of the complications of high myopia. Retina 37(12), 2347–2351 (2017)
  • [14] Isensee, F., Jaeger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods 18(2), 203–211 (2021)
  • [15] Kirillov, A., He, K., Girshick, R., Dollár, P.: A unified architecture for instance and semantic segmentation. In: CVPR (2017)
  • [16] Kwon, G., Kim, E., Kim, S., Bak, S., Kim, M., Kim, J.: Bag of tricks for developing diabetic retinopathy analysis framework to overcome data scarcity. In: MICCAI Challenge on Mitosis Domain Generalization, pp. 59–73. Springer (2022)
  • [17] Lahsaini, I., El Habib Daho, M., Chikh, M.A.: Deep transfer learning based classification model for covid-19 using chest ct-scans. Pattern Recognition Letters 152, 122–128 (2023), https://doi.org/10.1016/j.patrec.2021.08.035
  • [18] Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in neural information processing systems 30 (2017)
  • [19] Li, L.F., Gilbert, Y.S.L., Carla, L., Chee Wai, W., Quan V., H., Xiu Juan, Z., Jason C., Y., Leopold, S., Audrey, C., Tien Yin, W., Daniel S. W., T., Seang-Mei, S., Marcus, A.: Deep learning system to predict the 5-year risk of high myopia using fundus imaging in children. npj Digital Medicine 6(10) (2023), https://doi.org/10.1038/s41746-023-00752-8
  • [20] Li, Y., Zeghlache, R., Brahim, I., Xu, H., Tan, Y., Conze, P.H., Lamard, M., Quellec, G., El Habib Daho, M.: Segmentation, classification, and quality assessment of uw-octa images for the diagnosis of diabetic retinopathy. In: MICCAI Challenge on Mitosis Domain Generalization, pp. 146–160. Springer (2022)
  • [21] Liu, R., Wang, X., Wu, Q., Dai, L., Fang, X., Yan, T., Son, J., Tang, S., Li, J., Gao, Z., et al.: Deepdrid: Diabetic retinopathy—grading and image quality estimation challenge. Patterns 3(6) (2022)
  • [22] Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: Hierarchical vision transformer using shifted windows. CoRR abs/2103.14030 (2021), https://arxiv.org/abs/2103.14030
  • [23] Ohno-Matsui, K., Kawasaki, R., Jonas, J.B., Cheung, C.M.G., Saw, S.M., Verhoeven, V.J., Klaver, C.C., Moriyama, M., Shinohara, K., Kawasaki, Y., et al.: International photographic classification and grading system for myopic maculopathy. American journal of ophthalmology 159(5), 877–883 (2015)
  • [24] Qian, B., Chen, H., Wang, X., Che, H., Kwon, G., Kim, J., Choi, S., Shin, S., Krause, F., Unterdechler, M., et al.: Drac: Diabetic retinopathy analysis challenge with ultra-wide optical coherence tomography angiography images. arXiv preprint arXiv:2304.02389 (2023)
  • [25] Qin, X., Zhang, Z., Huang, C., Dehghan, M., Zaiane, O.R., Jagersand, M.: U2-net: Going deeper with nested u-structure for salient object detection. Pattern recognition 106, 107404 (2020)
  • [26] Silva, R.: Myopic maculopathy: a review. Ophthalmologica 228(4), 197–213 (2012)
  • [27] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  • [28] Tan, M., Le, Q.: Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. pp. 6105–6114. PMLR (2019)
  • [29] Tan, M., Le, Q.: Efficientnetv2: Smaller models and faster training. In: International conference on machine learning. pp. 10096–10106. PMLR (2021)
  • [30] Yokoi, T., Ohno-Matsui, K.: Diagnosis and treatment of myopic maculopathy. The Asia-Pacific Journal of Ophthalmology 7(6), 415–421 (2018)
  • [31] Yue, Z., Yilin, L., Jing, L., Jianing, W., Hui, L., Jinrong, Z., Xiaobing, Y.: Performances of artificial intelligence in detecting pathologic myopia: a systematic review and meta-analysis. Eye (2023), https://doi.org/10.1038/s41433-023-02551-7
  • [32] Zeghlache, R., Conze, P.H., El Habib Daho, M., Li, Y., Boité, H.L., Tadayoni, R., Massin, P., Cochener, B., Brahim, I., Quellec, G., Lamard, M.: Longitudinal self-supervised learning using neural ordinary differential equation. In: Rekik, I., Adeli, E., Park, S.H., Cintas, C., Zamzmi, G. (eds.) Predictive Intelligence in Medicine. pp. 1–13. Springer Nature Switzerland, Cham (2023)
  • [33] Zeghlache, R., Conze, P.H., El Habib Daho, M., Tadayoni, R., Massin, P., Cochener, B., Quellec, G., Lamard, M.: Detection of diabetic retinopathy using longitudinal self-supervised learning. In: Antony, B., Fu, H., Lee, C.S., MacGillivray, T., Xu, Y., Zheng, Y. (eds.) Ophthalmic Medical Image Analysis. pp. 43–52. Springer International Publishing, Cham (2022)
  • [34] Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2881–2890 (2017)
  • [35] Zhou, Z., Rahman Siddiquee, M.M., Tajbakhsh, N., Liang, J.: Unet++: A nested u-net architecture for medical image segmentation. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: 4th International Workshop, DLMIA 2018, and 8th International Workshop, ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 20, 2018, Proceedings 4. pp. 3–11. Springer (2018)