Explainable AI based Glaucoma Detection using Transfer Learning and LIME

Touhidul Islam Chayan
Computer Science and Engineering
BRAC University
Dhaka, Bangladesh
touhidul.islam.chayan@g.bracu.ac.bd

Anita Islam
Computer Science and Engineering
BRAC University
Dhaka, Bangladesh
anita.islam@g.bracu.ac.bd

Eftykhar Rahman
Computer Science and Engineering
BRAC University
Dhaka, Bangladesh
eftykhar.rahman@g.bracu.ac.bd

Md. Tanzim Reza
Computer Science and Engineering
BRAC University
Dhaka, Bangladesh
tanzim.reza@bracu.ac.bd

Tasnim Sakib Apon
Computer Science and Engineering
BRAC University
Dhaka, Bangladesh
sakibapon7@gmail.com

MD. Golam Rabiul Alam
Computer Science and Engineering
BRAC University
Dhaka, Bangladesh
rabiul.alam@bracu.ac.bd

Abstract

Glaucoma is the second leading cause of partial or complete blindness among visual impairments. It mainly occurs because of excessive pressure in the eye, attributed to anxiety or depression, which damages the optic nerve and creates complications in vision. Traditional glaucoma screening is a time-consuming process that requires the constant attention of medical professionals, and even then, time constraints and pressure occasionally lead to misclassification and, in turn, to wrong treatment. Numerous efforts have been made to automate the entire glaucoma classification procedure; however, existing models generally behave as black boxes, which prevents users from understanding the key reasons behind a prediction, and so medical practitioners generally cannot rely on these systems. In this article, after comparing various pre-trained models, we propose a transfer learning model that is able to classify glaucoma with 94.71% accuracy. In addition, we utilize Local Interpretable Model-Agnostic Explanations (LIME) to introduce explainability into our system. This improvement enables medical professionals to obtain important and comprehensive information that aids them in making judgments. It also lessens the opacity and fragility of traditional deep learning models.

Index Terms:

Biomedical Image Processing, Glaucoma, Blindness, Machine Learning, Convolutional Neural Network, Explainable AI.

I Introduction

Glaucoma is a very common group of eye diseases caused by damage to the optic nerve, which connects the eye to the brain. If untreated, it causes permanent loss of vision, and it is the second most common cause of blindness globally. It is also known as the ‘silent thief of sight’ because it cannot be detected at a very early stage [1]. Around 57.5 million people worldwide are affected by glaucoma [2]. There are two major types of glaucoma: open-angle and angle-closure. The progression of glaucoma can be halted with medication; however, vision that has already been lost cannot be restored. This is why it is vital to identify early signs of glaucoma through regular eye examinations. Acute angle-closure glaucoma is an ocular emergency that requires immediate attention. If glaucoma is detected at an early stage, partial or complete blindness can be prevented.

Unfortunately, few people pay attention to the early detection of glaucoma, even though early diagnosis can prevent loss of eyesight. For this reason, we work on the early detection of glaucoma and use Explainable AI (XAI) to classify fundus images of eyes, so that the decision of an otherwise black-box deep learning model is reported in a form that is human interpretable. Moreover, we give an outline of recent publications on the use of artificial intelligence to improve the detection and treatment of glaucoma. Deep Learning (DL) is a subset of Artificial Intelligence (AI) based on deep neural networks, which have made striking breakthroughs in clinical imaging, especially for image classification and pattern recognition [3]. The main purpose of this study is to show whether and how deep learning based measurements can be utilized for glaucoma management in the clinic [4]. Even if vision loss has already occurred, treatment can delay or prevent further vision loss [5]. Open-angle glaucoma is the most common form of glaucoma and is responsible for 90% of cases [6]. Fundus images can be used for glaucoma diagnosis through the cup-to-disc ratio (CDR) method [7]. CNN models can work alongside human specialists to maintain overall eye health and assist in recognizing eye diseases that cause blindness [8]. Our objectives are as follows: (i) automate the process of glaucoma categorization; (ii) provide detailed information to medical professionals for each prediction, so that they can rely on the system; (iii) increase the efficiency of the entire process; (iv) reduce the amount of work required of medical personnel. Additionally, we encountered a number of challenges while performing our study, including the tendency of models to overfit the data and limited computing resources.

The significant contributions of this article are stated as follows:

• A transfer learning based glaucoma detection model is proposed, and a comprehensive study of the performance of various pre-trained models on glaucoma is conducted.

• The interpretability of the proposed model is evaluated using Local Interpretable Model-Agnostic Explanations (LIME), which offers medical practitioners the key features or information behind the classification of glaucoma.

• The performance of the proposed model is studied on a benchmark glaucoma dataset.

A brief overview of previous glaucoma classification research is given in Section II, followed by a discussion of our methodology, models, and techniques in Section III, which is divided into three subsections. The system model is explained in Section III-A, Section III-B describes the dataset, and Section III-C presents our proposed CNN model. The performance evaluation is presented in Section IV. Finally, in Section V we interpret our model and illustrate how it makes decisions.

II Literature Review

Glaucoma is one of the most common causes of permanent blindness around the world [9]. Glaucoma develops when the pressure inside the eye becomes too high and affects the optic nerve, which can also cause eye pain. Various diagnostic tools, such as tonometers, gonioscopy, and scanning laser tomography, are available for detection and treatment, but each has advantages and disadvantages that can limit its use, so how well they work needs to be evaluated. Deep learning can help remove these limitations, and XAI presents the model's reasoning in a form that humans can understand. We have utilized various models pre-trained on ImageNet to classify glaucoma.

TABLE I: Comparison with Previous Studies

Architectures	Dataset	Accuracy	Reference
CNN	Retinal OCT	94.87%	[19]
ResNet50-v1	Retinal OCT	94.92%	[20]
CNN	Glaucoma	84.50%	[10]
FNN	Glaucoma	92.5%	[11]

Table I gives a brief summary of previous studies. Additionally, we learned about the impact of artificial intelligence in the diagnosis and management of glaucoma from [12]. Computerized automated visual field testing represents a significant improvement in mapping the island of vision, allowing visual field testing to become a cornerstone in diagnosing and managing glaucoma. In 1994, Goldbaum et al. developed a two-layer neural network for analyzing visual fields [7]. This network classified normal and glaucomatous eyes with the same sensitivity (65%) and specificity (72%) as two glaucoma specialists.

According to recent evidence, the pathogenesis of glaucoma appears to depend on several interconnected pathogenetic mechanisms, including mechanical effects of excessive intraocular pressure, reduced neurotrophin supply, hypoxia, excitotoxicity, oxidative stress, and the involvement of autoimmune processes [13]. Hearing loss has also been linked to the development of glaucoma. In normal-tension glaucoma patients with hearing loss, antiphosphatidylserine antibodies of the immunoglobulin G class were shown to be more prevalent than in normal-tension glaucoma patients with normacusis. The World Health Organization reports that glaucoma affects approximately 60 million people worldwide. By the year 2020, approximately 80 million people were expected to suffer from glaucoma, which was anticipated to result in 11.2 million cases of bilateral blindness [14]. According to the authors, this is why glaucoma needs to be treated as early as possible.

Unlike the studies mentioned above, our focus has been on interpreting our proposed model such that medical practitioners would feel confident utilizing our approach.

III Methodology

Section III gives an overview of our proposed model and is separated into three subsections. Section III-A discusses our working plan, Section III-B discusses data gathering and pre-processing, and Section III-C discusses the architecture.

III-A System Model

Figure 1: System Model: Glaucoma Classification

We employ deep learning, specifically fully connected neural networks (FCNNs), which act as a black-box function. Black boxes generally perform very well, but their structure gives no insight into how the function is being approximated. For this reason, we use LIME, one of the most popular XAI Python libraries. Many XAI frameworks explain a black-box model's behavior in terms of its input features, and they work well for explaining complex classification models. In short, these methods generate an explanation of a complex model's prediction through charts or graphs, and they are also reasonably fast. Figure 2 illustrates how a black box is explained with the help of LIME.

Black-box models generate an output based on features learned from the training dataset, and through LIME we can visualize which features the output was based on. Our glaucoma dataset has two classes, suspicious glaucoma and non-glaucoma. Both classes consist of fundus images, with label 1 for a confirmed glaucoma case and label 0 for a non-glaucoma case. To apply XAI, we take fully connected neural networks (FCNNs) as the black-box AI model that predicts glaucoma from the data. We use the ReLU non-linear activation function in the convolutional layers and the Softmax activation function in the output layer, which combines the class scores into a single output. A short description of these deep learning components is given below.

• Convolutional Neural Network (CNN): In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks most commonly applied to analyze visual imagery. We classify the image data with this model.

• Fully Connected Neural Network (FCNN): A fully connected neural network (FCNN) is a type of artificial neural network in which every node, or neuron, in one layer is connected to every neuron in the next layer [15]. This model also helps us to produce the prediction.

• ReLU: A rectified linear unit, or ReLU for short, is a node that applies the rectified linear activation function, f(x) = max(0, x). Networks whose hidden layers use the rectifier function are often referred to as rectified networks. The computational cost of adding more ReLUs increases linearly as the size of the CNN grows.

• Softmax: Softmax is a mathematical function that transforms a vector of real numbers into a vector of probabilities, with each probability proportional to the exponential of the corresponding input value. The softmax function is most commonly used as an activation function in a neural network model in applied machine learning. The network is set up to produce N values, one for each class in the classification task, and the softmax function normalizes these outputs, turning them from weighted sums into probabilities that sum to one. Each value in the output of the softmax function is interpreted as the probability of membership in the corresponding class. In our model, this turns the outputs of the final layer into a single prediction.
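The two activation functions described above can be illustrated with a minimal NumPy sketch. The input values below are hypothetical and chosen only for illustration; index 0 stands for non-glaucoma and index 1 for glaucoma, matching the labels used in this paper.

```python
import numpy as np

def relu(x):
    """Rectified linear activation: negative inputs become zero."""
    return np.maximum(0.0, x)

def softmax(scores):
    """Turn a vector of real-valued scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)  # subtracting the maximum avoids overflow
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical hidden-layer pre-activations passed through ReLU
hidden = relu(np.array([-0.5, 1.5, 0.0]))

# Hypothetical output-layer scores for the two classes
# (index 0: non-glaucoma, index 1: glaucoma)
logits = np.array([0.4, 1.9])
probs = softmax(logits)
predicted_label = int(np.argmax(probs))
```

The class with the largest probability is taken as the prediction, so the softmax output can be read directly as the model's confidence in each class.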

Following our dataset, we divide the data chronologically into training and testing sets to classify glaucoma, and through LIME, an XAI method, we explain these black boxes.

Figure 1 shows the whole process, from dataset preprocessing to the final output produced by the Softmax activation function. With XAI methods, we explain the black boxes through visualizations of the core features that were the main reasons behind each prediction.

III-B Dataset Description

Our data consist mostly of fundus images taken directly from the LAG dataset [16]. A CNN is used in this work for image classification, as it is a type of model designed to process data such as images. It also automatically learns low- to high-level patterns, which helps us extract higher-level representations of the image content.

The dataset contains 4250 images for training, 302 images for testing and 302 images for validation. All of these images are separated into two folders, glaucoma and non-glaucoma. As shown in Table II, the label for “glaucoma” is 1 and the label for “non-glaucoma” is 0.

TABLE II: Dataset Description

Class	Label	Fundus Images
Suspicious Glaucoma	1	1711
Non Glaucoma	0	3143

III-C Architecture

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural networks most commonly applied to analyze visual imagery. In this study, a transfer learning approach is proposed. The size and features of the dataset make it well suited to transfer learning, which allows a pre-trained CNN with all of its weights to be reused to build a new model specialized in identifying glaucoma with a high degree of accuracy.

We used the pre-trained models from the TensorFlow Keras implementation and, through transfer learning, trained only the layers that needed training. All of the models' weights were pre-trained on the ImageNet dataset. After downloading a pre-trained model, we froze every pre-trained layer and removed the top layers so that the model could be reused. We then used a Flatten layer to flatten the output of the pre-trained base into a single vector, followed by three Dense layers with 100 neurons each for VGG-16, VGG-19, InceptionV3 and ResNet50. For DenseNet121, we used 1024 neurons in the first Dense layer, 512 in the second and 256 in the third. We used the “ReLU” activation function for the convolutional layers. We also applied batch normalization and dropout with a rate of 0.5 in each model. For predictions, we used a Dense layer with 2 neurons and the “Softmax” activation function. For gradient descent, we used the Adam optimizer.
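The construction described above can be sketched in TensorFlow Keras roughly as follows. This is an illustrative sketch rather than our exact training script: the function name `build_transfer_model`, the 224×224×3 input size, the position of the batch normalization and dropout layers, the ReLU activation in the Dense layers and the categorical cross-entropy loss are all assumptions made for the example. The learning rate of 1e-5 is the value reported in Section IV.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_transfer_model(weights="imagenet", input_shape=(224, 224, 3)):
    """Frozen ResNet50 base followed by a small trainable classification head."""
    base = tf.keras.applications.ResNet50(
        include_top=False,        # remove the original ImageNet classifier
        weights=weights,          # "imagenet" downloads the pre-trained weights
        input_shape=input_shape,  # assumption: input size is not stated in the text
    )
    base.trainable = False        # make every pre-trained layer untrainable

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = layers.Flatten()(x)
    for _ in range(3):            # three Dense layers with 100 neurons each
        x = layers.Dense(100, activation="relu")(x)
    x = layers.BatchNormalization()(x)  # assumption: placement within the head
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(2, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss="categorical_crossentropy",  # assumption: one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_transfer_model()` would download the ImageNet weights on first use; passing `weights=None` builds the same architecture with random initialization, which is convenient for checking shapes. For DenseNet121, the base model and the Dense layer widths (1024, 512, 256) would be swapped in accordingly.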

IV Performance Evaluation

Figure 3 shows the architecture of ResNet50, which is our proposed model. However, we utilized the VGG-16, VGG-19, DenseNet121, InceptionV3 and ResNet50 models in our study, each compiled with the Adam optimizer with a learning rate of 1e-5 and trained for 50 epochs. After 50 epochs, ResNet50 achieved the highest score among the models, with a validation accuracy of 94.7%. Table III reports the findings of our study. Table IV shows the misclassification count, along with the percentage, for each model and for both classes. Here, G = Glaucoma and n-G = Non-Glaucoma.

TABLE III: Model Accuracy and Loss

Model	Valid Accuracy	Train Accuracy	Train Loss	Valid Loss
DenseNet121	86.81%	88.83%	31.18%	24.10%
InceptionV3	86.42%	93.49%	20.04%	35.79%
ResNet50	94.71%	99.56%	3.81%	12.22%
VGG-16	88.63%	98.00%	6.76%	27.92%
VGG-19	93.31%	97.00%	11.53%	14.94%

Figure 4 shows the training and validation curves for accuracy and loss. After extensive analysis of all the utilized models, we can state that VGG-19 and ResNet50 were the best-fitted models.

TABLE IV: Model Performance with Misclassification

Split	Model	All Misclassified	G Misclassified	n-G Misclassified
Test	DenseNet121	9 (86.76%)	4 (88.24%)	5 (85.29%)
Test	InceptionV3	151 (93.83%)	74 (93.95%)	77 (93.71%)
Test	VGG-16	48 (97.65%)	22 (97.84%)	26 (97.45%)
Test	VGG-19	22 (94.12%)	3 (91.18%)	1 (97.06%)
Test	ResNet50	26 (95.59%)	1 (97.06%)	2 (94.12%)
Valid	DenseNet121	151 (93.83%)	74 (93.95%)	77 (93.71%)
Valid	InceptionV3	48 (97.65%)	22 (97.84%)	26 (97.45%)
Valid	VGG-16	47 (98.08%)	20 (98.37%)	27 (97.79%)
Valid	VGG-19	17 (99.31%)	14 (98.86%)	3 (33.75%)
Valid	ResNet50	0 (100%)	0 (100%)	0 (100%)
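The percentage reported next to each misclassification count is the share of correctly classified images. A minimal sketch of this relationship is given below, using the DenseNet121 test row as a worked example. The number of evaluated images per class (34) is an assumption made here for illustration; it is not stated in the text, but it reproduces the percentages of that row.

```python
def accuracy_percent(misclassified, total):
    """Percentage of correctly classified images, rounded to two decimals."""
    return round(100.0 * (total - misclassified) / total, 2)

# Assumption: 34 evaluated images per class (not stated in the text);
# this value reproduces the DenseNet121 test row of Table IV.
images_per_class = 34
g_wrong, ng_wrong = 4, 5  # misclassified glaucoma / non-glaucoma images

g_acc = accuracy_percent(g_wrong, images_per_class)
ng_acc = accuracy_percent(ng_wrong, images_per_class)
all_acc = accuracy_percent(g_wrong + ng_wrong, 2 * images_per_class)
print(g_acc, ng_acc, all_acc)  # 88.24 85.29 86.76
```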

These are the single image predictions of all models -

V XAI: Local Interpretable Model-Agnostic Explanations

We now show the explanation for these preprocessed and misclassified images using an XAI [17] framework, LIME. We then apply LIME again to a single raw fundus image taken directly from the labelled test dataset directory, to see the difference between a correctly predicted fundus image [18] and a wrongly predicted fundus image. Given below are the misclassified image with preprocessing, the area highlighted by superpixels and the explanation of the model prediction by LIME for DenseNet121.

Figure 5 depicts the LIME explanation for a single image. The goal of LIME is to make the predictions of machine learning models understandable to humans. The method explains individual instances, which makes it suitable for local explanations. LIME manipulates the input data and creates a series of artificial data points, each containing only a part of the original attributes. In the case of text data, for example, different versions of the original text are created in which a certain number of randomly selected words are removed. These artificial data points are then classified by the model. Hence, through the presence or absence of certain keywords, we can see their influence on the classification of the selected text. LIME outputs a list of explanations reflecting the contribution of each feature to the final prediction.
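For images, the same idea is applied to superpixels instead of words. The following NumPy sketch illustrates the principle only and is not the LIME library itself: it uses a regular grid in place of a real superpixel segmentation, the fraction of hidden superpixels as a simple distance measure, and a toy scoring function in place of a trained network. The names `grid_segments`, `lime_image_sketch` and `toy_predict` are hypothetical.

```python
import numpy as np

def grid_segments(height, width, cell):
    """Assign every pixel to a square 'superpixel' on a regular grid."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    n_cols = (width + cell - 1) // cell
    return rows[:, None] * n_cols + cols[None, :]

def lime_image_sketch(image, predict_fn, segments, num_samples=500,
                      kernel_width=0.25, seed=0):
    """Estimate how much each superpixel contributes to predict_fn(image)."""
    rng = np.random.default_rng(seed)
    n_seg = int(segments.max()) + 1
    # Each row is a binary vector: 1 keeps a superpixel, 0 hides it.
    z = rng.integers(0, 2, size=(num_samples, n_seg))
    z[0] = 1  # the first sample is the unperturbed image
    preds = np.empty(num_samples)
    for i in range(num_samples):
        keep = z[i][segments].astype(bool)            # per-pixel mask
        perturbed = np.where(keep[..., None], image, 0.0)
        preds[i] = predict_fn(perturbed)
    # Proximity weights: samples that hide fewer superpixels count more.
    distance = 1.0 - z.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)
    # Weighted linear surrogate model with an intercept term.
    design = np.hstack([np.ones((num_samples, 1)), z])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], preds * sw, rcond=None)
    return coef[1:]  # one importance value per superpixel

def toy_predict(img):
    """Toy stand-in for a classifier: scores only the top-left region."""
    return float(img[:8, :8].mean())

image = np.ones((16, 16, 3))
segments = grid_segments(16, 16, cell=8)   # four superpixels
importance = lime_image_sketch(image, toy_predict, segments)
top_superpixel = int(np.argmax(importance))
```

In this toy setting the surrogate model assigns the largest weight to the top-left superpixel, the only region the scoring function looks at. With a trained network, `predict_fn` would return the predicted probability of the class being explained.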

VI Conclusion

In this research, we have proposed a model for detecting glaucoma based on transfer learning, along with a thorough comparison of the effectiveness of several pre-trained models for the classification of glaucoma. In addition, we have employed Local Interpretable Model-agnostic Explanations (LIME) on all of the utilized models. This helps to build trust among medical professionals, who usually do not rely on deep learning based systems because such systems tend to behave as black boxes, making it impossible to determine the key features behind a model's prediction. LIME, however, locates these crucial elements and creates a visual representation of the model's reasoning. Our proposed system is trained on a benchmark glaucoma dataset, and with ResNet50 we achieved a validation accuracy of 94.7%. VGG-19 also reached a validation accuracy of 93.3%. We intend to advance this study in the future by enhancing the efficiency and ease of glaucoma detection. So far we have distinguished glaucoma from non-glaucoma at a satisfactory accuracy using multiple models, and in the near future we plan to develop a web application that will help people identify glaucoma simply by uploading images. Moreover, we plan to work on more datasets and to improve accuracy.

References

[1] Salam AA, Khalil T, Akram MU, Jameel A, Basit I. Automated detection of glaucoma using structural and non structural features. Springerplus. 2016 Dec;5(1):1-21.
[2] Allison K, Patel D, Alabi O. Epidemiology of glaucoma: the past, present, and predictions for the future. Cureus. 2020 Nov 24;12(11).
[3] Ran A, Cheung CY. Deep learning-based optical coherence tomography and optical coherence tomography angiography image analysis: an updated summary. The Asia-Pacific Journal of Ophthalmology. 2021 May 1;10(3):253-60.
[4] Aleci, C. (2020). Detection of Visual Field Loss Progression in Glaucoma: An Overview and Food for Thought. Ophthalmology Research: An International Journal, 16–24.
[5] Diabetic Retinopathy — National Eye Institute. (2021, July 30). National Eye Institute.
[6] Types of Glaucoma. (2020, June 2). Glaucoma Research Foundation.
[7] Saba, T., Khan, M. W., Yasmin, M., Sharif, M. (2017). CDR based glaucoma detection using fundus images: a review. International Journal of Applied Pattern Recognition, 4(3), 261.
[8] Thakoor, K. A., Li, X., Tsamis, E., Sajda, P., Hood, D. C. (2019). Enhancing the Accuracy of Glaucoma Detection from OCT Probability Maps using Convolutional Neural Networks. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC).
[9] Lim, T. C., Chattopadhyay, S., Acharya, U. R. (2012). A survey and comparative study on the instruments for glaucoma detection. Medical Engineering & Physics, 34(2), 129–139.
[10] Abbas, Q. (2017). Glaucoma-Deep: Detection of Glaucoma Eye Disease on Retinal Fundus Images using Deep Learning. International Journal of Advanced Computer Science and Applications, 8(6).
[11] Dervisevic, E., Pavljasevic, S., Dervisevic, A., Kasumovic, A. (2016). Challenges In Early Glaucoma Detection. Medical Archives, 70(3), 203.
[12] Mayro, E.L., Wang, M., Elze, T. et al. The impact of artificial intelligence in the diagnosis and management of glaucoma. Eye 34, 1–11 (2020).
[13] Greco, A., Rizzo, M. I., De Virgilio, A., Gallo, A., Fusconi, M., de Vincentiis, M. (2016). Emerging concepts in Glaucoma and review of the literature. The American Journal of Medicine (2016).
[14] Pascolini, D., Mariotti, S. P. (2012). Global estimates of visual impairment: 2010. British Journal of Ophthalmology, 96(5), 614-618.
[15] Murphy, A. M., Moore, C. M. M. (2020). Fully connected neural network.
[16] Khan, S. M. K. (2021, January 3). Papers with Code - LAG Dataset. The Lancet.
[17] Dağlarli E. Explainable artificial intelligence (xAI) approaches and deep meta-learning models. Advances and applications in deep learning. 2020 Jun 25;79.
[18] Vejjanugraha, P., Kongprawechnon, W., Kondo, T., Tungpimolrut, K., Kotani, K. (2017). An automatic screening method for primary open-angle glaucoma assessment using binary and multi-class support vector machines. ScienceAsia, 43(4), 229.
[19] Apon TS, Hasan MM, Islam A, Alam MG. Demystifying Deep Learning Models for Retinal OCT Disease Classification using Explainable AI. In 2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) 2021 Dec 8 (pp. 1-6). IEEE.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.