Prediction of neovascular age-related macular degeneration recurrence using optical coherence tomography images with a deep neural network

Data collection and labelling

We analyzed the medical records of patients diagnosed with nAMD between January 2015 and June 2021 at the Kong Eye Hospital. Only treatment-naïve eyes with nAMD were enrolled, and all patients received three monthly injections of either ranibizumab or aflibercept. If both eyes were treated, one eye was randomly selected. Exclusion criteria were extrafoveal nAMD; non-exudative AMD; an interval of more than 6 weeks between the loading injections; prior treatment in the study eye with photodynamic therapy, subfoveal focal laser photocoagulation, or vitrectomy; anti-VEGF injection other than ranibizumab and aflibercept; other macular diseases, such as epiretinal membrane and macular hole; retinal vascular disease, such as retinal vein occlusion, retinal artery occlusion, and diabetic retinopathy; missing OCT examination; and cataract surgery within 3 months.

Age, sex, underlying diseases such as hypertension and diabetes, and history of ophthalmic surgery were recorded for all patients. Visual acuity and intraocular pressure were measured, a fundus examination was performed, and neovascularization in the macula was confirmed using fluorescein angiography (FA) and indocyanine green angiography (ICGA). OCT was performed at each visit to determine changes in the macula. FA was performed using the Heidelberg Retina Angiograph (HRA; Heidelberg Engineering, Heidelberg, Germany), and OCT was performed using the Heidelberg Spectralis (Heidelberg Engineering, Heidelberg, Germany).

OCT scans were performed before injection therapy and at every 4-week visit during injection therapy, and the treatment response was assessed using the OCT scan performed 4 weeks after the third injection. Macular fluid included IRF, SRF, and PED. Dry macula was defined as the absence of IRF and SRF. Fluid under the retinal pigment epithelium was not considered when identifying dry macula unless the PED had increased compared with the previous visit. Regarding treatment results, the response was evaluated as good (dry-up) when all IRF and SRF had disappeared, and as poor when IRF or SRF remained and residual fluid was visible. When only the PED remained and no other fluid was present, the macula was judged to be dry [9]. In cases of new macular hemorrhage or increased macular edema on OCT, the macula was not considered dry.

On OCT, recurrence was defined as the appearance of IRF, SRF, or subretinal hemorrhage (SRH), or a significant increase in PED. Recurrence was considered only among patients who exhibited a dry macula after the three loading injections, and it was recorded if any of these signs was observed before the completion of follow-up (within 6 months after the first injection). Patients whose dry state was maintained 6 months after the first injection were classified into the non-recurrence group.

Data preprocessing

The flowchart of the injection schedule is shown in Fig. 4. Because our study aimed to predict the recurrence of SRF or IRF in the macula or an increase in PED, 96 patients were excluded because they did not show a dry macula after three injections. Additionally, 152 patients were excluded because of loss to follow-up or a missing visit around 6 months, which made it difficult to evaluate recurrence around 4 months after the last injection. As a result, 269 of the initial 517 patients were included in the study. Because we aimed to predict, from the follow-up visit 1 month after the last injection, whether recurrence would occur within the next 3 months, we used censoring statistical analysis [22] to relabel each patient’s recurrence status; censoring was also necessary because some patients could not be tracked and no data were recorded after the three loading injections. Using the censoring method, the dataset was divided into four cases: (1) recurrence within 4 months after the last injection, (2) recurrence later than 6 months after the initial injection, (3) no recurrence even after 6 months from the initial injection, and (4) no patient records after three injections. We relabeled (1) as recurred and (2) and (3) as non-recurred, and excluded (4) because we could not know whether recurrence had occurred. Thus, applying censoring statistical analysis to the 388 patients who completed the three loading injections yielded the 269 relabeled patients used as the final dataset.
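
The relabeling rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `relabel`, its arguments, and the convention of measuring time in months from the first injection (with injections at months 0, 1, and 2, so that 4 months after the last injection coincides with month 6) are all assumptions. Treating a patient who was followed for less than 6 months without recurrence as excluded is also an assumption, chosen to match the exclusion for loss to follow-up.

```python
from typing import Optional

# From the text: recurrence is assessed up to 6 months after the first injection.
HORIZON_MONTHS = 6.0


def relabel(recurrence_month: Optional[float],
            last_followup_month: Optional[float]) -> Optional[int]:
    """Return 1 (recurred), 0 (non-recurred), or None (excluded).

    Both arguments are hypothetical fields, in months since the first
    injection; None means "not observed".
    """
    if last_followup_month is None:
        return None  # case (4): no records after the loading injections
    if recurrence_month is not None and recurrence_month <= HORIZON_MONTHS:
        return 1     # case (1): recurrence inside the prediction window
    if last_followup_month >= HORIZON_MONTHS:
        return 0     # cases (2) and (3): dry at the 6-month horizon
    return None      # censored before the horizon: outcome unknown (assumption)


examples = {
    "recurred at month 4": relabel(4.0, 6.0),
    "recurred at month 8": relabel(8.0, 9.0),
    "never recurred": relabel(None, 6.5),
    "no follow-up": relabel(None, None),
}
```

Keeping the excluded cases as `None` rather than as a third class makes it easy to drop them before training while still counting how many patients fell into each case.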

Figure 4

Flowchart of the three loading anti-vascular endothelial growth factor injections for patients with neovascular age-related macular degeneration. The process begins with the capture of pre-injection optical coherence tomography (OCT) images on the day of the first injection, followed by monthly OCT image captures immediately following each injection. Recurrence is checked 6 months after the first injection day.

Moreover, because our objective was to predict recurrence from (1) the pre-injection image only and (2) the pre-injection image together with the images acquired immediately after each of the three injections, we used 1076 SD-OCT images from 269 patients (four images per patient: one pre-injection image and one image after each of the 1st, 2nd, and 3rd injections).

We downsampled all OCT images to a fixed size of 224 × 224 pixels with three RGB channels for input to the deep neural network. To build a robust model and avoid overfitting, we increased the variety of input images using data augmentation, which included (1) random horizontal flips and (2) random rotations of up to 10°. Data augmentation was applied only during model training.
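
The preprocessing and augmentation steps can be illustrated with a small NumPy sketch. The text does not state which library, interpolation method, or flip probability was used, so the nearest-neighbour sampling, the flip probability of 0.5, the zero fill for pixels rotated in from outside the frame, and all function names here are assumptions made for illustration.

```python
import numpy as np


def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x 3 image to size x size x 3."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def rotate_nearest(img: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate about the image centre; pixels from outside the frame become 0."""
    h, w = img.shape[:2]
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, look up its source pixel.
    src_x = np.cos(theta) * (xx - cx) + np.sin(theta) * (yy - cy) + cx
    src_y = -np.sin(theta) * (xx - cx) + np.cos(theta) * (yy - cy) + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out


def augment(img: np.ndarray, rng: np.random.Generator,
            max_degrees: float = 10.0) -> np.ndarray:
    """Training-time augmentation: random horizontal flip, then rotation."""
    if rng.random() < 0.5:          # flip probability is an assumption
        img = img[:, ::-1]
    angle = rng.uniform(-max_degrees, max_degrees)
    return rotate_nearest(img, angle)


rng = np.random.default_rng(0)
scan = rng.integers(0, 256, size=(496, 512, 3), dtype=np.uint8)  # synthetic image
model_input = resize_nearest(scan)       # applied to every image
train_input = augment(model_input, rng)  # applied only during training
```

In practice a deep learning framework's own transforms would normally replace these helpers; the sketch only makes the order of operations explicit.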

Model architecture

To predict nAMD recurrence, we built a deep learning model based on DenseNet201 [23]. As shown in Table 2, DenseNet201 outperformed other well-known CNN architectures, such as VGG-16 [24], Xception [25], Inception-V3 [26], and ResNet-50 [27]; thus, we selected DenseNet201 as the base feature extractor. DenseNet has the advantage of significantly reducing the number of parameters by encouraging feature reuse [28]. Moreover, we confirmed that the deeper structure of DenseNet201 captured representations of the disease better than DenseNet121 and DenseNet169. To avoid overfitting and train the models faster, we applied transfer learning [29] to all models, and to ensure fairness, the same input data were used to train every model. Specifically, we initialized 200 layers of DenseNet201 with weights pre-trained on the large-scale ImageNet dataset [30].

In addition, as shown in Fig. 5, we adopted a multi-instance model structure to process the multiple OCT images acquired after the monthly loading injections simultaneously. As shown in Table 3, both the LSTM [31] and attention [32] modules performed well in capturing sequential information across the input images. We selected the attention module as the final fusion method to calculate an attention score for each image and predict nAMD recurrence. We added a traditional multilayer perceptron [33] as the fully connected layer, with a dropout layer to prevent overfitting. Finally, we used the softmax activation function in the final output layer to predict nAMD recurrence.
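
To make the fusion step concrete, the following NumPy sketch shows one common way to compute a per-image attention score and a weighted combination of four image feature vectors. The text does not give the exact form of the attention module, so the additive scoring function, the hidden size of 64, the random stand-ins for learned parameters, and the names `attention_fuse`, `W`, and `v` are all assumptions; the feature length of 1920 corresponds to the pooled output of a standard DenseNet201 encoder.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attention_fuse(features: np.ndarray, W: np.ndarray, v: np.ndarray):
    """Fuse per-image features with additive attention (illustrative form).

    features: (n_images, d) feature vectors from the shared encoder
    W:        (d, h) projection, v: (h,) scoring vector (both hypothetical)
    Returns the fused (d,) vector and the (n_images,) attention weights.
    """
    scores = np.tanh(features @ W) @ v   # one scalar score per image
    weights = softmax(scores)            # scores normalised to sum to 1
    fused = weights @ features           # attention-weighted average
    return fused, weights


rng = np.random.default_rng(0)
n_images, d, h = 4, 1920, 64             # pre-injection + three post-injection images
features = rng.normal(size=(n_images, d))
W = rng.normal(scale=0.01, size=(d, h))  # random stand-ins for learned weights
v = rng.normal(size=h)
fused, weights = attention_fuse(features, W, v)
```

The attention weights can be inspected directly, which gives a rough indication of which visit contributed most to a given prediction.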

Figure 5

Overall architecture of the proposed model. The model is composed of an input layer, four feature extractors, an attention fusion layer, and a fully connected layer with dropout and a sigmoid activation function. The four feature extractors are based on the DenseNet201 encoder, each with 200 pretrained convolutional neural network layers, with weights shared among them. The last fully connected layer predicts the likelihood of recurrence within 6 months from the initial check point for each input case.

Visual explanation using Grad-CAM

We applied Grad-CAM [34] to provide a visual explanation of the decision-making process of the deep learning model. Grad-CAM highlights the regions of the OCT images that are important for predicting nAMD recurrence through gradient-based localization. Based on the gradients of the feature maps of each convolution layer, we created a heat map representing the regions the model used in its prediction.
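
The core Grad-CAM computation is short enough to show directly. The sketch below assumes that the feature maps of one convolution layer and the gradients of the recurrence score with respect to those maps have already been extracted from the network; how they are extracted depends on the framework and is not shown. The function name `grad_cam` and the toy arrays are illustrative assumptions.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Compute a Grad-CAM heat map for one convolution layer.

    activations: (K, H, W) feature maps of the layer
    gradients:   (K, H, W) gradient of the class score w.r.t. those maps
    Returns an (H, W) map scaled to [0, 1].
    """
    alphas = gradients.mean(axis=(1, 2))             # channel importance weights
    cam = np.tensordot(alphas, activations, axes=1)  # weighted sum over channels
    cam = np.maximum(cam, 0.0)                       # ReLU: keep positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example with two channels on a 2 x 2 grid.
acts = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 1.0]]])
grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heat = grad_cam(acts, grads)
```

For display, the resulting map is typically upsampled to the input resolution and overlaid on the OCT image.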

Experimental setup

We performed leave-one-out cross-validation (LOOCV) [35] to train both the baseline and proposed models, because LOOCV is an effective validation method for small datasets [21]. To apply LOOCV, we held out one patient for model testing and used the remaining data for training. This process was repeated 269 times, once per patient, so that each patient was held out exactly once. Although LOOCV is well known to be effective for evaluating small datasets, it is also known to be more vulnerable to overfitting than K-fold cross-validation [36]. To manage overfitting, we employed dropout and implemented early stopping with a patience of 7. In every fold, we divided the dataset by patient ID to prevent OCT images of the same patient from being mixed across the training, validation, and test sets. We used the same LOOCV procedure to train all models, and each model was evaluated by its average performance over all LOOCV folds. To ensure fairness across all models, we used uniform settings for the batch size, number of epochs, and dropout rate: 64, 100, and 0.4, respectively. Additionally, we used the Adam [37] optimizer with a learning rate of 0.001 for all models, including the proposed model.
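
A patient-level leave-one-out split and an early-stopping rule with a patience of 7 can be sketched as follows. This is a minimal illustration rather than the authors' training code: the helper names, the dictionary of image identifiers, and the choice of validation loss as the monitored quantity are assumptions.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_patient_out(
    images_by_patient: Dict[str, List[str]]
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_images, test_images), holding out one patient per fold."""
    patient_ids = sorted(images_by_patient)
    for held_out in patient_ids:
        test = list(images_by_patient[held_out])
        train = [img
                 for pid in patient_ids if pid != held_out
                 for img in images_by_patient[pid]]
        yield train, test


class EarlyStopping:
    """Stop when the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 7) -> None:
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical image identifiers: four OCT images per patient.
data = {f"P{i:03d}": [f"P{i:03d}_img{j}" for j in range(4)] for i in range(3)}
folds = list(leave_one_patient_out(data))
```

Splitting on patient ID rather than on individual images is what keeps all four images of a patient on the same side of the split in every fold.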

For the general experiment, 100 of the 269 patients were randomly selected for a performance comparison with ophthalmologists. To ensure a fair experiment, these 100 samples were not used in the model training process. For these 100 patients, we prepared (1) a single image of the initial state, before the loading injections, and (2) three post-injection images acquired immediately after the loading injections. Six ophthalmologists took part: two ophthalmology residents (1 and 3 years of experience, respectively), three retinal fellows (2 years of experience as a retina specialist), and one retinal specialist (more than 10 years of clinical experience). Note that only (1) was shown to the ophthalmologists during the experiment; (2) was intentionally withheld. To analyze the ophthalmologists’ individual and common perspectives, we analyzed each ophthalmologist’s predictions for the 100 patients, comprising 52 recurrence cases and 48 non-recurrence cases. We also averaged the decisions of all ophthalmologists to obtain a probability of recurrence for comparison with the softmax value of the proposed model: if all six ophthalmologists predicted recurrence for a sample, the value was 1, and if all six predicted no recurrence, it was 0. Statistical analysis was conducted between the average prediction rate of the six ophthalmologists and the softmax value of the proposed model to determine the consistency and relevance of the decision-making process.
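
The averaged panel decision amounts to the fraction of ophthalmologists who predicted recurrence for a given patient. A minimal sketch, in which the function name `panel_score` and the example votes are hypothetical:

```python
from typing import Sequence


def panel_score(votes: Sequence[int]) -> float:
    """Fraction of raters predicting recurrence (1 = recurrence, 0 = none)."""
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) / len(votes)


unanimous_recurrence = panel_score([1, 1, 1, 1, 1, 1])
unanimous_no_recurrence = panel_score([0, 0, 0, 0, 0, 0])
split_panel = panel_score([1, 1, 1, 0, 0, 0])
```

A unanimous panel gives 1 or 0, as stated in the text, and a three-to-three split gives 0.5, which places the panel's judgment on the same 0 to 1 scale as the model output.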

Statistical analysis

We applied Fleiss’ kappa coefficient to calculate the level of agreement among multiple raters, including all the ophthalmologists and the proposed model. To compute this statistic, we used the Statsmodels module, a well-known Python package for statistical analysis.
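
For reference, Fleiss’ kappa can be computed from a subjects-by-categories count table as sketched below. Statsmodels offers `aggregate_raters` and `fleiss_kappa` in `statsmodels.stats.inter_rater` for this purpose; the text does not say which functions were called, so the dependency-free version here is only an illustration of the statistic, and the example ratings are invented.

```python
from typing import List, Sequence


def aggregate(ratings: Sequence[Sequence[int]], n_categories: int) -> List[List[int]]:
    """Convert subjects x raters labels into subjects x categories counts."""
    table = []
    for row in ratings:
        counts = [0] * n_categories
        for label in row:
            counts[label] += 1
        table.append(counts)
    return table


def fleiss_kappa(table: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a count table with the same number of raters per row.

    Undefined (division by zero) if every rating falls in a single category.
    """
    n_subjects = len(table)
    n_raters = sum(table[0])
    n_categories = len(table[0])
    total = n_subjects * n_raters
    # Share of all ratings that fall in each category.
    p_j = [sum(row[j] for row in table) / total for j in range(n_categories)]
    # Agreement among raters for each subject.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]
    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    return (p_bar - p_e) / (1.0 - p_e)


# Invented example: 4 patients rated by 7 raters (six clinicians plus the model),
# with 1 = recurrence and 0 = no recurrence.
ratings = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
]
kappa = fleiss_kappa(aggregate(ratings, n_categories=2))
```

Values near 1 indicate strong agreement among the raters, and values near 0 indicate agreement no better than chance.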

Ethical approval

This study adhered to the principles of the Declaration of Helsinki and was approved by the Institutional Review Board of Kong Eye Hospital (KIRB-202202-HR-001-01), which waived the requirement for obtaining informed consent because this was a retrospective observational study of medical records and was retrospectively registered.