NPJ Digit Med. 2019 Apr 10;2:25.
doi: 10.1038/s41746-019-0099-8. eCollection 2019.

Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program



Paisan Raumviboonsuk et al. NPJ Digit Med.

An erratum to this article has been published.

Abstract

Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm in a large-scale clinical population and to compare its performance with that of human graders. A total of 25,326 gradable retinal images of patients with diabetes from the community-based, nationwide DR screening program in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. For detecting referable DR, defined as moderate non-proliferative DR (NPDR) or worse, the deep learning algorithm had significantly higher sensitivity than human graders (0.97 vs. 0.74, p < 0.001) and slightly lower specificity (0.96 vs. 0.98, p < 0.001). The algorithm also showed higher sensitivity for each of the categories severe or worse NPDR, proliferative DR (PDR), and DME (p < 0.001 for all comparisons). The quadratic-weighted kappa for determining DR severity level was 0.85 for the algorithm and 0.78 for human graders (p < 0.001 for the difference). Across DR severity levels, in determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of a slightly higher false positive rate (2%). Deep learning algorithms may serve as a valuable tool for DR screening.
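The quadratic-weighted kappa quoted above penalizes each disagreement by the squared distance between severity levels, so confusing no DR with PDR costs far more than confusing mild with moderate NPDR. The sketch below is the standard weighted Cohen's kappa, kappa = 1 − Σ w_ij O_ij / Σ w_ij E_ij with w_ij = (i − j)² / (k − 1)², written in plain Python. It is not the study's code; the integer grade encoding (0 = no DR through 4 = PDR) and the function name are assumptions for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(ref, pred, n_levels=5):
    """Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (k - 1)^2.

    ref, pred: equal-length sequences of integer grades in 0..n_levels-1.
    Hypothetical 5-level encoding (assumption): 0 = no DR, 1 = mild NPDR,
    2 = moderate NPDR, 3 = severe NPDR, 4 = PDR.
    """
    if n_levels < 2:
        raise ValueError("need at least two severity levels")
    if not ref or len(ref) != len(pred):
        raise ValueError("need two non-empty sequences of equal length")
    k, n = n_levels, len(ref)
    if any(g not in range(k) for g in (*ref, *pred)):
        raise ValueError("grades must be integers in 0..n_levels-1")

    observed = Counter(zip(ref, pred))          # O_ij: joint counts
    ref_marg, pred_marg = Counter(ref), Counter(pred)

    weighted_obs = 0.0  # sum_ij w_ij * O_ij
    weighted_exp = 0.0  # sum_ij w_ij * E_ij, with E_ij = row_i * col_j / n
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            weighted_obs += w * observed[(i, j)]
            weighted_exp += w * ref_marg[i] * pred_marg[j] / n
    if weighted_exp == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - weighted_obs / weighted_exp
```

For example, `quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], n_levels=2)` returns 0.5, and perfect agreement returns 1.0. The (k − 1)² normalization cancels in the ratio; scikit-learn's `cohen_kappa_score(y1, y2, labels=list(range(5)), weights="quadratic")` should give the same value for the 5-level encoding.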

Keywords: Developing world; Diabetes complications.


Conflict of interest statement

Competing interests: J.K., R.S., K.W., B.J.L.C., G.S.C., L.P., S.P. and D.R.W. are Google employees and receive salary and stock as a part of the standard compensation package. O.K., J.J. and J.T. are consultants for Google.

Figures

Fig. 1
Comparison of manual grading and algorithm performance. Receiver operating characteristic (ROC) curves of the model (blue line) compared with grading by regional graders (red dot) for varying severities of diabetic retinopathy (DR) and diabetic macular edema (DME). The red dot combines the grades from all regional graders on all gradable images, because each regional grader graded only images from their own region.
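Fig. 1 contrasts a full ROC curve, obtained by sweeping a threshold over the model's continuous output, with a single operating point for the human graders. A minimal sketch of that comparison follows, assuming one score per image and a binary reference label (for example, referable vs. non-referable DR). The function names are hypothetical, and the area is computed with the ordinary trapezoidal rule rather than any method specific to the study.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by sweeping a threshold down through the scores.

    labels: 1 = positive per the reference standard, 0 = negative.
    scores: model outputs, higher meaning more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have equal length")
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Admit every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def sensitivity_at_fpr(points, max_fpr):
    """Best TPR the model reaches without exceeding the given FPR."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)
```

A grader's red dot sits at (1 − specificity, sensitivity). For instance, the abstract's human-grader specificity of 0.98 corresponds to an FPR of 0.02, so `sensitivity_at_fpr(points, 0.02)` gives the model's sensitivity at a matched false positive rate, to be compared with the graders' 0.74.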
Fig. 2
Comparison of algorithm and individual regional grader performance. Grader performance is shown as blue diamonds (ophthalmologists) and red dots (nurses or technicians) for (a) moderate or worse non-proliferative diabetic retinopathy (NPDR), (b) diabetic macular edema (DME), and (c) severe NPDR, proliferative diabetic retinopathy (PDR), and/or DME. The analysis was performed on all gradable images.
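Each marker in Fig. 2 is one grader's operating point. The sketch below computes such points from per-image records, assuming integer severity grades (0 = no DR through 4 = PDR) and treating grade 2 (moderate NPDR) or higher as referable; the record layout `(grader_id, reference_grade, grader_grade)` and all names are hypothetical. It also makes explicit that the false negative rate is 1 − sensitivity and the false positive rate is 1 − specificity: the abstract's sensitivities of 0.97 and 0.74 imply false negative rates of 0.03 and 0.26 (a gap of 0.23), and specificities of 0.96 and 0.98 imply false positive rates of 0.04 and 0.02, consistent with the 23% and 2% figures quoted there.

```python
from dataclasses import dataclass

# Assumption: 0 = no DR, 1 = mild NPDR, 2 = moderate NPDR, 3 = severe NPDR,
# 4 = PDR; "referable" means moderate NPDR or worse.
REFERABLE_MIN_GRADE = 2


@dataclass(frozen=True)
class OperatingPoint:
    sensitivity: float
    specificity: float

    @property
    def false_negative_rate(self) -> float:
        return 1.0 - self.sensitivity

    @property
    def false_positive_rate(self) -> float:
        return 1.0 - self.specificity


def operating_point(reference, grades, min_grade=REFERABLE_MIN_GRADE):
    """Sensitivity and specificity for the binary call grade >= min_grade."""
    if len(reference) != len(grades):
        raise ValueError("reference and grades must have equal length")
    tp = fn = tn = fp = 0
    for ref, grade in zip(reference, grades):
        truth, call = ref >= min_grade, grade >= min_grade
        if truth and call:
            tp += 1
        elif truth:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both referable and non-referable reference cases")
    return OperatingPoint(tp / (tp + fn), tn / (tn + fp))


def per_grader(records):
    """records: iterable of (grader_id, reference_grade, grader_grade) tuples."""
    by_grader = {}
    for grader_id, ref, grade in records:
        refs, grades = by_grader.setdefault(grader_id, ([], []))
        refs.append(ref)
        grades.append(grade)
    return {g: operating_point(r, s) for g, (r, s) in by_grader.items()}
```

Plotting each result at (`false_positive_rate`, `sensitivity`) on the same axes as the model's ROC curve reproduces the layout of the figure; the grader's profession could be carried in `grader_id` to color the markers.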
Fig. 3
Agreement at the image level between the reference standard and regional graders. Comparison of diabetic retinopathy (DR) and diabetic macular edema (DME) grading between the reference standard and (a, c) regional graders or (b, d) the algorithm. Adjudication was performed only for images that either the regional grader or the algorithm graded as moderate or worse. Thus, for DR, non-referable cases (no or mild DR) are combined into a single non-referable category.
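The DR panels of Fig. 3 are agreement tables in which no and mild DR share one non-referable row and column. A minimal sketch of building such a table follows; the category strings and function names are hypothetical, and only the no/mild merge is taken from the caption. Rows are the reference standard and columns are the grader or the algorithm.

```python
from collections import Counter

# Hypothetical category names. Following the Fig. 3 caption, "none" and "mild"
# are merged into one non-referable category before tabulating.
COLLAPSE = {
    "none": "non-referable",
    "mild": "non-referable",
    "moderate": "moderate",
    "severe": "severe",
    "proliferative": "proliferative",
}
ORDER = ["non-referable", "moderate", "severe", "proliferative"]


def agreement_table(reference, other):
    """Rows: reference-standard category; columns: grader or algorithm category."""
    if len(reference) != len(other):
        raise ValueError("both gradings must cover the same images")
    counts = Counter((COLLAPSE[r], COLLAPSE[o]) for r, o in zip(reference, other))
    return [[counts[(row, col)] for col in ORDER] for row in ORDER]


def exact_agreement(table):
    """Fraction of images on the diagonal (same collapsed category)."""
    total = sum(map(sum, table))
    return sum(table[i][i] for i in range(len(table))) / total
```

Building one table against the regional grades and one against the algorithm's grades, over the same images, gives the side-by-side comparison of panels (a) and (b). Because no and mild are merged, a no-versus-mild disagreement counts as agreement here.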
