Presentation Authors: Ohad Kott*, Drew Linsley, Ali Amin, Andreas Karagounis, Dragan Golijanin, Thomas Serre, Boris Gershman, Providence, RI
Introduction: The pathologic diagnosis and grading of prostate cancer are time-consuming, error-prone, and subject to inter-observer variability. Machine learning offers opportunities to analyze large amounts of patient data, to discover novel cell morphological features relevant to cancer prediction, and to use those features to improve the diagnosis, risk stratification, and prognostication of prostate cancer. In this study, we evaluated a state-of-the-art deep learning algorithm for the histopathologic diagnosis and Gleason grading of prostate biopsy specimens.
Methods: 147 prostate core biopsy specimens from 23 patients were digitized at 20x magnification (Aperio ScanScope CS, Leica Biosystems) and annotated for Gleason 3, 4, and 5 prostate adenocarcinoma by a urologic pathologist. From these virtual slides, we sampled 21,109 image patches of 256x256 pixels, balanced for malignancy. We trained and tested a state-of-the-art deep residual convolutional neural network to classify each patch at two levels: (1) coarse (benign vs. malignant) and (2) fine (benign vs. Gleason 3 vs. 4 vs. 5). Model performance was evaluated using 5-fold cross-validation and reported as accuracy, sensitivity, specificity, and average precision (weighted area under the precision-recall curve). Randomization tests were used to test whether the model's performance exceeded chance.
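The abstract does not say how the randomization tests were set up. One common form is a label-permutation test on accuracy, sketched below under these assumptions: `y_pred` holds out-of-fold predictions pooled across the five cross-validation folds, the annotated labels are shuffled to build a null distribution, and the p-value is one-sided. The function names `accuracy` and `permutation_test` and the toy labels are hypothetical and are not the authors' code.

```python
import random
from typing import Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of patches whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_test(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """One-sided randomization test of accuracy against chance.

    Shuffling the annotated labels destroys any association with the
    predictions while keeping the class balance, so accuracies on shuffled
    labels form the null distribution.
    """
    rng = random.Random(seed)
    observed = accuracy(y_true, y_pred)
    shuffled = list(y_true)
    n_at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if accuracy(shuffled, y_pred) >= observed:
            n_at_least_as_good += 1
    # Add-one correction: counts the observed labelling as one permutation,
    # so the p-value can never be exactly zero.
    p_value = (n_at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical toy data: 100 balanced patches, 8 of them misclassified.
y_true = [0, 1] * 50
y_pred = list(y_true)
for i in range(8):
    y_pred[i] = 1 - y_pred[i]

observed, p = permutation_test(y_true, y_pred, n_permutations=5_000)
print(f"accuracy = {observed:.2f}, p = {p:.4f}")
```

With the add-one correction, the smallest attainable p-value is 1/(n_permutations + 1), so reporting p < 0.001 requires at least 1,000 permutations.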
Results: The model achieved 91.5% accuracy (p < 0.001) in coarse-level classification of image patches as benign vs. malignant (sensitivity 0.93, specificity 0.90, average precision 0.95). It achieved 85.4% accuracy (p < 0.001) in fine-level classification of image patches as benign vs. Gleason 3 vs. Gleason 4 vs. Gleason 5 (sensitivity 0.83, specificity 0.94, average precision 0.83), with the most frequent confusions between Gleason 3 and 4 and between Gleason 4 and 5 (Figure). Expected accuracy by random chance is 50% for coarse-level and 25% for fine-level classification.
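The reported metrics can all be derived from a confusion matrix. The sketch below shows that arithmetic under stated assumptions. For the coarse task, malignant (class 1) is treated as the positive class. For the fine task, the abstract does not state how sensitivity and specificity were averaged over four classes, so an unweighted mean of one-vs-rest rates is assumed here. The helper names and the counts in `coarse` are invented for illustration and are not study data.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[int]]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> Matrix:
    """Rows are annotated classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def one_vs_rest(cm: Matrix, k: int) -> Tuple[float, float]:
    """Sensitivity and specificity when class k is treated as positive."""
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


def summarize(cm: Matrix) -> Dict[str, float]:
    n = len(cm)
    total = sum(map(sum, cm))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    if n == 2:
        # Coarse task: class 1 (malignant) is the positive class.
        sensitivity, specificity = one_vs_rest(cm, 1)
    else:
        # Fine task (assumption): unweighted mean of one-vs-rest rates.
        rates = [one_vs_rest(cm, k) for k in range(n)]
        sensitivity = sum(s for s, _ in rates) / n
        specificity = sum(sp for _, sp in rates) / n
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Uniform random guessing is right 1/n of the time, whatever the class mix.
        "chance_accuracy": 1 / n,
    }


# Invented counts for illustration only: 100 benign and 100 malignant patches.
coarse = [[90, 10],
          [7, 93]]
print(summarize(coarse))
```

When the two classes are equally represented, accuracy is the mean of sensitivity and specificity: here (0.93 + 0.90) / 2 = 0.915. That arithmetic matches the coarse-level figures above for patches balanced for malignancy, although the per-fold class counts are not given.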
Conclusions: In this study, a deep learning-based computer vision algorithm showed high potential for the histopathologic diagnosis and Gleason grading of prostate cancer. Further studies are planned to externally validate the algorithm's performance and to evaluate additional outcomes.
Source of Funding: This study was supported by NIGMS through the Advance-CTR IDeA-CTR grant (U54GM115677). Additional support was provided by the Carney Institute for Brain Sciences, the Center for Vision Research (CVR) and the Center for Computation and Visua