Digital Health Innovation and Informatics
Purpose/Objective(s): Accurate contouring of organs at risk (OAR) and gross tumor volumes (GTV) is particularly important in stereotactic body radiotherapy (SBRT), where smaller margins are used. Manual segmentation is labour-intensive and can suffer from significant inter-observer variability. Here we evaluate the performance of deep learning auto-segmentation models trained on retrospective, manually drawn contours from a single center and assess whether these models can accurately segment planning CT scans from a different cancer center.
Materials/Methods: Auto-segmentation models based on a deep convolutional neural network with a U-net architecture were trained on 210 planning CT scans: 160 publicly available planning CT scans with ground truth contours reviewed by a radiation oncologist and 50 lung SBRT CT scans from a single center (center A). The trained models were then used to segment 100 further planning CT scans, consisting of 50 additional scans from center A and 50 scans from a separate cancer center (center B). The original clinical contours (CC) were compared with the deep learning based contours (DC) using the Dice Similarity Coefficient (DSC) and the 95% Hausdorff distance transform (DT).
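The two comparison metrics named above can be illustrated with a minimal sketch. It is not the study's evaluation code, and it rests on several assumptions: contours are represented as sets of integer (z, y, x) voxel indices, the surface of a structure is taken to be its voxels with at least one missing 6-connected neighbour, and the 95% distance is taken as the larger of the two directed 95th-percentile surface distances (other conventions pool both directions before taking the percentile). All function names and the `spacing` parameter are hypothetical.

```python
import math

# 6-connected neighbourhood used to decide whether a voxel lies on the surface.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a, b):
    """Dice Similarity Coefficient: 2|A & B| / (|A| + |B|) for two voxel sets."""
    if not a and not b:
        return 1.0  # assumption: two empty contours count as perfect agreement
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels of `mask` that have at least one 6-neighbour outside the mask."""
    return {
        v for v in mask
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask
               for dz, dy, dx in NEIGHBOURS)
    }


def directed_distances(src, dst, spacing):
    """For each voxel in `src`, the distance (in mm) to the nearest voxel in `dst`."""
    return [
        min(math.sqrt(sum(((p[i] - q[i]) * spacing[i]) ** 2 for i in range(3)))
            for q in dst)
        for p in src
    ]


def percentile(values, pct):
    """Percentile with linear interpolation between the two closest ranks."""
    s = sorted(values)
    k = (len(s) - 1) * pct / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th-percentile symmetric surface distance between two voxel sets, in mm."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        raise ValueError("both contours must be non-empty")
    return max(
        percentile(directed_distances(sa, sb, spacing), 95),
        percentile(directed_distances(sb, sa, spacing), 95),
    )
```

The brute-force nearest-neighbour search keeps the sketch dependency-free but scales quadratically with the number of surface voxels; for full-resolution CT volumes a distance-transform or KD-tree based implementation would be the more practical choice.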
Results: Comparing DCs to CCs for all 100 contoured planning CT scans, the mean DSC and 95% DT were 0.93 and 2.8 mm for aorta (n=81), 0.81 and 3.3 mm for esophagus (n=99), 0.95 and 5.1 mm for heart (n=100), 0.98 and 3.1 mm for lung (n=190), 0.56 and 6.6 mm for brachial plexus (n=101), 0.82 and 4.2 mm for proximal bronchial tree (n=100), 0.90 and 1.6 mm for spinal cord (n=87), 0.91 and 2.3 mm for trachea (n=100), and 0.71 and 5.2 mm for lung GTVs (n=85). The DSC and 95% DT were not significantly different between center A and center B for aorta, lung GTV, heart, lung, brachial plexus, spinal cord, and trachea. Significant differences between the two centers were seen in the esophagus DSC (0.80 vs 0.83, p=0.02) and the proximal bronchial tree 95% DT (3.6 vs 4.8 mm, p=0.001).
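The abstract does not state which statistical test produced the p-values for the center A versus center B comparison. As one possible illustration only, the sketch below uses a two-sided permutation test on the difference in mean metric values between two groups of scans. The function name, its parameters, and the example numbers are hypothetical and are not data from the study.

```python
import random


def permutation_p_value(group_a, group_b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the fraction of random relabellings whose absolute mean
    difference is at least as large as the observed one (with the usual
    +1 correction so the estimate is never exactly zero).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        # Small tolerance guards against floating-point rounding in ties.
        if abs(mean_a - mean_b) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


# Illustrative, made-up per-scan DSC values for two centers (not study data).
center_a_dsc = [0.78, 0.81, 0.80, 0.79, 0.82, 0.80]
center_b_dsc = [0.83, 0.84, 0.82, 0.85, 0.83, 0.81]
p = permutation_p_value(center_a_dsc, center_b_dsc)
```

A permutation test makes no normality assumption, which can be convenient for bounded metrics such as the DSC; a t-test or a rank-based test would be equally plausible choices here.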
Conclusion: Deep learning auto-segmentation models can provide accurate segmentation of the OARs used in lung SBRT. Models trained with a single institution’s data were accurate when validated on a separate institution’s planning CT scans, despite variations in scan quality and contouring practices. Deep learning lung GTV segmentation models reliably located the target lesions but were generally less accurate than the OAR models due to the variable location and size of lung tumors. Deep learning auto-segmentation can provide an accurate starting point for review and manual adjustment and should improve efficiency in lung SBRT planning.