Presentation Authors: Ryan Hsi*, Nashville, TN, Daniel Lee, Philadelphia, PA, Yaomin Xu, Cosmin Bejan, Nashville, TN
Introduction: Phenotyping algorithms able to interrogate electronic health record (EHR) repositories are essential for advancing personalized kidney stone care. Here, we validate our natural language processing (NLP) algorithm for extracting kidney stone composition from clinical notes.
Methods: A pattern matching method was developed to capture stone composition mentions from manually annotated training and test sets of clinical notes. The algorithm identifies % text mentions of calcium oxalate monohydrate (COM) and dihydrate (COD), hydroxyapatite, brushite, uric acid, and struvite stones. We applied the algorithm to >125 million notes from our institutional EHR. Analyses performed on the extracted patients included stone type conversions over time, survival analysis of freedom from a second stone surgery, and phenotype associations by stone composition to validate the method against known disease associations.
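As an illustration of the pattern matching idea described in the Methods, here is a minimal Python sketch. The abstract names the stone types but does not publish the actual rules, so the lexicon `STONE_TERMS`, the optional leading percentage, and the function `extract_compositions` are all hypothetical and not the authors' algorithm.

```python
import re

# Hypothetical lexicon: the abstract lists these stone types,
# but the regular expressions themselves are assumptions.
STONE_TERMS = {
    "COM": r"calcium\s+oxalate\s+monohydrate",
    "COD": r"calcium\s+oxalate\s+dihydrate",
    "hydroxyapatite": r"hydroxyapatite",
    "brushite": r"brushite",
    "uric acid": r"uric\s+acid",
    "struvite": r"struvite",
}

# Optional leading percentage such as "80%" or "80 %" (assumed report style).
PERCENT = r"(?:(?P<pct>\d{1,3})\s*%\s*)?"

COMPILED = {
    label: re.compile(PERCENT + r"(?:" + term + r")", re.IGNORECASE)
    for label, term in STONE_TERMS.items()
}


def extract_compositions(note):
    """Return (label, percent or None) for each stone type mentioned in a note."""
    found = []
    for label, pattern in COMPILED.items():
        for match in pattern.finditer(note):
            pct = match.group("pct")
            found.append((label, int(pct) if pct is not None else None))
    return found


note = "Stone analysis: 80% calcium oxalate monohydrate, 20% Uric Acid."
print(extract_compositions(note))  # [('COM', 80), ('uric acid', 20)]
```

Results are grouped by stone type in lexicon order rather than by position in the note. A production system would also need abbreviations, synonyms, and negation handling, none of which are covered here.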
Results: The NLP algorithm achieved PPVs >90% for all % stone compositions except uric acid (PPV=87.5%). The most prominent interconversions of stone types over time were COM↔COD and COM↔uric acid. Survival analysis of freedom from a second stone surgery showed a median survival time of 69 months for struvite stones, while the median was not reached for any other group (P=0.03). Seven phenotype associations were found significant: uric acid-type 2 diabetes (OR=2.69, 95% CI=1.91-3.79), struvite-neurogenic bladder (OR=12.27, 95% CI=4.33-34.79), struvite-UTI (OR=7.36, 95% CI=3.01-17.99), hydroxyapatite-pulmonary collapse (OR=3.67, 95% CI=2.10-6.42), hydroxyapatite-neurogenic bladder (OR=5.23, 95% CI=2.05-13.36), brushite-calcium metabolism disorder (OR=4.59, 95% CI=2.14-9.81), and brushite-hypercalcemia (OR=4.09, 95% CI=1.90-8.80).
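For readers less familiar with the metrics reported above, the following Python sketch shows how a PPV and an odds ratio with a 95% confidence interval can be computed from counts. The abstract does not state how its intervals were obtained, so the Wald (log-scale) interval used here is an assumption, and the function names and the counts in the usage lines are hypothetical, not study data.

```python
import math


def ppv(true_positives, false_positives):
    """Positive predictive value: share of extracted mentions that are correct."""
    return true_positives / (true_positives + false_positives)


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval from a 2x2 table.

    a: stone type present, phenotype present
    b: stone type present, phenotype absent
    c: stone type absent, phenotype present
    d: stone type absent, phenotype absent
    The Wald interval is an assumed method, not one confirmed by the abstract.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
print(ppv(18, 2))
print(odds_ratio_wald_ci(10, 20, 5, 40))
```

An adjusted analysis, for example logistic regression with covariates, would give different estimates from this unadjusted 2x2 calculation.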
Conclusions: NLP extraction of stone composition from large-scale EHRs is feasible with high precision, enabling high-throughput studies for kidney stone disease.
Source of Funding: Supported by CTSA award No. UL1 TR002243 from the National Center for Advancing Translational Sciences. Its contents are solely the responsibility of the authors and do not necessarily represent official views of the National Center for Advancing Translat