Category: Clinical Stones: Medical Management

MP31-6 - Large-scale text mining of kidney stone composition from electronic health records

Sun, Sep 23
2:00 PM - 4:00 PM

Introduction & Objective :

The electronic health record (EHR) is an underutilized data source for kidney stone research. Natural language processing (NLP) can structure these data to identify and phenotype kidney stone formers. We developed and tested an NLP system to identify stone composition from millions of clinical notes in the Vanderbilt EHR.


Methods :

Our de-identified institutional EHR contains over 2.9 million patients with more than 125 million notes. Two urologists annotated all text expressions of chemical stone composition in 400 clinical notes mentioning “stone analysis” or “kidney stone.” We randomly selected 70% of the resulting annotations for training and the remaining 30% for validation. The NLP algorithm was built by analyzing the training-set annotations and developing regular expressions that find predefined patterns of stone composition in text. We evaluated the algorithm on the validation set by positive predictive value (PPV), sensitivity, and F1 score, the harmonic mean of PPV and sensitivity. To maximize PPV and minimize false positives, only annotations containing a percent (%) composition were included. The algorithm was then applied to the entire de-identified medical record. Descriptive statistics, including demographic data, were extracted by stone composition category for calcium oxalate monohydrate (COM), calcium oxalate dihydrate (COD), hydroxyapatite (HA), brushite, uric acid, and struvite stone formers.
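The regular-expression step described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not publish its patterns, so the synonym lists in `COMPONENTS` and the function `extract_compositions` are hypothetical. Mirroring the inclusion rule above, a component is reported only when a percent value sits directly next to it.

```python
import re

# Hypothetical synonym patterns per composition category; the study's
# actual regular expressions are not given in the abstract.
COMPONENTS = {
    "COM": r"calcium\s+oxalate\s+monohydrate|\bCOM\b",
    "COD": r"calcium\s+oxalate\s+dihydrate|\bCOD\b",
    "HA": r"hydroxyapatite|\bHA\b",
    "brushite": r"brushite|calcium\s+hydrogen\s+phosphate",
    "uric acid": r"uric\s+acid",
    "struvite": r"struvite|magnesium\s+ammonium\s+phosphate",
}


def extract_compositions(note: str) -> dict[str, int]:
    """Return {category: percent} for composition mentions that carry a % value.

    Mentions without an adjacent percentage are ignored, which trades
    sensitivity for precision (PPV).
    """
    found = {}
    for name, synonyms in COMPONENTS.items():
        # Percent before the component, e.g. "60% calcium oxalate monohydrate"
        before = re.compile(rf"(\d{{1,3}})\s*%\s*(?:{synonyms})", re.IGNORECASE)
        # Component before the percent, e.g. "Uric acid: 40%"
        after = re.compile(rf"(?:{synonyms})\s*[:=-]?\s*(\d{{1,3}})\s*%", re.IGNORECASE)
        match = before.search(note) or after.search(note)
        if match:
            found[name] = int(match.group(1))
    return found
```

For example, "Calcium oxalate monohydrate 60%, uric acid 40%" yields `{"COM": 60, "uric acid": 40}`, while a note that only says "kidney stone" yields an empty dict. Because the two orderings are tried independently, text that mixes them without delimiters (e.g. "monohydrate 60% uric acid 40%") can attach a percent to the wrong component; rules of this kind are typically refined against annotated training examples.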


Results :

When applied to the entire medical record, the algorithm identified 2417 patients, with 1966, 736, 233, 100, 258, and 76 mentions of % COM, COD, HA, brushite, uric acid, and struvite composition, respectively. Performance on the validation set ranged from 88% to 100% for PPV, 80% to 100% for sensitivity, and 88% to 100% for F1 score. Inter-annotator agreement between the two annotators was high (Cohen’s kappa = 0.845). There were differences by stone composition type in the mean frequencies of stone-specific CPT codes, in the age at first mention of each stone type, and by sex (each p
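The evaluation metrics reported above can be computed as in the sketch below. It assumes that each validation mention has already been scored as a true positive, false positive, or false negative, and that Cohen's kappa is computed over paired category labels from the two annotators; the abstract does not state the unit of analysis for either. The labels and counts in the example are invented for illustration and are not study data.

```python
import math
from collections import Counter


def ppv_sensitivity_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return PPV (precision), sensitivity (recall), and F1 (their harmonic mean)."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity) if ppv + sensitivity else 0.0
    return ppv, sensitivity, f1


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over categories of the product of marginal rates
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return math.nan  # undefined: both annotators used one identical category
    return (p_o - p_e) / (1 - p_e)


# Toy labels (hypothetical, not study data)
annotator_1 = ["COM", "COM", "COD", "HA"]
annotator_2 = ["COM", "COD", "COD", "HA"]
print(ppv_sensitivity_f1(tp=9, fp=1, fn=1))
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

The toy annotators agree on 3 of 4 items (observed agreement 0.75) with chance agreement 5/16, giving kappa = 7/11, which prints as 0.636. Kappa discounts agreement expected by chance, which is why it is reported alongside raw agreement for annotation tasks.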


Conclusions :

Our NLP algorithm for phenotyping kidney stone formers by stone composition achieved high PPV, sensitivity, and F1 score. Mining stone composition from the clinical notes of a large-scale EHR is feasible with high precision.

Ryan Hsi

Assistant Professor of Urologic Surgery
Department of Urology; Vanderbilt University Medical Center
Nashville, Tennessee

Dr. Hsi is currently an assistant professor in the Department of Urology after completing a laparoscopy and endourology fellowship at the University of California, San Francisco, CA. Dr. Hsi received his B.A. and B.S. degrees from Stanford University, Stanford, CA, and his medical degree from Loma Linda University School of Medicine, Loma Linda, CA. He completed an internship and a residency at the University of Washington School of Medicine, Department of Surgery, Seattle, WA. Dr. Hsi is Board Certified by the American Board of Urology and is a member of the American Urological Association and the American Medical Association. His clinical interests include the medical and surgical management of kidney stone disease and minimally invasive urologic surgery for benign upper tract disease. Dr. Hsi’s research focuses on the pathogenesis, epidemiology, treatment, and prevention of kidney stones.

Daniel Lee


Nashville, Tennessee

Cosmin Bejan


Nashville, Tennessee