Poster, Podium & Video Sessions
Presentation Authors: Florian Schroeck*, White River Junction, VT, Olga Patterson, Patrick Alba, Scott DuVall, Salt Lake City, UT, Brenda Sirovich, Douglas Robertson, White River Junction, VT, John Seigne, Lebanon, NH, Philip Goodney, White River Junction, VT
Introduction: Population-based studies to advance bladder cancer care require longitudinal pathology data that allow measurement of disease recurrence and progression. The prime data source for population-based studies has been SEER-Medicare, but SEER data are limited because pathologic information is abstracted only at the time of diagnosis. We set out to obtain longitudinal pathology data by developing a natural language processing (NLP) engine to automate abstraction of important details from full-text pathology reports.
Methods: We selected a national random sample of 600 bladder pathology reports from the Department of Veterans Affairs (VA) Corporate Data Warehouse. These reports were independently annotated by two reviewers with discrepancies resolved by a third to develop a gold standard. We used Cohen's kappa to evaluate inter-rater reliability for histology, invasion (presence versus absence and depth), grade, and statements regarding presence of muscularis propria and of carcinoma in situ. Next, we iteratively trained, developed, and tested the NLP engine's ability to abstract these variables from the reports. We assessed NLP performance by calculating accuracy, precision (positive predictive value), and recall (sensitivity).
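For readers unfamiliar with the evaluation measures named above, the following is a minimal illustrative sketch of how Cohen's kappa, accuracy, precision, and recall can be computed from paired labels. It is not the authors' code: the function names `cohens_kappa` and `binary_metrics` and the example labels in the usage note are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # and this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def binary_metrics(gold, predicted, positive=True):
    """Accuracy, precision (PPV), and recall (sensitivity) against a gold standard."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be the same non-empty length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # None signals an undefined ratio (no predicted or no actual positives).
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
    }
```

As a usage example with made-up labels, `cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"])` compares two annotators on one variable, and `binary_metrics(gold, predicted)` compares NLP output with the gold standard for a present/absent variable such as carcinoma in situ. Multi-category variables such as depth of invasion would need a per-category or averaged version of precision and recall, which this sketch does not cover.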
Results: Inter-rater reliability between the two reviewers was excellent (kappa ranging from 0.82 to 0.90). NLP achieved the highest accuracy for presence of carcinoma in situ (0.98), with accuracy for histology, invasion, grade, and presence of muscularis propria ranging from 0.82 to 0.93 (Table). The most challenging variable was depth of invasion, owing to the high variability in the language used to describe findings. Nevertheless, we achieved acceptable accuracy (0.82) and precision (0.79) for this variable (Table).
Conclusions: We developed an NLP engine that accurately abstracts important pathologic details from full-text bladder cancer pathology reports. This engine now allows for abstraction of data from tens of thousands of bladder cancer pathology reports, enabling us to develop a population-based cohort of patients with longitudinal pathology data. The resulting unique dataset will be used to examine the extent to which bladder cancer care affects recurrence and progression of disease.
Source Of Funding: Dept of Veterans Affairs VISN 1 Career Development Award; Conquer Cancer Foundation Career Development Award; DHMC Dept of Surgery internal Career Development Award