James S. Floyd, David S. Carrell, Maralyssa Bann, Susan Gruber, Ron L Johnson, Vina F Graham, Cosmin A Bejan, Yong Ma, Mayura Shinde, Robert Ball and Jennifer C Nelson
Background: Anaphylaxis is an acute life-threatening illness that is often misidentified by ICD diagnostic codes. This threatens the validity of claims-based epidemiologic studies of anaphylaxis as an adverse drug event.
Objectives: (1) Conduct physician adjudication of potential anaphylaxis events identified by ICD-10 codes to confirm genuine cases. (2) Improve the accuracy of claims-based anaphylaxis identification using natural language processing (NLP) of unstructured clinical notes and machine learning methods.
Methods: We obtained medical records with diagnosis codes for anaphylaxis from inpatient (IP), emergency department (ED), and outpatient (OP) encounters in an integrated healthcare system in Washington State from October 2015 to December 2018. Eligible OP events were also required to have modifier codes suggesting the presence of an acute condition. Potential drug-related events (ICD-10 code T88.6) were oversampled. Two physicians adjudicated each event using established criteria for anaphylaxis events; disagreements were resolved by discussion. We also reviewed IP and ED encounters with diagnosis codes for allergic reactions and adverse drug reactions to identify additional potential anaphylaxis cases.
Results: Of 247 potential events sampled, 240 (97%) had medical records adequate for adjudication (162 ED/IP, 78 OP). The overall positive predictive value (PPV) for validated anaphylaxis events was 64.6% (95% CI 58-70%), with minimal heterogeneity by setting (ED/IP vs. OP), sex, race, or specific ICD-10 code. PPV was higher among patients under age 40 (72%, 95% CI 63-79%) than among those aged 40 or older (58%, 95% CI 49-66%; p=0.02). Among validated events, common causes were food (39%), medications (34%), and insect bites or stings (12%). Among false positives, 94% were judged to be other serious allergic reactions. Clinical context (timing, alternative clinical conditions, information source) was important for correctly classifying some cases. Adjudication of 76 encounters identified by allergy and adverse drug reaction codes yielded a single anaphylaxis event (PPV 1.3%, 95% CI 0.0-9.0%). To improve identification of true anaphylaxis events, we are developing machine learning models using predictors from structured electronic health record (EHR) data and NLP-extracted information from the clinical notes of these 316 adjudicated cases.
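The overall PPV and its interval can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions not stated in the abstract: the count of 155 validated events is inferred from 64.6% of 240, the Wilson score method is one common choice of interval (the abstract does not name the method used), and no weighting for the oversampling of T88.6 events is applied. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z defaults to 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumption: 155 validated events, inferred from the reported 64.6% of 240.
confirmed = 155
adjudicated = 240

ppv = confirmed / adjudicated
low, high = wilson_interval(confirmed, adjudicated)
print(f"PPV = {ppv:.1%}, 95% CI {low:.0%}-{high:.0%}")
```

Under these assumptions the unweighted estimate rounds to the reported 64.6% (58-70%). Other interval methods, or weights that account for the sampling design, could give slightly different bounds.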
Conclusions: ICD-10 diagnosis codes for anaphylaxis had moderate PPV for validated events. Many of the misclassified events were serious allergic reactions. These findings have implications for pharmacoepidemiologic studies that seek to estimate treatment-related risks of anaphylaxis using EHR data.