Background: The Centor score is a validated 4-element clinical score used to direct testing and treatment of group A streptococcal (GAS) pharyngitis. The elements are fever, tonsillar exudate, tender anterior cervical adenopathy, and absence of cough. Testing patients with a low Centor score may lead to detection of GAS carriers, prescription of unnecessary antibiotics, and treatment ‘failure’. These cases are difficult to identify from the medical record. Artificial intelligence (AI) offers an approach for large-scale identification of such cases for quality improvement (QI) purposes.
Objective: Create and validate a semi-automated, AI-based classification system to screen clinical documents, identify cases and Centor criteria elements, and accurately assign a positive or negative score to each element.
Methods: Cross-sectional study of patients aged 3-18y who presented with sore throat and underwent rapid strep testing (RST) at a tertiary pediatric ED between January 1 and July 1, 2017. Exclusions: immunosuppressed, non-verbal, medically complex, existing ENT pathology, or antibiotics/steroids in the prior 7d. We manually abstracted the Centor criteria and input the clinical documents into our classifier, which uses an n-gram bag-of-words model. We divided the data into 60% derivation and 40% validation sets. We used descriptive statistics to determine the accuracy of the model, and the kappa statistic (κ) to measure agreement between the model and a human reviewer.
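The feature extraction and data split described above can be sketched as follows. This is a minimal illustration, not the study's classifier: the function names, the choice of unigrams plus bigrams, the whitespace tokenizer, the random seed, and the sample note are all assumptions. It shows why n-grams matter for negated findings, since the bigram "no cough" is captured as a feature distinct from "cough".

```python
import random
from collections import Counter


def ngram_bag(text, n_max=2):
    """Return a Counter of word n-grams (n = 1..n_max) for one document."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


def split_derivation_validation(records, derivation_frac=0.6, seed=0):
    """Shuffle records and split them into derivation and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * derivation_frac)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical note text, not taken from the study data.
note = "no cough fever for two days"
bag = ngram_bag(note)

# Ten placeholder record IDs stand in for clinical documents.
derivation, validation = split_derivation_validation(range(10))
```

The resulting counts would then be fed to whatever classifier is fitted on the derivation set; the abstract does not specify which one was used.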
Results: There were 1392 patients tested for GAS, and 861 met inclusion criteria. Median age was 9.7y [IQR 6.5-14]; 56% were female (95% CI, 53-59%). On manual review, 189 (34%; 95% CI, 31-37%) scored 0-1, meaning testing was not indicated and highlighting the need for QI. The model performed as follows: cohort identification (sore throat, yes/no), accuracy 88% (95% CI, 85-91%), κ = 0.7; fever (present/absent), accuracy 89% (85-93%), κ = 0.78; cough (present/absent), accuracy 84% (79-88%), κ = 0.67; tonsillar exudate (present/absent), accuracy 91% (88-95%), κ = 0.62; and tender anterior cervical adenopathy (present/absent), accuracy 97% (95-99%), κ = 0.46.
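For readers less familiar with the two metrics reported above, the sketch below computes accuracy with a confidence interval and Cohen's κ from a 2x2 agreement table. The counts are hypothetical and not taken from the study, and the normal-approximation (Wald) interval is an assumption, since the abstract does not state which interval method was used. The example illustrates one general reason accuracy and κ can diverge: when a finding is rarely present, agreement expected by chance is already high, so κ can be modest even at high accuracy. Whether this explains any specific result above is not stated in the source.

```python
import math


def accuracy_with_ci(correct, total, z=1.96):
    """Accuracy with a normal-approximation (Wald) confidence interval."""
    p = correct / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def cohen_kappa(both_pos, model_only, human_only, both_neg):
    """Cohen's kappa for model vs. human labels from a 2x2 table."""
    n = both_pos + model_only + human_only + both_neg
    observed = (both_pos + both_neg) / n
    model_pos = (both_pos + model_only) / n
    human_pos = (both_pos + human_only) / n
    # Agreement expected by chance, given each rater's marginal rates.
    expected = model_pos * human_pos + (1 - model_pos) * (1 - human_pos)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for a rarely documented finding (not study data).
both_pos, model_only, human_only, both_neg = 3, 2, 4, 191
total = both_pos + model_only + human_only + both_neg

acc, ci_low, ci_high = accuracy_with_ci(both_pos + both_neg, total)
kappa = cohen_kappa(both_pos, model_only, human_only, both_neg)
print(f"accuracy={acc:.2f} ({ci_low:.2f}-{ci_high:.2f}), kappa={kappa:.2f}")
```

With these illustrative counts, accuracy is 0.97 while κ falls just under 0.5.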
Conclusion: AI can identify features of the Centor criteria with good accuracy (84-97%) and moderate to substantial agreement with a human reviewer (κ 0.46-0.78). A passive surveillance system could feasibly screen clinical documents and identify cases of unnecessary GAS testing for QI.