Objectives: Systematic reviews (SRs) are time-intensive, and artificial intelligence (AI) has the potential to reduce the time required for the SR process. Our objective is to determine whether the AI function of DistillerSR is comparable to our conventional dual review by human subject matter experts during the title/abstract phase of the SR process.
Methods: This analytical comparative pilot study evaluates the AI function of DistillerSR for title/abstract review of references during SRs. We performed a retrospective review of two SRs that used our conventional method of dual review by human subject matter experts. To assess the equivalency of DistillerSR's AI function, we created new projects using the same pool of references as the original projects and built an AI training set from the historical data of the original review. We then applied DistillerSR's AI tool to the remaining references and compared the outcomes of the two review methods, calculating the sensitivity and specificity of the AI function.
Results: To determine sensitivity and specificity, we compared the articles included at the title/abstract screening stage with the final pool of evidence included in the published guidelines, as that final pool is of primary importance to our subject matter experts. The sensitivity of DistillerSR's AI tool was 93% for Project 1, with the AI tool missing 8 references during title/abstract review that human reviewers included in the final evidence tables. AI sensitivity for Project 2 was 66%, with the AI tool missing 34 of 116 references ultimately included in the final evidence tables. Specificity was 58% for Project 1 and 89% for Project 2.
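The sensitivity and specificity above follow the standard confusion-matrix definitions, with the final evidence tables serving as the gold standard: sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP). A minimal sketch of this comparison, assuming each reference carries a paired AI decision and human gold-standard decision (the counts in the example are purely illustrative, not the study's actual screening data):

```python
def screening_metrics(decisions):
    """Compute sensitivity and specificity of AI screening decisions.

    decisions: iterable of (ai_included, human_included) booleans, where
    human_included is the gold standard (inclusion in the final evidence
    tables) and ai_included is the AI tool's title/abstract decision.
    """
    tp = fn = tn = fp = 0
    for ai, human in decisions:
        if human:
            if ai:
                tp += 1  # AI correctly kept an ultimately included reference
            else:
                fn += 1  # AI missed a reference humans included
        else:
            if ai:
                fp += 1  # AI kept a reference humans excluded
            else:
                tn += 1  # AI correctly excluded
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical counts for illustration only (not the study data):
# 93 true positives, 7 false negatives, 58 true negatives, 42 false positives.
example = ([(True, True)] * 93 + [(False, True)] * 7
           + [(False, False)] * 58 + [(True, False)] * 42)
sens, spec = screening_metrics(example)  # sens = 0.93, spec = 0.58
```

Framing the comparison this way makes explicit that a false negative at the title/abstract stage is unrecoverable downstream, which is why sensitivity is the primary acceptability criterion here.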
Conclusions: The sensitivity calculated in this pilot project for DistillerSR's AI tool does not currently meet our acceptability threshold for including the tool in our guideline development process. Further research is planned to continue the evaluation with a retrospective analysis of at least four additional guideline-informing SR projects. These projects vary in size and scope and thus will provide valuable data on how the AI tool performs in different scenarios. AI could potentially be combined with human review to maximize the AI tool's capabilities while incorporating the expertise of human subject matter experts, helping us maintain the high-quality standards required for medical practice guideline development.