Category: Fellows Posters
Artificial intelligence (AI) is an emerging field of technology that will play an increasingly meaningful role in health informatics, automation, and patient care. Understanding the strengths and limitations of this technology is vital to our ability to utilize it safely and effectively in patient care. This project's purpose was to determine the capability of AI to understand the context of pharmacy lexicon and pharmacy-specific linguistic relationships (e.g., brand/generic, mechanism of action/indication) using unlabeled data from pharmacy text.
The Skip-gram model we developed was a two-layer neural network that learns vector representations of words from unlabeled text data. Such vector representations have previously been shown to preserve semantic relationships between words when trained on domain-specific texts. Our Skip-gram model was trained on a corpus comprising clinical pharmacy and medical texts pertaining to cardiology, containing a total of 686,649 phrases and 20,792 unique word types. The critical training parameters, along with details of how the data were prepared, will be uploaded to our open-source repository. Similarity between words was determined by a single matrix-vector product, and analogical reasoning was performed with simple vector addition and subtraction.
The similarity algorithm was run on key terms: disease states (e.g., hypertension), measurements (e.g., INR), risk factors (e.g., alcohol use), and medications (e.g., simvastatin), to collect the terms the algorithm scored as most similar.
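The "single matrix-vector product" similarity step can be sketched in NumPy: row-normalize the embedding matrix, then one product against the query vector yields every cosine similarity at once. The four-word vocabulary and vector values here are invented for illustration; real vectors come from the trained Skip-gram model.

```python
import numpy as np

# Toy 4-dimensional "embeddings" (illustrative values only).
vocab = ["hypertension", "amlodipine", "inr", "warfarin"]
E = np.array([
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.3],
    [0.1, 0.9, 0.7, 0.0],
    [0.2, 0.8, 0.6, 0.1],
])

def most_similar(word, k=2):
    """Rank words by cosine similarity via one matrix-vector product."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-length rows
    q = En[vocab.index(word)]
    sims = En @ q  # all cosine similarities in a single product
    order = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != word][:k]

print(most_similar("inr"))  # "warfarin" ranks first in this toy setup
```

Because the rows are unit vectors, the matrix-vector product directly gives cosine similarities, so a full vocabulary scan is one BLAS call rather than a per-word loop.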
The analogical reasoning algorithm was run on five tests of each of the following relationships: brand-generic, drug-mechanism, drug-monitoring, and drug-indication. The goal of the algorithm was to compute the fourth term when given the first three. For example, a brand-generic test would be Plavix : clopidogrel :: Effient : ?, with prasugrel as the expected answer.
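The "simple vector addition and subtraction" analogy step can be sketched as follows: for a : b :: c : ?, compute b - a + c and return the nearest remaining vocabulary vector by cosine similarity. The hand-picked toy vectors below are assumptions chosen so the brand-to-generic offset is consistent; real vectors come from the trained model.

```python
import numpy as np

# Toy vectors (illustrative only) with a roughly constant
# brand -> generic offset.
vecs = {
    "plavix":      np.array([1.0, 1.0, 0.0]),
    "clopidogrel": np.array([1.0, 0.0, 1.0]),
    "effient":     np.array([0.0, 1.0, 0.0]),
    "prasugrel":   np.array([0.0, 0.0, 1.0]),
    "aspirin":     np.array([0.5, 0.5, 0.0]),  # distractor term
}

def analogy(a, b, c):
    """Given a : b :: c : ?, return the word closest to b - a + c."""
    target = vecs[b] - vecs[a] + vecs[c]
    best, best_sim = None, -2.0
    for w, v in vecs.items():
        if w in (a, b, c):  # exclude the three query words themselves
            continue
        sim = v @ target / (np.linalg.norm(v) * np.linalg.norm(target))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("plavix", "clopidogrel", "effient"))  # -> prasugrel
```

Excluding the three query words from the candidate pool is standard practice, since the nearest neighbor of b - a + c is often one of the inputs.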
The model completed training and successfully identified similar phrases. For example, diabetes was closely linked to the terms diabetol, HbA1c, diabetes mellitus, fasting glucose, normoglycemia, glucose, insulin, anemia, glycemic, and comparisons.
Of the 20 analogies tested, the model was able to compute an answer for 19. The one it failed to calculate involved a term not mentioned in the original corpus, so the model had no context from which to learn it. The model produced a logical analogy in 45% of the cases in the analogy reasoning test set.
This model was useful in exploring the capabilities of simple artificial intelligence in pharmacy, demonstrating a partial grasp of pharmacy-specific semantic relationships. The model performed surprisingly well given the small domain-specific dataset used as input (compared with industry-standard corpora of 6-77 billion words). With a larger dataset and increased processing power, the accuracy of the model's analogical reasoning could be improved.