Quantitative Medicine Scientist, The Critical Path Institute, United States
Disclosure(s):
Ayan K, Research Scientist: No financial relationships to disclose
Objective: Drug-Drug Interaction (DDI) information can be found in varied sources of medical literature: doctors' notes and nursing records, research articles, case studies, and review articles. In this study, we evaluate the effectiveness of Small Language Models (SLMs) on the task of extracting key-value pairs describing DDIs from unstructured medical text. Fast and efficient extraction of DDI information from unstructured text accelerates the identification of potential adverse interactions, enabling researchers to focus on safer and more effective drug formulations and potentially reducing trial failures due to unforeseen adverse drug reactions.
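To make the target output concrete, the following minimal Python sketch shows one way such a key-value DDI record could be represented; the field names (drug_a, drug_b, effect, severity) and the example values are illustrative assumptions, not the schema used in this study.

from dataclasses import dataclass, asdict

@dataclass
class DDIRecord:
    """One drug-drug interaction expressed as key-value pairs."""
    drug_a: str            # first interacting drug
    drug_b: str            # second interacting drug
    effect: str            # nature of the interaction
    severity: str          # e.g. "major", "moderate", "minor"
    source_sentence: str   # sentence from which the record was extracted

# Illustrative record based on a well-known textbook interaction.
record = DDIRecord(
    drug_a="warfarin",
    drug_b="fluconazole",
    effect="increased anticoagulant effect",
    severity="major",
    source_sentence="Fluconazole markedly potentiates the anticoagulant effect of warfarin.",
)
print(asdict(record))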
Method: SLMs promise efficient solutions, requiring fewer computational resources than their larger counterparts without compromising performance. We evaluate three recent SLMs: Gemma by Google DeepMind [2], Phi-3 by Microsoft [3], and OpenELM by Apple [1]. Our methodology comprises two experiments:
Prompting Technique Evaluation: We tested the models' ability to interpret and extract DDIs using data from established DDI databases. Each model was assessed under zero-shot, one-shot, and few-shot conditions to evaluate performance across varying levels of query complexity and prompt support (see the prompt-assembly sketch below).
Dataset Enhancement and Application: We assessed the models' Retrieval-Augmented Generation (RAG) and in-context learning capabilities by systematically modifying the information supplied in the prompts and gradually injecting noise to perform a sensitivity and specificity analysis (see the noise-injection sketch below).
These experiments were designed to validate the practical utility of SLMs in pharmacological research.
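A minimal sketch of how the zero-, one-, and few-shot prompts could be assembled is shown below; the instruction wording, the worked examples, and the build_prompt helper are assumptions for illustration, not the exact prompts used in the study.

INSTRUCTION = (
    "Extract every drug-drug interaction from the passage as key-value pairs "
    "with the keys drug_a, drug_b, and effect."
)

# Hypothetical worked examples used only in the one-/few-shot conditions.
EXAMPLES = [
    ("Clarithromycin raises serum concentrations of simvastatin.",
     "drug_a: clarithromycin | drug_b: simvastatin | effect: increased serum concentration"),
    ("Rifampin reduces the efficacy of oral contraceptives.",
     "drug_a: rifampin | drug_b: oral contraceptives | effect: reduced efficacy"),
]

def build_prompt(passage: str, n_shots: int = 0) -> str:
    """Return a DDI-extraction prompt with 0, 1, or more worked examples prepended."""
    parts = [INSTRUCTION]
    for text, answer in EXAMPLES[:n_shots]:
        parts.append(f"Passage: {text}\nAnswer: {answer}")
    parts.append(f"Passage: {passage}\nAnswer:")
    return "\n\n".join(parts)

zero_shot = build_prompt("Fluconazole potentiates warfarin.")             # zero-shot condition
few_shot = build_prompt("Fluconazole potentiates warfarin.", n_shots=2)   # few-shot condition

Varying n_shots between zero and the full example pool reproduces the three prompting conditions described above.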
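For the sensitivity and specificity analysis, the sketch below illustrates one way noise could be injected into the retrieved context; the build_context and sensitivity_specificity helpers, the distractor-pool setup, and the noise_ratio parameter are hypothetical and indicate only the general approach.

import random

def build_context(relevant: list[str], distractors: list[str],
                  noise_ratio: float, k: int, seed: int = 0) -> str:
    """Assemble k context passages for the prompt, where noise_ratio controls the
    fraction drawn from a distractor (DDI-free) pool instead of the relevant pool."""
    rng = random.Random(seed)
    n_noise = round(k * noise_ratio)          # number of injected noise passages
    passages = (rng.sample(relevant, k - n_noise) +
                rng.sample(distractors, n_noise))
    rng.shuffle(passages)                     # hide where the noise sits in the context
    return "\n".join(passages)

def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)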
Results: Each of the three models features a distinct design built on the transformer architecture. Our experiments quantify the differences between the three models in accuracy, speed, and efficiency in a controlled experimental setup. Microsoft's Phi-3 performed best with an average accuracy of 85.6%, followed closely by Apple's OpenELM at 80%, while Google's Gemma performed poorly at 22.08%. Gemma was, however, the fastest model, completing the task in approximately 0.78 times the runtime of Phi-3, whereas OpenELM was the slowest at 1.9 times the runtime of Phi-3.
Conclusions: The use of SLMs in drug development offers a promising avenue for enhancing efficiency and reducing the time and costs associated with DDI extraction.
Acknowledgments: The Critical Path Institute is supported by the Food and Drug Administration (FDA) of the Department of Health and Human Services (HHS) and is 54% funded by the FDA/HHS, totaling $19,436,549, and 46% funded by non-government source(s), totaling $16,373,368. The contents are those of the author(s) and do not necessarily represent the official views of, nor an endorsement by, FDA/HHS or the U.S. Government.
Citations:
[1] Mehta, S., Sekhavat, M. H., Cao, Q., Horton, M., Jin, Y., Sun, C., ... & Rastegari, M. (2024). OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework. arXiv preprint arXiv:2404.14619.
[2] Team, G., Mesnard, T., Hardin, C., Dadashi, R., Bhupatiraju, S., Pathak, S., ... & Kenealy, K. (2024). Gemma: Open Models Based on Gemini Research and Technology. arXiv preprint arXiv:2403.08295.
[3] Abdin, M., Jacobs, S. A., Awan, A. A., Aneja, J., Awadallah, A., Awadalla, H., ... & Zhou, X. (2024). Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv preprint arXiv:2404.14219.