Objectives: Scientific literature contains a wealth of information and data that can inform quantitative systems pharmacology (QSP) models. However, manual curation and digitization of literature data is extremely time-intensive. Large Language Models (LLMs) may improve the efficiency of literature tasks, but general-purpose LLMs such as GPT-4 or Claude-3 fall short in handling the technical complexities of the QSP domain. General models often hallucinate responses and rely on information in their training data, which may be inaccurate. Our aim in constructing Delineate is to leverage advanced LLM approaches to rapidly mine and structure data from scientific literature, enabling accurate and accelerated QSP modeling.
Methods: To tailor best-in-class LLMs for systems pharmacology applications, we designed an agent-based architecture capable of partitioning complex tasks into modular subtasks. This partitioning approach has been shown to increase robustness by allowing the system to focus on smaller, more manageable components, reducing the likelihood of errors and enabling optimization of each subtask [1]. We evaluated our agent system on a benchmark set comprising diverse tasks such as replicating components of models from literature, performing advanced Q&A on published models, and obtaining parameter values from experimental papers.
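As a rough illustration of the partitioning idea, the sketch below splits a parameter-extraction request into three small subtasks (locate, extract, validate) that run in sequence over a shared state. It is not Delineate's implementation: the agent names, the state keys, and the regular expressions are hypothetical, and each plain function stands in for what would be an LLM-backed agent.

```python
import re
from typing import Callable, Dict, List

State = Dict[str, object]


def locate_agent(state: State) -> State:
    """Subtask 1 (hypothetical): keep only sentences that mention the parameter."""
    name = str(state["parameter"]).lower()
    sentences = re.split(r"(?<=[.!?])\s+", str(state["text"]))
    state["candidates"] = [s for s in sentences if name in s.lower()]
    return state


def extract_agent(state: State) -> State:
    """Subtask 2 (hypothetical): take the first 'number unit' pair found."""
    pattern = re.compile(r"(-?\d+(?:\.\d+)?)\s*([A-Za-z/%]+)")
    state["value"], state["unit"] = None, None
    for sentence in state["candidates"]:
        match = pattern.search(sentence)
        if match:
            state["value"] = float(match.group(1))
            state["unit"] = match.group(2)
            break
    return state


def validate_agent(state: State) -> State:
    """Subtask 3 (hypothetical): flag missing or non-positive values."""
    value = state["value"]
    state["valid"] = value is not None and value > 0
    return state


# Each subtask can be tested, tuned, or swapped without touching the others.
PIPELINE: List[Callable[[State], State]] = [locate_agent, extract_agent, validate_agent]


def run_pipeline(text: str, parameter: str) -> State:
    """Run every subtask in order over a shared state dictionary."""
    state: State = {"text": text, "parameter": parameter}
    for agent in PIPELINE:
        state = agent(state)
    return state


if __name__ == "__main__":
    passage = ("The volume of distribution was 12 L. "
               "Clearance was estimated at 0.25 L/h in healthy adults.")
    print(run_pipeline(passage, "clearance"))
```

Because each step has a narrow input and output, a failure can be traced to a single subtask, which is the robustness argument made above.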
Because most literature data is encoded in plots, we designed computer vision algorithms that, combined with our LLM agents, provide a semi-automated plot digitization solution. Certain plot digitization tasks, such as axis and label detection and point identification, are automated. The context in which the data was collected, such as trial design and dosing schedules, is also extracted by the LLM and restructured into a tabular, user-defined format along with the digitized data. We compared the accuracy and speed of Delineate’s semi-automated plot digitizer with those of the commonly used WebPlotDigitizer [2].
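Once axes and points have been detected, a plot digitizer still has to convert pixel positions into data values. The sketch below shows one common way to do this, a two-point calibration per axis with optional log scaling. It is an assumption about how such a step could work, not a description of Delineate's algorithm, and the function names `make_axis_mapper` and `digitize_points` are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def make_axis_mapper(pixel_a: float, value_a: float,
                     pixel_b: float, value_b: float,
                     log_scale: bool = False) -> Callable[[float], float]:
    """Build a pixel-to-data mapping from two calibrated tick marks on one axis."""
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must be at different pixel positions")
    if log_scale and (value_a <= 0 or value_b <= 0):
        raise ValueError("log-scale axes need positive calibration values")

    # On a log axis, pixel distance is linear in log10(value), not in value.
    ref_a = math.log10(value_a) if log_scale else value_a
    ref_b = math.log10(value_b) if log_scale else value_b
    slope = (ref_b - ref_a) / (pixel_b - pixel_a)

    def to_data(pixel: float) -> float:
        ref = ref_a + slope * (pixel - pixel_a)
        return 10 ** ref if log_scale else ref

    return to_data


def digitize_points(pixel_points: List[Tuple[float, float]],
                    x_mapper: Callable[[float], float],
                    y_mapper: Callable[[float], float]) -> List[Tuple[float, float]]:
    """Convert detected (x, y) pixel coordinates into data coordinates."""
    return [(x_mapper(px), y_mapper(py)) for px, py in pixel_points]


if __name__ == "__main__":
    # Image y grows downward, so the larger data value sits at the smaller pixel row.
    x_map = make_axis_mapper(100, 0.0, 500, 10.0)
    y_map = make_axis_mapper(400, 0.0, 50, 100.0)
    print(digitize_points([(300, 225)], x_map, y_map))
```

The inverted image y-axis needs no special handling here: it is absorbed into the sign of the slope as long as the two calibration ticks are entered with their true pixel rows.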
Results: The Delineate platform digitizes plots 70% faster than WebPlotDigitizer. In addition, it autofills user-defined fields related to the plot, such as the experimental and clinical conditions under which the data was collected, with 92% accuracy on a test set of 100 clinical trial papers. On the QSP-related benchmark set, our LLM agent achieved 95% accuracy, compared with 22% for GPT-4.
Conclusions: By combining computer vision techniques with LLM agents, Delineate achieves time savings and accuracy gains in literature-based tasks. Delineate may enable the construction of large datasets for parameter fitting and model validation, easier replication of published models, and efficient knowledge consolidation to support QSP model building.
Citations: [1] Ruan, J. et al. “TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage,” 2023. [2] Rohatgi, A. “WebPlotDigitizer,” 2022.