(W-012) Reinforcement Learning for Pharmacometrics: A Proof of Concept and Future Directions
Wednesday, November 13, 2024
7:00 AM – 1:45 PM MST
Matthew Wiens, M.A. – Senior Scientist, Metrum Research Group; Samuel Callisto, Ph.D. – Senior Scientist, Metrum Research Group; Megan Cala Pane, Ph.D. – Research Scientist, Metrum Research Group; Hillary Husband, Ph.D. – Research Scientist, Metrum Research Group
Senior Scientist, Metrum Research Group, Minnesota, United States
Disclosure(s):
Samuel Callisto, PhD: No financial relationships to disclose
Objectives: Reinforcement learning is applied in tandem with traditional pharmacokinetic (PK) modeling approaches for drugs with narrow therapeutic indices to identify individualized dose regimens based on key covariates. This approach is demonstrated through a proof-of-concept example with vancomycin, an antibiotic with treatment-limiting toxicities.
Methods: Reinforcement learning is a branch of machine learning in which a “learner” builds up a model of a complex system by interacting with an “environment”. After each action, the learner receives a “reward” that shapes its developing perception of the environment. As feedback from the environment accumulates, the learner makes better decisions about how its actions will affect the expected reward. Applied to pharmacometrics, a PK model can serve as the environment, with PK and pharmacodynamic (PD) outcomes, such as exposures or clinical endpoints, serving as the rewards to be optimized.
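To make the learner/environment framing concrete, the sketch below wraps a PK model as an environment: an action is a (dose, interdose interval) pair, the observation is the resulting steady-state peak and trough, and the reward is 1 when both fall inside a target window. It is a minimal illustration under stated assumptions: a one-compartment model with a 1-hour intravenous infusion, a hypothetical covariate model for clearance and volume (not the Colin et al. model), and illustrative target windows of 20 to 40 mg/L (peak) and 10 to 20 mg/L (trough). The names `Patient`, `pk_params`, `steady_state_peak_trough`, and `step` are hypothetical.

```python
import math
from dataclasses import dataclass

# Illustrative target windows (mg/L) and infusion duration (h); assumptions, not guideline values.
PEAK_RANGE = (20.0, 40.0)
TROUGH_RANGE = (10.0, 20.0)
INFUSION_H = 1.0


@dataclass
class Patient:
    weight_kg: float
    age_yr: float
    scr_mg_dl: float


def pk_params(p: Patient) -> tuple[float, float]:
    """Return (CL in L/h, V in L) from a HYPOTHETICAL covariate model (not Colin et al.)."""
    cl = (4.5 * (p.weight_kg / 70.0) ** 0.75
          * (p.scr_mg_dl / 1.0) ** -0.5
          * (p.age_yr / 40.0) ** -0.3)
    v = 0.7 * p.weight_kg
    return cl, v


def steady_state_peak_trough(p: Patient, dose_mg: float, tau_h: float) -> tuple[float, float]:
    """Steady-state end-of-infusion peak and pre-dose trough, one-compartment model."""
    cl, v = pk_params(p)
    k = cl / v  # elimination rate constant (1/h)
    rate = dose_mg / INFUSION_H  # infusion rate (mg/h)
    peak = (rate / cl) * (1 - math.exp(-k * INFUSION_H)) / (1 - math.exp(-k * tau_h))
    trough = peak * math.exp(-k * (tau_h - INFUSION_H))
    return peak, trough


def step(p: Patient, action: tuple[float, float]) -> tuple[tuple[float, float], float]:
    """Environment step: apply a (dose, interval) action, return (peak, trough) and a binary reward."""
    dose_mg, tau_h = action
    peak, trough = steady_state_peak_trough(p, dose_mg, tau_h)
    in_range = (PEAK_RANGE[0] <= peak <= PEAK_RANGE[1]
                and TROUGH_RANGE[0] <= trough <= TROUGH_RANGE[1])
    return (peak, trough), 1.0 if in_range else 0.0
```

Under these assumed parameters, `step(Patient(70.0, 40.0, 1.0), (1000.0, 12.0))` gives a peak of roughly 29 mg/L, a trough of roughly 10.6 mg/L, and a reward of 1.0.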
A population PK model for vancomycin was adapted from a published model that included covariate effects of weight, age, and serum creatinine [1]. A simulated dataset was constructed using demographics and laboratory values from the NHANES dataset [2]. Keras and TensorFlow, accessed through their R interfaces, were used to build a deep reinforcement learning model (RLM) [3]. The RLM was trained on simulated patients, each randomly assigned a feasible dose amount and interdose interval. The learner was rewarded primarily when the simulated patient’s peak and trough concentrations remained within established clinical therapeutic ranges. The trained RLM was then used to identify the optimal dose amount and interdose interval for new patients. RLM-optimized regimens were compared with standard-of-care dose recommendations. The sensitivity of model performance to training-data size and reward strategy was also investigated.
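The train-then-recommend procedure above can be sketched without a deep-learning framework. Because each simulated patient receives one randomly chosen regimen and one reward, the loop can be read as a single-step (contextual bandit) problem; this stdlib-only sketch swaps the neural network for a plain table of average rewards per coarse covariate bin and action. The one-compartment model, covariate exponents, dose/interval grid, bin widths, covariate sampling ranges, and target windows are all hypothetical choices for illustration, not the Colin et al. model or the authors’ settings.

```python
import math
import random
from collections import defaultdict

# HYPOTHETICAL discrete action space: dose amounts (mg) x interdose intervals (h).
DOSES = [500, 750, 1000, 1250, 1500, 2000]
INTERVALS = [8, 12, 24]
ACTIONS = [(d, tau) for d in DOSES for tau in INTERVALS]
PEAK_RANGE, TROUGH_RANGE, T_INF = (20.0, 40.0), (10.0, 20.0), 1.0  # assumed mg/L windows, infusion h


def simulate(wt, age, scr, dose, tau):
    """Steady-state peak/trough, one-compartment infusion, HYPOTHETICAL covariate model."""
    cl = 4.5 * (wt / 70) ** 0.75 * scr ** -0.5 * (age / 40) ** -0.3  # clearance (L/h)
    k = cl / (0.7 * wt)  # elimination rate constant (1/h), assuming V = 0.7 L/kg
    peak = (dose / T_INF) / cl * (1 - math.exp(-k * T_INF)) / (1 - math.exp(-k * tau))
    return peak, peak * math.exp(-k * (tau - T_INF))


def reward(peak, trough):
    """Binary reward: 1 if both peak and trough fall inside their target windows."""
    ok = PEAK_RANGE[0] <= peak <= PEAK_RANGE[1] and TROUGH_RANGE[0] <= trough <= TROUGH_RANGE[1]
    return 1.0 if ok else 0.0


def context(wt, age, scr):
    """Coarse covariate bins, standing in for the continuous inputs a neural network would use."""
    return (int(wt // 20), int(age // 20), int(scr // 0.5))


def train(n_patients, rng):
    """Give each simulated patient one random regimen; average rewards per (bin, action)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(n_patients):
        wt, age, scr = rng.uniform(50, 110), rng.uniform(20, 80), rng.uniform(0.6, 1.6)
        action = rng.choice(ACTIONS)
        key = (context(wt, age, scr), action)
        totals[key] += reward(*simulate(wt, age, scr, *action))
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in counts}


def recommend(q, wt, age, scr):
    """Pick the regimen with the highest estimated success rate for this patient's bin."""
    ctx = context(wt, age, scr)
    return max(ACTIONS, key=lambda a: q.get((ctx, a), 0.0))


q_table = train(20_000, random.Random(0))
best = recommend(q_table, 72.0, 45.0, 1.1)
```

Changing `n_patients` or the bin widths changes how densely the table covers the covariate space, loosely analogous to varying training-data size. A neural network, as used in the abstract, replaces `context` plus the table with a function that generalizes across continuous covariates rather than fixed bins.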
Results: The RLM predicted the optimal dose and interdose interval for a set of new patients with randomly generated covariates. Simulations verified that the recommended regimens produced peak and trough concentrations within the accepted range for all subjects. The RLM provided individualized dose regimens that differed from clinical guidelines while keeping vancomycin concentrations between the peak and trough limits. Model performance improved with penalty and reward strategies that scaled by deviation from, or proximity to, the center of the therapeutic range, respectively.
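The contrast between a flat in-range reward and center-scaled rewards and penalties can be illustrated with a few small functions. This is one possible formulation, assuming linear scaling by distance from the window center normalized by the half-width; the function names and the 20 to 40 mg/L (peak) and 10 to 20 mg/L (trough) windows are hypothetical, and the abstract does not specify the exact functional form the authors used.

```python
PEAK_RANGE = (20.0, 40.0)    # assumed peak window, mg/L (illustration only)
TROUGH_RANGE = (10.0, 20.0)  # assumed trough window, mg/L (illustration only)


def binary_reward(value, lo, hi):
    """Baseline: 1 anywhere inside the window, 0 outside."""
    return 1.0 if lo <= value <= hi else 0.0


def shaped_reward(value, lo, hi):
    """Reward scaled by proximity to the window center; penalty scaled by deviation from it."""
    center = (lo + hi) / 2.0
    half_width = (hi - lo) / 2.0
    distance = abs(value - center) / half_width  # 0 at the center, 1 at either limit
    if distance <= 1.0:
        return 1.0 - distance  # inside: larger reward closer to the center
    return -distance           # outside: larger penalty farther from the center


def total_reward(peak, trough, shaping=shaped_reward):
    """Sum the per-metric rewards for peak and trough under the chosen shaping function."""
    return shaping(peak, *PEAK_RANGE) + shaping(trough, *TROUGH_RANGE)
```

With `binary_reward`, a regimen giving a peak of 21 mg/L and a trough of 11 mg/L scores the same (2.0) as one centered in both windows; with `shaped_reward` it scores about 0.3 versus 2.0, so the learner sees a gradient toward the center of each window rather than a flat reward anywhere inside it.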
Conclusions: The proposed RLM approach was used to generate novel dose regimens for vancomycin. This combination of pharmacometrics and neural networks may improve dosing of compounds with narrow therapeutic ranges. RLMs have the potential to improve patient outcomes by identifying individualized regimens which are more complex than current clinical recommendations.
Citations: [1] Colin PJ, et al. “Vancomycin pharmacokinetics throughout life: results from a pooled population analysis and evaluation of current dosing recommendations.” Clinical Pharmacokinetics 58 (2019): 767-780.
[2] Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS). National Health and Nutrition Examination Survey Data, 2017-2018. Hyattsville, MD: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. https://wwwn.cdc.gov/nchs/nhanes/
[3] Abadi M, et al. “TensorFlow: Large-scale machine learning on heterogeneous systems.” (2015). Software available from tensorflow.org.