Eric Song, PhD: No financial relationships to disclose
Objectives: Understanding and managing adverse events (AEs) is crucial to patient safety and to the overall success of treatment regimens in drug development. However, the mechanisms behind AEs are multifaceted and reflect a complex interplay of drug-related, patient-specific, and external factors, which makes these events challenging to predict, prevent, and manage. Machine learning (ML) offers an opportunity to address these challenges. In this work, we evaluate several ML models for identifying covariates and predicting the risk of AEs from clinical trial data.
Methods: We included 780 patients with solid cancers who received an experimental antibody-drug conjugate (ADC) in 7 clinical trials, and selected one safety endpoint, peripheral neuropathy (observed in ~20% of patients), for analysis. To investigate the mechanisms underlying the AEs, we collected a comprehensive dataset comprising 337 baseline and post-treatment lab assessments, cytokine profiles, pharmacokinetic (PK) and pharmacodynamic (PD) parameters, and other measurements. We used miceforest [1], which implements Multiple Imputation by Chained Equations with LightGBM, to impute missing values. Boruta [2] was then applied for feature selection to identify covariates associated with AEs. Using these covariates, we developed 4 ML models: Random Forest (RF), XGBoost, Neural Network (NN), and Elastic Net (ENet), with hyperparameters tuned by cross-validation. Finally, the performance of the ML models in predicting the risk of AEs was evaluated across all 7 studies, which encompassed diverse cancer types.
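The shadow-feature comparison that Boruta is built on can be illustrated with a minimal, standard-library sketch. This is not the pipeline used in this work: absolute Pearson correlation is swapped in for random-forest importance, the data are synthetic, the formal statistical test that Boruta applies to the hit counts is omitted, and all names (`abs_correlation`, `shadow_feature_hits`, `signal`, `noise_k`) are hypothetical.

```python
import random
import statistics


def abs_correlation(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (t - my) for x, t in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((t - my) ** 2 for t in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_feature_hits(columns, y, n_iter=50, seed=0):
    """Count how often each real feature beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    # With a random forest the real importances would be re-estimated in every
    # iteration; the correlation stand-in is deterministic, so compute it once.
    real = {name: abs_correlation(values, y) for name, values in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        shadow_max = 0.0
        for values in columns.values():
            shadow = values[:]       # copy, then permute to break any link to y
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_correlation(shadow, y))
        for name in columns:
            if real[name] > shadow_max:
                hits[name] += 1
    return hits


# Synthetic data (assumption): ~20% event rate, one informative feature.
data_rng = random.Random(42)
n = 400
y = [1 if data_rng.random() < 0.2 else 0 for _ in range(n)]
columns = {"signal": [2 * t + data_rng.gauss(0, 1) for t in y]}
for k in range(5):
    columns[f"noise_{k}"] = [data_rng.gauss(0, 1) for _ in range(n)]

hits = shadow_feature_hits(columns, y, n_iter=50)
for name, count in hits.items():
    print(f"{name}: beat the best shadow feature in {count}/50 iterations")
```

A feature that consistently outscores the best shadow feature is a candidate covariate. Boruta formalizes this decision with a statistical test on the hit counts rather than reading them off directly.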
Results: After quality control, 765 patients with 134 features were available for analysis. Boruta identified 18 covariates over 1000 iterations; in each iteration, a random forest with 1000 trees was fitted and feature importance was evaluated. In the two single-cancer clinical trials, the area under the receiver operating characteristic curve (AUC) for RF and XGBoost ranged from 0.76 to 0.79, while NN and ENet yielded lower AUCs, between 0.68 and 0.73. All 4 ML models performed worse in the remaining 5 mixed-cancer-type trials, with AUCs ranging from 0.62 to 0.69. To assess the general predictive performance of the models, we conducted 1000 simulations, each randomly splitting the data 80/20 into training and testing sets. These simulations confirmed that RF and XGBoost outperformed NN and ENet: the AUC was 0.7128 (±0.075) for RF and 0.7002 (±0.07) for XGBoost, versus 0.692 (±0.028) for NN and 0.647 (±0.039) for ENet.
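The repeated 80/20 evaluation can be sketched as follows, using only the Python standard library. Several pieces are assumptions made for illustration: the data are synthetic, a simple class-centroid scorer (`fit_centroid`) stands in for the RF, XGBoost, NN, and ENet models, the splits are unstratified, and the spread is reported as a standard deviation, which the abstract does not specify.

```python
import random
import statistics


def roc_auc(y_true, scores):
    """AUC as the probability that a random event outscores a random non-event."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(X_train, y_train):
    """Stand-in model: score = projection onto the difference of class means."""
    n_feat = len(X_train[0])

    def class_mean(label):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        return [statistics.fmean(r[j] for r in rows) for j in range(n_feat)]

    w = [a - b for a, b in zip(class_mean(1), class_mean(0))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_split_auc(X, y, fit, n_repeats=200, test_frac=0.2, seed=0):
    """Mean and SD of test-set AUC over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = max(1, round(test_frac * len(y)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        # Skip the rare split in which either part contains a single class.
        if len({y[i] for i in test}) < 2 or len({y[i] for i in train}) < 2:
            continue
        scorer = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in test], [scorer(X[i]) for i in test]))
    return statistics.fmean(aucs), statistics.stdev(aucs)


# Synthetic data (assumption): ~20% event rate, one informative covariate.
data_rng = random.Random(1)
y = [1 if data_rng.random() < 0.2 else 0 for _ in range(300)]
X = [[t + data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for t in y]

mean_auc, sd_auc = repeated_split_auc(X, y, fit_centroid)
print(f"AUC = {mean_auc:.3f} (±{sd_auc:.3f})")
```

Because every repeat draws a different test set, the spread across repeats reflects sensitivity to the particular split as well as model quality, which is one reason to report it alongside the mean.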
Conclusions: Boruta is an effective feature selection method for identifying covariates associated with target endpoints. Among the ML models evaluated, RF and XGBoost performed best in predicting the risk of AEs. Moving forward, integrating time-course data into ML models might further improve prediction accuracy.
Citations: [1] miceforest: https://github.com/AnotherSamWilson/miceforest [2] Kursa et al., Feature Selection with the Boruta Package. Journal of Statistical Software, 36(11).