Senior Scientist, Critical Path Institute, United States
Objectives: In many populations, such as neonates, it is difficult or impossible to construct reliable reference ranges for laboratory values via a standard direct study, because recruiting and sampling a sufficiently large cohort of healthy individuals is impractical. Using Real World Data (RWD) collected during routine clinical practice together with appropriate statistical and machine learning methods, reference ranges, including covariate-dependent ones, can instead be estimated indirectly. We present a data and analytics software pipeline that analyzes RWD in the OMOP format and produces and displays lab value reference ranges, using the International Neonatal Consortium (INC) RWD database as a testbed.
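A reference range is conventionally the central 95% of values in a healthy reference population, i.e. its 2.5th and 97.5th percentiles. The minimal Python sketch below (standard library only) illustrates why RWD needs indirect methods: on simulated data in which 10% of results are pathological, the direct percentile calculation is dragged upward, while a deliberately crude indirect estimate is not. The robust lognormal estimator here (median and MAD of log values) is our own illustrative stand-in, not the refineR algorithm, which uses a more principled inverse-modeling approach; the function names and the simulated data are hypothetical.

```python
import math
import random
import statistics


def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100]) of a list of numbers."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def direct_interval(values):
    """Direct (nonparametric) reference interval: 2.5th and 97.5th percentiles.
    Only valid if every value comes from a healthy reference population."""
    return percentile(values, 2.5), percentile(values, 97.5)


def robust_lognormal_interval(values, z=1.959964):
    """Crude indirect estimate (illustrative assumption, not refineR): assume the
    non-pathological values are lognormal and dominate the centre of the data.
    Median and MAD of the log values resist a minority of pathological tails."""
    logs = [math.log(v) for v in values if v > 0]
    mu = statistics.median(logs)
    mad = statistics.median([abs(x - mu) for x in logs])
    sigma = 1.4826 * mad  # rescales MAD to a standard deviation under normality
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)


# Simulated "RWD": 90% healthy results plus 10% pathological (elevated) results.
rng = random.Random(0)
healthy = [math.exp(rng.gauss(0.0, 0.25)) for _ in range(18000)]
pathological = [math.exp(rng.gauss(1.5, 0.25)) for _ in range(2000)]
mixed = healthy + pathological

print(direct_interval(mixed))            # upper limit inflated by the pathological tail
print(robust_lognormal_interval(mixed))  # far closer to the healthy population's range
```

For reference, the healthy component alone has a true central 95% interval of roughly 0.61 to 1.63; the crude robust estimate lands near it, whereas the direct 97.5th percentile falls inside the pathological cluster.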
Methods: Algorithms for indirect reference range estimation were surveyed, and the refineR method (Ammer et al. 2021) and its GAMLSS-based extension for covariates (Ammer et al. 2023) were selected for their flexible statistical assumptions and available R implementations. To integrate source data arriving in a variety of formats with unharmonized coding and labeling systems, INC data were mapped to the OHDSI OMOP Common Data Model (CDM) (Voss et al. 2015). Target labs and covariates were identified using OMOP concept sets built with OHDSI tools. Python and R interfaces were then written so that the INC database can be analyzed with the refineR core and the results displayed in a standardized, user-friendly format.
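As a minimal sketch of the concept-set extraction step, assuming a SQLite copy of just two OMOP CDM tables trimmed to a handful of columns, the Python below pulls every measurement whose `measurement_concept_id` falls in a concept set and derives two covariates: sex (from `gender_concept_id`, where 8507 and 8532 are the standard OMOP concepts for male and female) and postnatal age in days. The concept IDs in `LAB_CONCEPT_SET`, the helper `extract_lab_values`, and all rows are hypothetical placeholders. A production query would also resolve descendant concepts through the `CONCEPT_ANCESTOR` table and check `unit_concept_id`, both omitted here.

```python
import sqlite3

# Hypothetical concept set: placeholder IDs standing in for the standard OMOP
# concepts of one lab test. These are NOT real OMOP concept IDs.
LAB_CONCEPT_SET = (1000001, 1000002)

# Heavily trimmed subset of the OMOP CDM PERSON and MEASUREMENT tables.
SCHEMA = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    birth_datetime TEXT
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    measurement_concept_id INTEGER,
    measurement_date TEXT,
    value_as_number REAL
);
"""

QUERY = """
SELECT m.person_id,
       p.gender_concept_id,
       julianday(m.measurement_date) - julianday(date(p.birth_datetime)) AS postnatal_age_days,
       m.value_as_number
FROM measurement AS m
JOIN person AS p ON p.person_id = m.person_id
WHERE m.measurement_concept_id IN ({placeholders})
  AND m.value_as_number IS NOT NULL
ORDER BY m.person_id, m.measurement_date
"""


def extract_lab_values(conn, concept_ids):
    """Return (person_id, gender_concept_id, postnatal_age_days, value) rows for
    every non-missing measurement whose concept is in the given concept set."""
    sql = QUERY.format(placeholders=",".join("?" for _ in concept_ids))
    return conn.execute(sql, tuple(concept_ids)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8507, "2023-01-01 08:00:00"), (2, 8532, "2023-02-10 23:30:00")],
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1000001, "2023-01-03", 5.2),
        (2, 1, 9999999, "2023-01-03", 140.0),  # different lab: excluded
        (3, 2, 1000002, "2023-02-17", 7.9),
        (4, 2, 1000001, "2023-02-18", None),   # missing value: excluded
    ],
)
rows = extract_lab_values(conn, LAB_CONCEPT_SET)
print(rows)  # [(1, 8507, 2.0, 5.2), (2, 8532, 7.0, 7.9)]
```

Keeping the concept set as data rather than hard-coding IDs in SQL mirrors the idea of selecting target labs by OMOP concept set, so the same query serves any lab the user chooses.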
Results: We have tested the pipeline on three Electronic Health Record (EHR) datasets from the INC database, all consisting of RWD from Neonatal Intensive Care Units (NICUs). A range of common hepatic, renal, and hematologic labs were selected for testing. Continuous covariates (gestational and postnatal age, birth weight) and discrete covariates (sex, discharge status) were extracted. The pipeline produces and displays covariate-dependent reference ranges for each OMOP concept selected by the user. Comparisons with existing pediatric reference ranges are provided where available.
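To illustrate what a covariate-dependent reference range means in the simplest possible terms, the sketch below partitions simulated records by sex and postnatal-age bin and computes a nonparametric 95% interval within each partition. This is the classical partitioning approach, not the GAMLSS model used by the refineR extension, which lets distribution parameters vary smoothly with continuous covariates. Because it uses direct percentiles, it assumes the input has already been restricted to non-pathological values. The bin edges, the `min_n=120` threshold (the commonly cited CLSI minimum for a nonparametric interval), the helper names, and the simulated lab are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict


def percentile(values, p):
    """Linear-interpolation percentile (p in [0, 100])."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def partitioned_intervals(records, age_edges, min_n=120):
    """Group (sex, postnatal_age_days, value) records into sex x age-bin strata
    and return {(sex, lo, hi): (p2.5, p97.5)} for strata with >= min_n values.
    age_edges are bin boundaries in days, e.g. [0, 7, 28] -> [0, 7) and [7, 28)."""
    strata = defaultdict(list)
    for sex, age, value in records:
        for lo, hi in zip(age_edges, age_edges[1:]):
            if lo <= age < hi:
                strata[(sex, lo, hi)].append(value)
                break
    return {
        key: (percentile(vals, 2.5), percentile(vals, 97.5))
        for key, vals in sorted(strata.items())
        if len(vals) >= min_n
    }


# Hypothetical simulated lab whose typical value drifts upward with postnatal age.
rng = random.Random(1)
records = []
for _ in range(4000):
    sex = rng.choice(["F", "M"])
    age = rng.uniform(0, 28)
    value = 10 + 0.5 * age + rng.gauss(0, 1)
    records.append((sex, age, value))

for (sex, lo, hi), (lower, upper) in partitioned_intervals(records, [0, 7, 14, 28]).items():
    print(f"{sex}, days {lo}-{hi}: {lower:.1f} to {upper:.1f}")
```

Partitioning is easy to read but discards information at bin boundaries and needs enough data in every bin, which is one motivation for smooth covariate models such as GAMLSS.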
Conclusions: Extracting lab value reference ranges from RWD can be prohibitively labor-intensive and requires specialized statistical expertise. We believe that a unified programming approach, in which general-purpose algorithms interface directly with CDMs such as OMOP, will provide a powerful toolset for producing actionable Real-World Evidence (RWE), such as lab value reference ranges, from RWD. We are actively investigating extensions to our framework that incorporate AI/ML approaches, which we expect to improve the speed and robustness of the pipeline.
Citations: [1] Ammer, Tatjana, et al. "refineR: a novel algorithm for reference interval estimation from real-world data." Scientific Reports 11.1 (2021): 16023.
[2] Ammer, Tatjana, et al. "A pipeline for the fully automated estimation of continuous reference intervals using real-world data." Scientific Reports 13.1 (2023): 13440.
[3] Voss, Erica A., et al. "Feasibility and utility of applications of the common data model to multiple, disparate observational health databases." Journal of the American Medical Informatics Association 22.3 (2015): 553-564.