Quantitative Medicine Developer II, Quantitative Medicine Program, Critical Path Institute, Tucson, Arizona, United States
Disclosure(s):
Lauren Quinlan, BS: No financial relationships to disclose
Objectives: Bronchopulmonary dysplasia (BPD) is a chronic lung disease that affects tens of thousands of infants yearly, most often those born prematurely and requiring supplemental oxygen support. The International Neonatal Consortium (INC) has an opportunity to collate and evaluate real-world data (RWD) in the format of the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). The OMOP CDM harmonizes RWD across clinical sites and vocabularies and reconciles other discrepancies in how medical events are collected. The objective of this project is to develop a curation pipeline that transforms INC RWD into an analysis set for modeling and simulation, and to assess the challenges of this process.
Methods: The RWD were queried using keywords from a predefined list of covariates, such as blood gas measurements (fraction of inspired oxygen [FiO2], partial pressure of carbon dioxide [PaCO2], etc.) and symptoms (wheezing, sepsis, etc.). OMOP data instances were aggregated for each covariate (e.g., PaCO2 measurements from both venous and arterial samples) with input from subject matter experts (SMEs). Baseline was defined as time of birth, and BPD was assigned to infants who received any type of respiratory support at 36-37 weeks postmenstrual age. Chronological age was used as the model’s longitudinal component. Respiratory support concepts were collected and aggregated into support categories, including oxygen therapy, mechanical ventilation, and tracheostomy care.
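The endpoint assignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: the keyword map, the record layout, the function names, and the choice of an inclusive 36.0 to 37.0 week window are all hypothetical, and postmenstrual age is taken as gestational age at birth plus chronological age.

```python
from dataclasses import dataclass

# Hypothetical keyword-to-category map; the mapping in the project was
# built from OMOP concepts with SME input and is not reproduced here.
SUPPORT_KEYWORDS = {
    "oxygen therapy": ("oxygen", "nasal cannula"),
    "mechanical ventilation": ("mechanical ventilation", "intubat"),
    "tracheostomy care": ("tracheostomy",),
}


@dataclass
class SupportRecord:
    subject_id: int
    concept_name: str
    day_of_life: int  # chronological age in days; day 0 is birth (baseline)


def support_category(concept_name):
    """Return the support category for a concept name, or None if unmatched."""
    name = concept_name.lower()
    for category, keywords in SUPPORT_KEYWORDS.items():
        if any(keyword in name for keyword in keywords):
            return category
    return None


def postmenstrual_age_weeks(ga_weeks, day_of_life):
    """Postmenstrual age = gestational age at birth + chronological age."""
    return ga_weeks + day_of_life / 7.0


def assign_bpd(records, ga_weeks_by_subject, window=(36.0, 37.0)):
    """Flag subjects with any categorized support inside the PMA window.

    The inclusive window bounds are an assumption made for this sketch.
    """
    low, high = window
    bpd = {subject_id: False for subject_id in ga_weeks_by_subject}
    for record in records:
        if record.subject_id not in bpd:
            continue  # no gestational age available for this subject
        if support_category(record.concept_name) is None:
            continue  # not recognized as respiratory support
        pma = postmenstrual_age_weeks(
            ga_weeks_by_subject[record.subject_id], record.day_of_life
        )
        if low <= pma <= high:
            bpd[record.subject_id] = True
    return bpd
```

Concepts that match no keyword are left uncategorized rather than guessed, which mirrors the need for manual review of ambiguous support concepts.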
Results: Subject-level data of 2193 babies (1244 with BPD) were curated. The curation process came with many challenges. Only 11% of subjects and 36% of records were retained; the rest were excluded because covariate records were unevenly sparse and because limited timing granularity restricted longitudinal analysis. Longitudinal electronic health records lack the serial consistency of clinical trial data, creating large gaps within subjects’ timelines and introducing more uncertainty into the analysis. The lack of consistent or clearly defined respiratory support records also rendered many instances of the analysis endpoint, FiO2, uninterpretable, further restricting the analysis set. Though respiratory support concepts were abundant in the data, defining concise levels of support proved challenging. Support concepts could be ambiguous (e.g., “Ventilation mode Ventilator”) and required additional investigation into source concepts and SME input before being useful in analysis. Finally, the learning curve of the OMOP CDM and associated platforms such as ATLAS was steep.
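One way to quantify the timeline gaps noted above is to compute the largest interval between consecutive records for each subject and see how many subjects pass a chosen threshold. The sketch below is illustrative only: the function names, the `(subject_id, day_of_life)` record layout, and the 7-day threshold in the usage lines are assumptions, and the toy records are not project data.

```python
from collections import defaultdict


def largest_gaps(records):
    """Map each subject to the largest gap, in days, between consecutive records.

    `records` is an iterable of (subject_id, day_of_life) pairs.
    A subject with a single record day has a gap of 0.
    """
    days_by_subject = defaultdict(set)
    for subject_id, day in records:
        days_by_subject[subject_id].add(day)

    gaps = {}
    for subject_id, days in days_by_subject.items():
        ordered = sorted(days)
        gaps[subject_id] = max(
            (later - earlier for earlier, later in zip(ordered, ordered[1:])),
            default=0,
        )
    return gaps


def retained_fraction(gaps, max_gap_days):
    """Fraction of subjects whose largest gap does not exceed the threshold."""
    if not gaps:
        return 0.0
    kept = sum(1 for gap in gaps.values() if gap <= max_gap_days)
    return kept / len(gaps)


# Toy records for illustration only.
toy_records = [(1, 0), (1, 1), (1, 3), (2, 0), (2, 30), (3, 5)]
toy_gaps = largest_gaps(toy_records)
print(toy_gaps, retained_fraction(toy_gaps, max_gap_days=7))
```

Sweeping `max_gap_days` over a range of values shows how quickly the retained cohort shrinks as the longitudinal requirement tightens.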
Conclusions: OMOP CDM RWD provide a rich source of data in this disease area. However, the iterative process of curation toward an analysis set comes with many challenges, including lack of alignment in longitudinal data instances, ambiguity of endpoint interpretation, and heterogeneous sparsity of covariate data. Decision makers should take these challenges into consideration when assessing the feasibility of using RWD in the drug development process.