Building end-to-end data engineering solutions for medical research and analysis

Authors

Karthik Chava
Senior Software Engineer, Knipper Princeton, Atlanta GA, United States

Synopsis

Data engineering is the set of skills required to bring data to an analysis-ready state, with a focus on building reproducible, manageable, reliable, and enduring data payloads for analysis and delivery. It is distinct from data science but serves as a precursor to it, enabling high-quality data delivery and some creation of data products, mainly from structured data. The field is under intense development, moving through data product enablement, scale, and sustainability as functions of the service. Within the enterprise, data engineering technology operates at scale but is largely limited to initial ingest activities, while Information and Data Management organizations lag further behind on supporting technology and strategies for automating this work and doing it at scale (Amer-Yahia et al., 2022; Ates et al., 2022; Chamari et al., 2023). Healthcare, and medical research in particular, has made only small inroads in enabling data engineering to effectively support Data Science solutions with high-quality data management and delivery services for data product creation. Low-hanging fruit such as Electronic Medical Records and Patient Health Information has been addressed in initial constructs. Small healthcare organizations have very few resources for building data engineering services and solutions and primarily operate in a physical model for data organization, management, and delivery. Larger organizations operate at scale for initial ingest, building pipelines and storage platform constructs, but very few enable scalable model management capabilities or data pipeline reduction management processes that allow easy, self-guided exploration and validation of data for data product creation, whether for Data Science solutions or any other discipline. As a result, the quality of their data product creation work is low.

Published

6 June 2025

How to Cite

Chava, K. (2025). Building end-to-end data engineering solutions for medical research and analysis. In Revolutionizing Healthcare Systems with Next-Generation Technologies: The Role of Artificial Intelligence, Cloud Infrastructure, and Big Data in Driving Patient-Centric Innovation (pp. 92-104). Deep Science Publishing. https://doi.org/10.70593/978-81-988918-5-3_8