Data harmonization is the process of aligning and integrating data from multiple sources to create a single, unified dataset.
The process involves:
Data Standardization: aligning the data so it follows a consistent format and structure.
Example: Converting dates and times into one standard format.
Data Cleaning: finding and correcting errors in a data set.
Example: Removing duplicate records and correcting typos.
Data Transformation: converting data entries into a single format.
Example: Mapping categorical values such as “M” and “Male” to “Male” so the inputs are standardized.
Data Integration: combining data from multiple sources into a single, coherent dataset.
Example: Combining the data from air particle counters made by different manufacturers into one dataset that makes sense.
Metadata Management: Managing the data about the data, including definitions, formats and relationships.
Example: Creating a data dictionary that defines every data element and its specific format.
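To make the first three steps concrete, here is a minimal Python sketch. The field names (sampled_at, sex), the two date formats and the sample records are all assumptions made up for illustration; they are not taken from any real instrument or database.

```python
from datetime import datetime

# Assumed input formats; a real project would list the formats its sources actually use.
DATE_FORMATS = ["%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M"]
# Assumed mapping of category variants to one standard label.
SEX_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}


def standardize_timestamp(text):
    """Standardization: parse any known format and emit ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {text!r}")


def transform_category(value):
    """Transformation: map variants such as 'M' and 'Male' to one label."""
    return SEX_MAP[value.strip().lower()]


def harmonize(records):
    """Standardize and transform each record, then drop exact duplicates (cleaning)."""
    seen = set()
    result = []
    for rec in records:
        clean = {
            "sampled_at": standardize_timestamp(rec["sampled_at"]),
            "sex": transform_category(rec["sex"]),
        }
        key = tuple(sorted(clean.items()))
        if key not in seen:
            seen.add(key)
            result.append(clean)
    return result


# Hypothetical raw records: the first two describe the same entry in different formats.
raw = [
    {"sampled_at": "2024-03-05 14:30", "sex": "M"},
    {"sampled_at": "05/03/2024 14:30", "sex": "Male"},
    {"sampled_at": "2024-03-06 09:00", "sex": "female"},
]
harmonized = harmonize(raw)
```

Notice that the duplicate only becomes visible after standardization: the first two raw records look different as text, but they collapse into one entry once the date and the category share a format.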
The overall goal of data harmonization is to make differing data formats compatible and comparable, in order to enable more effective analysis, reporting and, at the highest level, decision making.
So what does this mean in an actual GMP setting?
When a facility has multiple air particle counters from different manufacturers…
Even though these two devices are sampling for the same thing, their data formats are quite different.
These differences force either manual corrections when a lab tech inputs them into their database…
Or, in the case of automated data collection, they require a data engineer to standardize the inputs from multiple sources into unified fields with a single format.
Standardized data is what makes data analysis and improved decision making possible.
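Here is a rough sketch of what that automated standardization could look like in Python. Everything vendor-specific in it is hypothetical: the two CSV layouts, the column names, the units and the unified field names were invented for this example and do not describe any real manufacturer's export format.

```python
import csv
import io
from datetime import datetime

CUBIC_FEET_PER_CUBIC_METER = 35.3147

# Metadata: a tiny data dictionary describing each unified field (assumed schema).
DATA_DICTIONARY = {
    "sampled_at": "ISO 8601 timestamp of the sample",
    "location": "Room or sampling point name",
    "particle_size_um": "Particle size channel in micrometers",
    "counts_per_m3": "Particle count per cubic meter",
    "source": "Which device export the row came from",
}

# Hypothetical exports from two particle counters.
VENDOR_A_CSV = "Timestamp,Location,0.5um (counts/m3)\n2024-03-05T14:30:00,Room A,3520\n"
VENDOR_B_CSV = "date;time;loc;ch1_size_um;ch1_counts_per_ft3\n05/03/2024;14:35;Room B;0.5;100\n"


def parse_vendor_a(text):
    """Comma-separated, ISO timestamps, counts already per cubic meter."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "sampled_at": datetime.fromisoformat(row["Timestamp"]).isoformat(),
            "location": row["Location"],
            "particle_size_um": 0.5,
            "counts_per_m3": float(row["0.5um (counts/m3)"]),
            "source": "vendor_a",
        })
    return rows


def parse_vendor_b(text):
    """Semicolon-separated, split date and time, counts per cubic foot."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        stamp = datetime.strptime(f'{row["date"]} {row["time"]}', "%d/%m/%Y %H:%M")
        rows.append({
            "sampled_at": stamp.isoformat(),
            "location": row["loc"],
            "particle_size_um": float(row["ch1_size_um"]),
            # Convert to the unified unit so both devices are comparable.
            "counts_per_m3": float(row["ch1_counts_per_ft3"]) * CUBIC_FEET_PER_CUBIC_METER,
            "source": "vendor_b",
        })
    return rows


# Integration: one list of rows that all share the same fields and units.
unified = parse_vendor_a(VENDOR_A_CSV) + parse_vendor_b(VENDOR_B_CSV)
unified.sort(key=lambda r: r["sampled_at"])
```

The design choice here is one small parser per source that knows that source's quirks, with every parser emitting the same fields. Adding a third device then means writing one more parser, not touching the analysis that sits on top of the unified rows.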
So why is it so important, especially in the Pharma manufacturing industry?
I’ve heard the term data harmonization thrown around hundreds of times by our team at Phizzle, several device manufacturers, and prospective clients.
Although I understood the general concept, I knew that was not enough…
So I went down a data harmonization rabbit hole…
Researching everything from what exactly it is, how it's implemented and how it's applied across various industries.
More and more it became glaringly obvious why it's such an overwhelming concern in pharma and life sciences manufacturing specifically.
Where operations are producing mission critical drugs and vaccines that possess the ability to change and save people’s lives…
These manufacturing facilities have hundreds if not thousands of quality assurance instruments…
That produce more and more data every year…
These QA instruments like air particle counters have dozens if not hundreds of different data formats, data collection methods and unique quirks that need to be accounted for and ironed out in order to achieve data harmonization.
Without uniform data, sample data stays siloed and is unusable for data analysis.
Data harmonization is absolutely essential for creating a database from multiple data sources that consistently enables reliable analysis and decision making.