Process

supply chain
data engineering
Author

Ryan Garnett

Published

February 7, 2023

After being shipped from the farm, the plants are evaluated to determine their relative quality, which in turn determines their potential usage. Consider an apple: various factors (e.g. shape, visual appearance, size, number of bruises) decide its end usage. That decision determines whether the apple is juiced, pureed into apple sauce, cut and frozen, or used as-is.


The data equivalent is data transformation, a part of data engineering. In many cases, the data collected is not ready for consumption in every use case. As with food, data needs to be profiled to identify any defects. Profiling determines which transformations are needed (reshape, filter, aggregate, summarize, etc.), which data quality issues are present (wrong data types, missing values, out-of-range values, etc.), and how to clean the dirty data that was identified.
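
To make the profiling step concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a prescribed method: the records, the field names (`orchard`, `weight_g`, `bruises`), the valid weight range, and the helper functions `profile` and `clean` are all hypothetical. It counts the three kinds of data quality issues mentioned above for one field, cleans what can be repaired, and then aggregates the result.

```python
from collections import Counter

# Hypothetical records: field names and values are invented for illustration.
records = [
    {"orchard": "A", "weight_g": 182.0, "bruises": 0},
    {"orchard": "A", "weight_g": None, "bruises": 2},    # missing value
    {"orchard": "B", "weight_g": "175", "bruises": 1},   # wrong data type
    {"orchard": "B", "weight_g": 9000.0, "bruises": 0},  # out of range
]

# Assumed plausible range for an apple's weight in grams.
WEIGHT_RANGE = (50.0, 500.0)


def profile(rows, field, expected_type, valid_range):
    """Count missing, wrongly typed, out-of-range, and valid values for one field."""
    low, high = valid_range
    issues = Counter()
    for row in rows:
        value = row.get(field)
        if value is None:
            issues["missing"] += 1
        elif not isinstance(value, expected_type):
            issues["wrong_type"] += 1
        elif not low <= value <= high:
            issues["out_of_range"] += 1
        else:
            issues["ok"] += 1
    return dict(issues)


def clean(rows, field, expected_type, valid_range):
    """Cast values where possible and drop rows that cannot be repaired."""
    low, high = valid_range
    cleaned = []
    for row in rows:
        value = row.get(field)
        if value is None:
            continue  # drop rows with a missing value
        try:
            value = expected_type(value)  # repair type issues such as "175"
        except (TypeError, ValueError):
            continue  # drop values that cannot be cast
        if low <= value <= high:
            cleaned.append({**row, field: value})
    return cleaned


report = profile(records, "weight_g", float, WEIGHT_RANGE)
usable = clean(records, "weight_g", float, WEIGHT_RANGE)

# Aggregate: mean weight per orchard, computed from the cleaned rows only.
weights_by_orchard = {}
for row in usable:
    weights_by_orchard.setdefault(row["orchard"], []).append(row["weight_g"])
mean_weight = {k: sum(v) / len(v) for k, v in weights_by_orchard.items()}

print(report)
print(mean_weight)
```

Dropping unrepairable rows is only one possible cleaning policy. Depending on the use case, imputing a missing value or flagging the row for review may be more appropriate, much as a bruised apple is redirected to juice instead of being discarded.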


Key points