As a Data Engineer, I want to understand the data vocabulary, so that I can communicate about data more meaningfully and find the right tools to compute on it. – Data Engineer
Let’s start with this: Binary Data, Non-binary Data, Structured Data, Unstructured Data, Semi-structured Data, Panel Data, Image Data, Text Data, Audio Data, Categorical Data, Discrete Data, Continuous Data, Ordinal Data, Numerical Data, Nominal Data, Interval Data, Sequence Data, Time-series Data, Data Transformation, Data Extraction, Data Load, High Volume Data, High Velocity Data, Streaming Data, Batch Data, Data Variety, Data Veracity, Data Value, Data Trends, Data Seasonality, Data Correlation, Data Noise, Data Indexes, Data Schema, Big Data, JSON Data, Document Data, Relational Data, Graph Data, Spatial Data, Multi-dimensional Data, Block Data, Clean Data, Dirty Data, Data Augmentation, Data Imputation, Data Model, Object (Blob) Data, Key-value Data, Data Mapping, Data Filtering, Data Aggregation, Data Lake, Data Mart, Data Warehouse, Database, Data Lakehouse, Data Quality, Data Catalog, Data Source, Data Sink, Data Masking, Data Privacy
Now let’s go here: High volume time-series unstructured image data, High velocity semi-structured data with trends and seasonality without correlation, High volume Image data with Pexels Data source masked and stored in Data Lake as the Data Sink.
The vocabulary is daunting for a beginner. These 10 categories (ways of bucketizing) would be a good place to start:
- Data Representation for Computing: How is data represented in a computer?
  - Binary Data, Non-binary Data
- Data Structure & Semantics: How well is the data structured?
  - Structured Data, Unstructured Data, Semi-structured Data
  - Sequence Data, Time-series Data
  - Panel Data
  - Image Data, Text Data, Audio Data
- Data Measurement Scale: How can data be reasoned with and measured?
  - Categorical Data, Nominal Data, Ordinal Data
  - Discrete Data, Interval Data, Numerical Data, Continuous Data
- Data Processing: How is the data processed?
  - Streaming Data, Batch Data
  - Data Filtering, Data Mapping, Data Aggregation
  - Clean Data, Dirty Data
  - Data Transformation, Data Extraction, Data Load
  - Data Augmentation, Data Imputation
- Data Attributes: How can data be broadly characterized?
  - Velocity, Volume, Veracity, Value, Variety
- Data Patterns: What are the patterns found in data?
  - Time-series Data Patterns: Trends, Seasonality, Correlation, Noise
- Data Relations: What are the relationships within data?
  - Relational Data, Graph Data, Document Data (Key-value Data, JSON Data)
  - Multi-dimensional Data, Spatial Data
- Data Storage Types:
  - Block Data, Object (Blob) Data
- Data Management Systems:
  - Filesystem, Database, Data Lake, Data Mart, Data Warehouse, Data Lakehouse
  - Data Indexes
- Data Governance, Security, Privacy:
  - Data Catalog, Data Quality, Data Schema, Data Model
  - Data Masking, Data Privacy
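To make one of these buckets concrete, here is a minimal Python sketch of the Data Measurement Scale category. It shows that the scale of a column decides which summaries make sense: a mode for nominal data, a median for ordinal data, and sums or means for discrete and continuous numbers. The sample values and variable names are made up for illustration, and the pairing of each scale with one summary is the usual textbook convention rather than anything specific to a tool.

```python
from statistics import mean, mode

# Nominal data: labels with no inherent order.
# Counting and the most frequent value (mode) are meaningful; an average is not.
colors = ["red", "blue", "red", "green"]
most_common_color = mode(colors)

# Ordinal data: labels with an order but no fixed distance between them.
# We can rank the values and take a median, but "large - small" has no meaning.
SIZE_ORDER = ["small", "medium", "large"]  # hypothetical ordering for this example
sizes = ["small", "large", "medium", "medium", "large"]
size_ranks = sorted(SIZE_ORDER.index(size) for size in sizes)
median_size = SIZE_ORDER[size_ranks[len(size_ranks) // 2]]

# Discrete numerical data: countable whole numbers, so sums are meaningful.
page_views = [3, 7, 7, 2]
total_views = sum(page_views)

# Continuous numerical data: measurements on a real-valued scale,
# so an arithmetic mean is meaningful.
temperatures_c = [21.5, 22.0, 19.75]
average_temperature = mean(temperatures_c)

print(most_common_color, median_size, total_views, round(average_temperature, 2))
```

The same habit of asking "which operations are valid for this kind of data?" carries over to the other categories, for example deciding whether a dataset should be processed as a stream or as a batch.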
More blog posts will follow to dive deep into each category and the challenges involved. Let’s peel this onion.