Extract, Transform, and Load (ETL) Your Data
ETL is an acronym for extract, transform, and load – the critical process by which data is moved from source systems into a data warehouse.
Step 1 – Extract
Extracting data from different source systems is the first step in the ETL process.
Most data warehousing projects involve consolidating data from different data sources.
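As a minimal sketch of the extract step, the snippet below consolidates records from two hypothetical sources – a flat-file CSV export and an operational database (an in-memory SQLite database stands in for it here). All names and sample data are illustrative assumptions, not part of any specific product.

```python
import csv
import io
import sqlite3

def extract_csv(text):
    """Extract rows from a CSV source (e.g. a flat-file export)."""
    return list(csv.DictReader(io.StringIO(text)))

def extract_db(conn, query):
    """Extract rows from a relational source via SQL."""
    conn.row_factory = sqlite3.Row
    return [dict(r) for r in conn.execute(query)]

# Source 1: a CSV export from one system (sample data)
csv_data = "id,name,city\n1,Ann Lee,Boston\n2,Bo Chan,Austin\n"

# Source 2: an operational database from another system (sample data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (3, 'Cy Diaz', 'Denver')")

rows = extract_csv(csv_data) + extract_db(conn, "SELECT * FROM customers")
print(len(rows))  # three rows consolidated from two sources
```

Note that the two sources yield differently typed values (the CSV `id` is a string, the database `id` an integer) – reconciling such differences is exactly what the transform step that follows is for.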
Step 2 – Transform
The transform stage applies a set of rules to convert the extracted source data into the format required by the end target. This step can also involve joining data from multiple sources, sorting data, generating surrogate-key values, pivoting (turning multiple columns into multiple rows, or vice versa), and aggregating data.
Data cleansing can also occur during this step in the ETL process to ensure accurate data is loaded into the data warehouse. Some of the data cleaning tasks include:
- Name parsing and genderizing
- Verifying and standardizing street addresses, phone numbers, and email addresses
- Enriching data with ZIP+4 codes and latitude/longitude coordinates
- Eliminating duplicate records and/or matching records
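A few of these transformation and cleansing tasks can be sketched as below: standardizing fields, deduplicating on a cleansed value, and assigning surrogate keys. The specific rules (lowercasing emails, title-casing names) are illustrative assumptions – real data quality tools apply far richer parsing and matching logic.

```python
def transform(rows):
    """Apply illustrative cleansing rules: standardize fields,
    drop duplicate records, and assign surrogate-key values."""
    seen = set()
    out = []
    for row in rows:
        # Standardize: trim whitespace, normalize email case and name spacing
        email = row["email"].strip().lower()
        name = " ".join(row["name"].split()).title()
        # Deduplicate on the cleansed email address
        if email in seen:
            continue
        seen.add(email)
        # Surrogate key generated in the target's own sequence
        out.append({"sk": len(out) + 1, "name": name, "email": email})
    return out

raw = [
    {"name": "ann  lee", "email": "ANN@example.com "},
    {"name": "Ann Lee", "email": "ann@example.com"},   # duplicate record
    {"name": "bo chan", "email": "bo@example.com"},
]
clean = transform(raw)
print(clean)  # two unique records with surrogate keys 1 and 2
```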
Step 3 – Load
As Wikipedia describes it, this last stage in the ETL process loads the data into the end target – usually the data warehouse.
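Continuing the sketch, the load step below writes cleansed records into a hypothetical warehouse dimension table; an in-memory SQLite database again stands in for a real data warehouse, and the table and column names are assumptions for illustration.

```python
import sqlite3

def load(conn, rows):
    """Load cleansed rows into the target dimension table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(sk INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO dim_customer (sk, name, email) "
        "VALUES (:sk, :name, :email)",
        rows,
    )
    conn.commit()

warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(warehouse, [
    {"sk": 1, "name": "Ann Lee", "email": "ann@example.com"},
    {"sk": 2, "name": "Bo Chan", "email": "bo@example.com"},
])
count = warehouse.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]
print(count)  # 2
```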
Data Quality Components for Pentaho - Powerful ETL and Data Quality Capabilities
Melissa Data has teamed up with Pentaho to offer powerful Extraction, Transformation, and Loading (ETL) software with data quality capabilities – making any enterprise data quality initiative as easy as 1-2-3. Data Quality Components for Pentaho offers an intuitive, graphical, drag-and-drop design environment and a proven, scalable, standards-based architecture. A growing list of organizations, big and small, are making Data Quality Components for Pentaho the solution of choice over traditional, proprietary ETL and data integration tools.
For more information on Data Quality Components for Pentaho,