News
Trial-and-Error
Method of ETL
Extract, transform and load (ETL) tools offer many
benefits to businesses by establishing data services
that can be tapped throughout the enterprise and
offer benefits to IT through productivity gains.
Rick Sherman points out the three primary areas
where companies tend to blunder in their initial use
of ETL. To make sure your company isn’t making these
errors.
There are two opposing views of ETL processing:
First, ETL processing is a paradigm shift from
custom SQL coding. Second, the more things change,
the more they stay the same. What does not change
with the use of ETL tools is the need for
operational processing best practices for your ETL
procedures.
These operational controls and practices have
evolved over the years and should be leveraged in
any new data integration processes. After all,
implementing the highest quality applications and
data is not something new.
ETL Usage Mistakes
1. Turning a blind eye to data quality measures and
metrics in ETL processing. I see it all the time
when reviewing an enterprise's data integration
architecture: data quality processing that's a
tangled mess or totally nonexistent. Why is this
common? People either fail to do data profiling to
understand the true extent of data quality issues in
the source systems, or they assume any data quality
issues are the sources' problem, not theirs.
The latter is a weak excuse. Data quality is
everyone's problem. You need to get the business
requirements for data quality metrics, profile your
data sources to create a baseline, build the metrics
into your ETL processes, monitor them, report on the
data quality levels and finally work with the
business users if corrective action is needed.
2. Using ETL tools without understanding ETL
processing and best practices. I've found that many
enterprises' first experience with ETL tools is
simply to use them to execute SQL code or stored
procedures. Using the ETL tool as a scheduling
process to run custom SQL scripts or procedures is a
waste of an essential tool in your data warehouse (DW)/business
intelligence (BI) implementations. With this
approach you get none of the benefits of the ETL
tool - no workflow, no data lineage, no reuse and no
productivity boost - and you've probably been slowed
down in writing the custom SQL code.
Although they're using the tool, these firms don't
understand ETL best practices and the data
integration paradigm it makes possible. The first
inclination of developing source-to-target mappings,
especially if you come from the custom SQL coding
world, is to try to do everything in one operation.
The reality is that stored procedure is a series of
steps contained in one physical program. The ETL
workflow is the analogous container, but it can do
much more. Reuse of common error and exception
processing, conditional logic and parallel
processing are all examples of what you may build
into that ETL workflow.
Also, when you build custom code invoked by the ETL
workflow, it is just like a "black box" to the
workflow. If the same processing was created using
the ETL tool, then the process would be documented,
repeatable and visible to your data integration
processes from the data lineage and metadata
management capabilities of the ETL tools.
3. Treating ETL processing differently from other
operational processes. When any of your enterprise
applications transfer or load data, there are always
(or should be) operational controls for how it is
defined, implemented and monitored. But, too often
this same level of operational control is not
implemented in the initial loading of your DW or
subsequent data marts.
Why is this? Let's take the DW and data mart as
separate topics. For the DW, often the ETL
developers were not from the "old guard" enterprise
applications group, but rather a new crop of
developers experienced in database development.
These new ETL developers do not have experience in
operational processing. In addition, no one from the
business is asking for these controls because they
don't know they're supposed to. When the DW does go
into production and there are problems, everyone is
surprised that these controls were not built in.
With regard to creating and loading data marts,
there is often a much more relaxed approach to
development. Many times data marts are built with
custom code, but even when an ETL tool is used, data
marts are built as if they were "sandboxes" (a
testing environment that isolates untested code
changes and outright experimentation from the
production environment). Too often, they are not
viewed as production applications that facilitate
real business decisions - but they are! If you build
data marts and businesspeople are using them for
decision-making, then they need all the same
operational control and ETL best practices as the
data warehouse.
---Source: Rick
Sherman has more than 20 years of business
intelligence and data warehousing experience and is
the founder of Athena IT Solutions. You can reach
him at rsherman@athena-solutions.com or (617)
835-0546. This article was originally published in
the April issue of DM Review Magazine (www.dmreview.com).
|
|
|
Melissa Data
|
 |

| Enhance your
website, software or database with
easy-to-integrate data quality programming tools
and web services. |
|
|
|
|
 |

|
Save money on postage using leading
mail preparation software and other
direct marketing products. |
|
|
|
|
 |

Update & standardize addresses and
find out more about contacts in your
database.
|
|
|
|
|
 |

Find new customers perfect for your
business with our online and
specialty mailing lists.
|
|
|
|
|
 |

Locate the business information you
need such as ZIP Codes, address
verification, maps.
|
|
|
|
|