Proven Strategies for Large-Scale Data Migration Projects
By Satyajeet Dhumne, data consultant
As one of the core data management activities, data migration has been practiced ever since the invention of computers. However, it can be the most neglected task on IT managers' to-do lists, which results in poor-quality data in the target system. This is not a new observation; the pattern is common throughout the industry. It is estimated that 84% of data migration projects fail. The consequences of a failed data migration project vary widely and include:
• Breakdown of target systems
• Poor data quality in the target environment
• Loss of business opportunity
• Cost overruns
What is Data Migration?
The term “data migration” is used in several
contexts for data movement activities. Let’s look at
the definition of data migration from Wikipedia:
Data migration is the process of transferring data
between storage types, formats or computer systems.
Data migration is usually performed programmatically
to achieve an automated migration, freeing up human
resources from tedious tasks. It is required when
organizations or individuals change computer systems
or upgrade to new systems, or when systems merge
(such as when the organizations that use them
undergo a merger/takeover).
This article addresses large-scale data migration projects, in which data is moved from one or more source (old) systems to one or more target (new) systems on a one-time basis, usually as part of an application or technology upgrade initiative.
The business objective of a data migration project
is to move the data set of interest from the source
system to the target system, while improving data
quality and maintaining business continuity.
Proven Strategies
This article discusses proven strategies for executing large-scale data migration projects. The list was compiled over several years of work on numerous large-scale, critical data migration projects. Each strategy can be adopted, with some customization, to suit an individual organization's needs.
Strategy 1: Invest in Profiling Source Data
Source data is the starting point for any data migration effort. Understanding the characteristics of the source data is paramount to the project's success, because profiling uncovers undocumented data relationships and reveals data quality problems, data volumes, and data anomalies.
Data profiling essentially provides x-ray vision into the source data sets, showing their strengths and weaknesses. The effort invested here directly affects the effectiveness of downstream processes and code components.
It is also important to define the scope of the profiling exercise up front to avoid overspending on it. Basic data profiling can be performed with custom scripts; however, for highly complex and large data sets, an industrial-strength data profiling tool is worth the investment.
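Basic profiling of the kind described above can be scripted. The Python sketch below computes per-column null counts, distinct counts, value lengths, most frequent values, and a candidate-key flag. The function name `profile_rows`, the choice of statistics, and the sample extract are illustrative assumptions, not the output of any particular tool; a commercial profiling tool goes much further (pattern analysis, cross-column and cross-table relationships).

```python
import csv
import io
from collections import Counter
from typing import Any, Dict, Iterable, List


def profile_rows(rows: Iterable[Dict[str, Any]], top_n: int = 3) -> Dict[str, Dict[str, Any]]:
    """Compute basic per-column statistics for a set of source records."""
    records = list(rows)
    # Preserve the column order in which fields are first seen.
    columns: List[str] = list(dict.fromkeys(col for rec in records for col in rec))
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [rec.get(col) for rec in records]
        # Treat missing keys, None and blank strings as nulls.
        present = [v for v in values if v is not None and str(v).strip() != ""]
        counts = Counter(str(v) for v in present)
        lengths = [len(str(v)) for v in present]
        report[col] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "min_len": min(lengths) if lengths else None,
            "max_len": max(lengths) if lengths else None,
            "top_values": counts.most_common(top_n),
            # Never null and fully unique: a candidate key worth confirming with the business.
            "candidate_key": bool(values) and len(counts) == len(values),
        }
    return report


# Hypothetical extract; in practice, pass csv.DictReader over the real source file.
SAMPLE_CSV = "cust_id,state,zip\n1,HI,96813\n2,CA,\n3,CA,9410\n"
report = profile_rows(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column, stats in report.items():
    print(column, stats)
```

On the sample data, the profile flags one blank `zip` and a four-digit `zip` value (minimum length 4), the kind of anomaly worth investigating before transformation rules are written.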
Strategy 2: Create a Data Migration Process Model
Data migration can be as simple as a single step with just one source and one target system, or it can be a highly complex process involving multiple source systems, multiple steps and multiple target systems. Create a detailed process model that depicts every step of the migration process. This artifact serves both as the road map for moving data and as an agreement among the stakeholders involved. The process model also serves as input to the downstream administration, configuration management and software development processes. It should include interim checkpoints that validate the volume and quality of the data flowing through the process. With these checkpoints embedded, data analysts can confirm that exceptions stay within accepted limits and that there are no hidden surprises.
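An embedded checkpoint can be as simple as a row-count reconciliation plus an exception-rate limit after each step. This is a minimal sketch, assuming each step reports how many rows it received, emitted, and rejected; the names `run_checkpoint` and `max_reject_rate` and the 1% default limit are hypothetical and should be replaced by the limits the stakeholders agree on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckpointResult:
    step: str
    rows_in: int
    rows_out: int
    rows_rejected: int
    passed: bool
    messages: List[str]


def run_checkpoint(step: str, rows_in: int, rows_out: int, rows_rejected: int,
                   max_reject_rate: float = 0.01) -> CheckpointResult:
    """Reconcile row counts for one migration step and check the exception rate.

    Assumed rule: every input row must either reach the output or be logged as
    a rejected exception, and rejects must not exceed max_reject_rate of input.
    """
    messages: List[str] = []
    unaccounted = rows_in - rows_out - rows_rejected
    if unaccounted != 0:
        messages.append(f"row counts do not reconcile (in - out - rejected = {unaccounted})")
    reject_rate = rows_rejected / rows_in if rows_in else 0.0
    if reject_rate > max_reject_rate:
        messages.append(f"reject rate {reject_rate:.2%} exceeds limit {max_reject_rate:.2%}")
    return CheckpointResult(step, rows_in, rows_out, rows_rejected,
                            passed=not messages, messages=messages)


ok = run_checkpoint("extract -> staging", rows_in=10_000, rows_out=9_950, rows_rejected=50)
bad = run_checkpoint("staging -> target", rows_in=9_950, rows_out=9_800, rows_rejected=100)
print(ok.passed, bad.passed, bad.messages)
```

Here the first step passes, while the second fails both checks: 50 rows disappeared without being logged as exceptions, and the reject rate (100 of 9,950) is just over the 1% limit.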
Strategy 3: Define Roles and Responsibilities Up Front
Data migration can be a complex and daunting project involving several stakeholders and IT managers or team leads. A formal handshake at every critical step of the migration process is imperative. As part of project planning, the architects and the project manager should identify all possible roles and assign responsibilities to each role. The project manager should then formally assign these roles to project staff members. By assigning roles and responsibilities up front, project leadership can ensure that the entire data migration life cycle is supported with clear accountability. Conduct a formal walkthrough of the “Roles and Responsibilities” document to get buy-in from all stakeholders and project staff members.
Strategy 4: Divide and Conquer
Like any other large and complex task, data migration should be divided into logical groupings of data, such as business area, geography, or cost center. The choice of grouping depends on the business context of the migration. Start with the smallest data set (for example, Hawaii, if the data is grouped by U.S. state) and then move on to larger ones (such as California). This approach lets the team learn and fine-tune the migration process early on with smaller data sets, minimizing risk. The migration of each data set can be treated as a release, which greatly simplifies communication. Each release should be followed by a formal release evaluation, to document lessons learned and educate the team, and by refinements for the next release.
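The smallest-first ordering and release-by-release execution can be sketched in a few lines of Python. This assumes each logical grouping (here, a U.S. state) has a known row count and that a hypothetical `migrate` callable runs one release and reports whether it passed its evaluation; the function names and row counts are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

Release = Tuple[int, str, int]  # (release number, partition name, row count)


def plan_releases(partition_sizes: Dict[str, int]) -> List[Release]:
    """Order partitions smallest-first (ties broken by name) and number them as releases."""
    ordered = sorted(partition_sizes.items(), key=lambda item: (item[1], item[0]))
    return [(number, name, size) for number, (name, size) in enumerate(ordered, start=1)]


def run_releases(plan: List[Release], migrate: Callable[[str], bool]) -> List[str]:
    """Migrate each partition in release order, halting at the first failed release."""
    completed: List[str] = []
    for _number, name, _size in plan:
        if not migrate(name):
            # Stop so the process can be refined before larger partitions run.
            break
        completed.append(name)
    return completed


# Hypothetical row counts per state partition.
sizes = {"California": 4_200_000, "Hawaii": 150_000, "Oregon": 900_000}
plan = plan_releases(sizes)
for number, name, size in plan:
    print(f"Release {number}: {name} ({size:,} rows)")
```

Halting at the first failed release reflects the intent of the strategy: problems surface on Hawaii-sized data sets, and the process is refined before the California-sized ones run.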
Strategy 5: Invest in Technology/Tool Training
For large-scale migration projects, it is recommended to invest in proven technology and tools for their automation, metadata collection, scheduling, and error-handling capabilities. If the team is new to such tools, formal training for the staff responsible for developing and executing the data migration code is highly recommended. Investing in such training lets project leadership minimize the risk associated with the learning curve. The training process also gives the team an opportunity to establish a relationship with the vendor's technical support staff.
Strategy 6: Conduct Performance Testing
For large-scale data migration efforts, the volume of data moved from source to destination data stores can be overwhelming. Because of business or operational requirements, most data migration projects have short, predefined time windows for moving data, so the code components must perform at acceptable levels. Test the code components for performance at full production scale, and keep tuning the software and configuration parameters until the desired throughput is achieved. Repeated performance testing also helps staff become familiar with the technology and the migration process.
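A throughput target can be derived from the migration window with simple arithmetic: required rows per second = total rows × safety factor ÷ (window hours × 3,600). For example, a hypothetical 200 million rows in an 8-hour window with 1.5× headroom needs about 10,417 rows per second. The Python sketch below computes that target and times a loader across several batch sizes. `load_batch` is a stand-in for whatever actually writes to the target, and the function names, the 1.5 safety factor, and the candidate batch sizes are assumptions to be replaced with project values; timings on a small sample do not substitute for production-scale tests.

```python
import time
from typing import Callable, Dict, Iterable, Sequence, Tuple


def required_throughput(total_rows: int, window_hours: float, safety_factor: float = 1.5) -> float:
    """Rows per second needed to finish inside the window, with headroom for reruns."""
    return total_rows * safety_factor / (window_hours * 3600)


def measure_throughput(load_batch: Callable[[Sequence], object], rows: Sequence,
                       batch_size: int) -> float:
    """Time load_batch over rows in fixed-size batches and return rows per second."""
    start = time.perf_counter()
    for i in range(0, len(rows), batch_size):
        load_batch(rows[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(rows) / elapsed if elapsed > 0 else float("inf")


def best_batch_size(load_batch: Callable[[Sequence], object], rows: Sequence,
                    candidates: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Try each candidate batch size; return the fastest plus all measurements."""
    results = {size: measure_throughput(load_batch, rows, size) for size in candidates}
    return max(results, key=lambda size: results[size]), results


target = required_throughput(200_000_000, window_hours=8)
sample = list(range(100_000))  # stand-in for a production-scale extract
size, results = best_batch_size(lambda batch: sum(batch), sample, [1_000, 10_000, 50_000])
print(f"need {target:,.0f} rows/s; best batch size {size}: {results[size]:,.0f} rows/s")
```

If the best measured throughput falls short of the target, that is the signal to keep tuning, or to revisit the window or the partitioning, before the migration is scheduled.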
Strategy 7: Have a Plan B
Last but not least, as with any mission-critical project, have a plan B. Even after significant planning, testing and rehearsal, migration projects tend to face surprises. From a business continuity standpoint, it is therefore imperative to have an alternative solution planned and tested before the project begins. The plan B must be formulated with input from all stakeholders, including business leadership, business users, operational IT staff and migration project leadership. The migration project leadership should get sign-off on the plan B from all stakeholders and communicate any subsequent changes. By communicating plans and intentions to all stakeholders, project leadership can ensure that all dependent business processes are prepared for the change.
Conclusion
As noted at the outset, most data migration projects fail, and for a variety of reasons. One of the primary reasons is underestimating the scale and complexity of the data migration effort. By investing proactively in estimation and planning, IT managers can keep the project under control. Data migration is a multidimensional effort that can be both time-sensitive and mission-critical. By following these simple, proven strategies, IT managers can improve the probability of success.
Source: The Data Administration Newsletter, Sept 1, 2009 (www.tdan.com). Satyajeet Dhumne is an experienced consultant in the fields of data warehousing, business intelligence, and data management. Reach him at sgdhumne@yahoo.com.