How to Improve Customer Data Quality
By Saumya Chaki, principal
consultant, BI practice at PwC India.
In any customer-centric business, be it hospitality,
banking, retail, or insurance, there are numerous
touchpoints where the consumer interacts with the
business. Many interactions take place between the
consumer and the business through various direct and
indirect channels: direct marketing campaigns
(email, mailers, telemarketing, etc.); points of
sale; information kiosks; online shopping portals;
and feedback forms for services rendered.
During all these transactions or points of contact,
consumer data is collected in varying ways. The
trouble lies in the lack of a consistent framework
in collecting consumer attributes. Most
organizations collect the same consumer data through
multiple channels with no consistency in the
attributes collected. As a result, when these
organizations build data warehouses and data marts
to study consumer behavior, the consumer tables in
the warehouse or mart end up with a large number of
duplicates. This can be disastrous for any
business.
It can result in multiple mailers to the same
consumers or to consumers who have opted out of
direct marketing campaigns, resulting in legal
complications and loss of consumer loyalty. Any ROI
analysis would yield skewed figures if consumer data
is not consistent. A consistent or single view of
consumer data across the enterprise is necessary to
prevent such scenarios.
Consumer Deduplication Strategy
Data deduplication is the process of identifying
duplicate consumer data in consumer-centric
databases, taking corrective action to cleanse the
data of duplicates, and ensuring that no coherent,
accurate, and relevant data is lost in the
process.
Follow these steps to formulate and implement a
successful consumer deduplication strategy:
1. Understand Data Quality
Data quality issues include inconsistency in
attributes, invalid data, and duplicate records. It
is recommended that data quality be enhanced and
issues be resolved before a deduplication process is
run. This ensures that the deduplication process
runs on better quality data.
2. Investigate Data and Data Quality Issues
Data investigation is important not only to
determine the data quality issues, but also to
understand the key attributes needed to define a
consumer uniquely based on data profiling. The
records in the data environment under investigation
must be a good representative sample of data quality
issues and deduplication scenarios in the production
database. Data investigation can be done with tools
or manually, using hand-written SQL queries.
Automated tools tend to expose data patterns
better, so they may be the preferred approach.
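As a rough illustration of manual data profiling,
the sketch below counts missing and distinct values
per attribute and lists values that repeat across
records. The sample records, the field names, and
the helpers profile and repeated_values are all
hypothetical; a profiling tool would report far
more than this.

```python
from collections import Counter

# Hypothetical sample of consumer records; field names are illustrative only.
records = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "postcode": "10001"},
    {"id": 2, "name": "ann lee", "email": "ANN@EXAMPLE.COM", "postcode": "10001"},
    {"id": 3, "name": "Bob Roy", "email": "", "postcode": "1000"},
    {"id": 4, "name": "Cy Dunn", "email": "cy@example.com", "postcode": None},
]


def profile(rows, fields):
    """Return, per field, the count of missing values and of distinct normalized values."""
    report = {}
    for field in fields:
        # Normalize by trimming and lowercasing; None counts as missing.
        values = [(row.get(field) or "").strip().lower() for row in rows]
        present = [v for v in values if v]
        report[field] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


def repeated_values(rows, field):
    """Return normalized values of a field that occur on more than one record."""
    counts = Counter((row.get(field) or "").strip().lower() for row in rows)
    return {value: n for value, n in counts.items() if value and n > 1}


report = profile(records, ["name", "email", "postcode"])
email_repeats = repeated_values(records, "email")
```

Attributes with few missing values and many distinct
values are the likelier candidates for defining a
consumer uniquely; repeated values point at possible
duplicates to investigate.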
3. Determine Match Rules and Criteria
Results of the data profiling exercise should be
published and proposed. Consumer attributes to be
used to match records must be understood and
confirmed by business users of the system. This is
important to ensure that the match criteria make
business sense. Typically, matching can be of three
types, namely commercial matching, household
matching, and individual matching.
Commercial matching involves matching businesses or
consumers belonging to business houses. Household
matching involves matching consumers to households.
Often, country-specific third-party data is used to
do household or family matching. There are, however,
some scenarios which need to be handled when one
deals with third-party data:
• Third-party data providers normally charge for
each instance of consumer verification. This may
turn out to be a costly, time-consuming exercise,
and is usually done once a month or at larger time
intervals (like bimonthly or quarterly).
• Third-party consumer data may not exactly match
the consumer data that an organization builds up
over a period of time. When no data is found in the
third-party database corresponding to consumer data
in an organizational database, a decision needs to
be made on how these consumers will be matched.
Individual matching involves matching consumers
belonging to the same household, and is usually done
after household matching. In some cases, it may be
useful to match on other consumer attributes, such
as number, name suffix, or gender. Usually,
matching is performed by data cleansing tools.
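To make the idea of a match rule concrete, here is
a minimal sketch that groups records sharing a
normalized match key. The rule itself (same surname
and postcode), the field names, and the helper
functions are assumptions for illustration only.
Data cleansing tools typically go well beyond this
kind of exact-key comparison.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", (text or "").lower())


def match_key(record):
    # Hypothetical match rule: same normalized surname and postcode.
    return (normalize(record["surname"]), normalize(record["postcode"]))


def group_matches(records):
    """Group record ids by match key and keep only groups with two or more records."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


consumers = [
    {"id": 1, "surname": "O'Brien", "postcode": "SW1A 1AA"},
    {"id": 2, "surname": "OBrien", "postcode": "sw1a1aa"},
    {"id": 3, "surname": "Khan", "postcode": "SW1A 1AA"},
]
matched_groups = group_matches(consumers)
```

Records 1 and 2 fall into one matched group because
their keys agree after normalization; record 3
shares the postcode but not the surname, so it stays
on its own.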
4. Identify Survivorship Criteria
Now that records belonging to the same matched group
are identified, select a survivor record in each of
the matched groups. Survivorship criteria are a
product of the initial data investigation/data
profiling exercise. It is highly recommended that
business users agree with the survivorship criteria,
because identifying survivors based on attributes
that have limited business significance may be
detrimental to the efficiency and quality of the
deduplication process. The best way to identify the
survivor is to retain the record that best matches
the survivorship criteria. As consumer data often is
highly sensitive, it is important to retain the best
consumer data possible.
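A survivor selection can be sketched as a ranking
over the records in a matched group. The criteria
used here (most populated attributes first, then
most recent update) are hypothetical; the actual
criteria should come out of data profiling and be
agreed with business users.

```python
from datetime import date


def completeness(record, fields):
    """Count how many of the given fields hold a non-empty value."""
    return sum(1 for field in fields if record.get(field))


def pick_survivor(group, fields):
    # Hypothetical criteria: most populated attributes, then most recent update.
    return max(group, key=lambda r: (completeness(r, fields), r["updated"]))


group = [
    {"id": 1, "email": "ann@example.com", "phone": "", "updated": date(2009, 3, 1)},
    {"id": 2, "email": "ann@example.com", "phone": "555-0100", "updated": date(2008, 7, 15)},
    {"id": 3, "email": "", "phone": "555-0100", "updated": date(2009, 11, 2)},
]
survivor = pick_survivor(group, ["email", "phone"])
```

Record 2 survives because it is the only one with
both attributes populated, even though it is not the
most recently updated.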
5. Determine Merge Rules and Criteria
Once the significant problem of finding the survivor
is resolved, it is now important to realize that
some attributes of the records marked as duplicates
may be more recent, more complete, or of better
quality. In such an instance, it is necessary to
merge these better attributes of the duplicate
records into the survivor record. Again, this action
needs to be performed on the basis of merge
rules/criteria. Merge rules are also defined on the
basis of data investigation and data profiling. For
instance, a merge rule could be to update the
survivor record's address field with the address
field of the record with the longest address
value. Where there are date fields, it is necessary
to retain the latest date (e.g., the date of the
last change of address). It is highly recommended
that these merge rules be certified by business
users.
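The two example merge rules in this step (longest
address value, latest date) can be sketched as
follows. The field names address and
address_changed and the function
merge_into_survivor are assumptions made for
illustration.

```python
from datetime import date


def merge_into_survivor(survivor, duplicates):
    """Return a copy of the survivor with better attributes merged in from duplicates."""
    merged = dict(survivor)
    candidates = [survivor] + duplicates

    # Merge rule 1: keep the longest address value found in the matched group.
    merged["address"] = max((r.get("address") or "" for r in candidates), key=len)

    # Merge rule 2: keep the latest date among the records that have one.
    dates = [r["address_changed"] for r in candidates if r.get("address_changed")]
    if dates:
        merged["address_changed"] = max(dates)
    return merged


survivor = {"id": 2, "address": "12 Elm St", "address_changed": date(2008, 7, 15)}
duplicates = [
    {"id": 1, "address": "12 Elm Street, Apt 4, Springfield",
     "address_changed": date(2009, 3, 1)},
    {"id": 3, "address": "", "address_changed": None},
]
merged = merge_into_survivor(survivor, duplicates)
```

Each rule is applied per attribute, so the merged
record can combine values from different
duplicates. Whether that is acceptable is one of the
points business users should confirm.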
6. Maintain Survivor Duplicates Trail History
It is important to note that while a set of records
may have been marked as duplicates and must be
purged from the consumer-related tables in the
warehouse, it may be worthwhile to retain these
deleted records in trail tables, which store the
relationship of the duplicate record to the survivor
record, as well as the deleted records' attributes.
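A trail table can be pictured as a log written at
purge time. In the sketch below, the in-memory
dictionary standing in for the consumer table, the
trail row layout, and the function purge_with_trail
are all hypothetical.

```python
def purge_with_trail(table, survivor_id, duplicate_ids, trail):
    """Remove duplicates from `table` (a dict keyed by id) and log each one to `trail`."""
    for dup_id in duplicate_ids:
        removed = table.pop(dup_id)  # raises KeyError if the id is not in the table
        trail.append({
            "duplicate_id": dup_id,
            "survivor_id": survivor_id,
            "attributes": removed,  # snapshot of the deleted record's attributes
        })


consumer_table = {
    1: {"name": "ann lee", "email": "ann@example.com"},
    2: {"name": "Ann Lee", "email": "ann@example.com"},
    3: {"name": "Bob Roy", "email": "bob@example.com"},
}
trail = []
purge_with_trail(consumer_table, survivor_id=2, duplicate_ids=[1], trail=trail)
```

After the purge, the consumer table holds only
records 2 and 3, while the trail still shows that
record 1 was folded into record 2 and what it
contained.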
7. Establish Match Rules Repository
It is highly recommended that all defined match
rules be stored in the warehouse in a match rules
master table. This table captures the
relationship between duplicate records and reason
code for matching. The match rules master table
would be a one-stop shop, wherein one can analyze
the current match rules.
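One possible shape for such a repository is
sketched below: a master list of rules keyed by
reason code, plus the duplicate-to-survivor
relationships tagged with the rule that produced
them. The class names, reason codes, and rules are
invented for illustration and do not come from any
particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRule:
    reason_code: str
    description: str
    attributes: tuple  # consumer attributes compared by this rule


@dataclass(frozen=True)
class MatchResult:
    duplicate_id: int
    survivor_id: int
    reason_code: str  # which rule marked the record as a duplicate


# Hypothetical match rules master table.
match_rules = {
    "R01": MatchRule("R01", "Same email address", ("email",)),
    "R02": MatchRule("R02", "Same surname and postcode", ("surname", "postcode")),
}

# Hypothetical outcome of one matching iteration.
match_results = [
    MatchResult(1, 2, "R01"),
    MatchResult(5, 4, "R02"),
    MatchResult(7, 4, "R02"),
]


def duplicates_per_rule(results):
    """Count how many duplicates each reason code accounts for."""
    counts = {}
    for result in results:
        counts[result.reason_code] = counts.get(result.reason_code, 0) + 1
    return counts


rule_counts = duplicates_per_rule(match_results)
```

Keeping the reason code on every duplicate makes it
easy to see which rules do most of the work and
which ones might need review.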
8. Establish a Reconciliation Report and Measurement
for Efficiency of Deduplication
Reconciliation reports are highly recommended in
consumer deduplication scenarios. These reports give
vital information about the dedupe process in terms
of the number of records marked as duplicates in a
given iteration of matching. Details about the
survivor-duplicate relationship can also be
determined from these reports. The overall
efficiency of the dedupe process can be measured in
terms of the percentage of duplicates that are
removed by the process. For example, if 15 percent
of the records in the warehouse were duplicate
records prior to running the deduplication process,
and after running the process the duplicates were
reduced to 10 percent, then the efficiency of the
deduplication process would be 33 percent, as shown
by the formula:
(% of duplicates before running dedupe - % of
duplicates after running dedupe) * 100 / (% of
duplicates before running dedupe)
The ROI on the dedupe process can be similarly
measured by finding how much the direct marketing
costs have gone down after dedupe. This can be
derived using the following equation:
(direct mail costs before running dedupe - direct
mail costs after running dedupe) * 100 / (direct
mail costs before running dedupe)
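Both measurements in this step are the same
percentage-reduction calculation, which can be
sketched as a single function. The function name
percent_reduction and the mail cost figures are
assumptions for illustration; the 15 and 10 percent
duplicate rates are the ones from the example above.

```python
def percent_reduction(before, after):
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) * 100 / before


# Efficiency: duplicates fall from 15 percent to 10 percent of records.
efficiency = percent_reduction(15, 10)

# ROI: hypothetical direct mail costs before and after dedupe.
mail_savings = percent_reduction(20000, 17000)

print(f"Dedupe efficiency: {efficiency:.1f}%")
print(f"Direct mail cost reduction: {mail_savings:.1f}%")
```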
Deduplication in Enterprises – A Status Check
The deduplication strategy described in the
previous sections can be applied in the data
warehouse/data mart or even in the source systems.
The
implementation of deduplication in the source
systems should be seriously considered as a viable
option for two primary reasons:
• Source systems can be integrated with data
cleansing systems. For instance, some software
provides integration with transaction systems and
customer relationship management systems. This
gives organizations in different business domains
considerable flexibility to cleanse their systems
at the source layer.
• Organizations have realized the power of clean
data through an industry-wide effort to understand
the benefit of clean data in the source systems, and
downstream data warehouses and data marts. As a
result, many organizations are ready to invest in
data cleansing at the source layer. The benefits are
manifold. If data is cleaned in the source system,
any number of downstream systems benefit from better
quality data. If deduplication were applied only at
the warehouse level, the source would still have
duplicates, and every downstream application that
depended on the source would have to cleanse the
data separately.
There are, however, issues that need to be
considered when implementing deduplication in the
source systems. It is often a complex process and
may have some effect on the data availability
timelines. Hence, where transaction systems have
high performance benchmarks, it may be useful to do
the deduplication in the data warehouse or other
downstream systems.
For real-time data warehouses, it may be a better
strategy to implement the data cleansing in the
warehouse, because the data needs to flow from the
transaction system into the warehouse in real-time
or near real-time scenarios.
Deduplication Across Enterprise Solutions
Deduplication processes can be applied to the
following enterprise areas to ensure that businesses
get the best out of their data:
Supply chain management. Data quality issues are of
key importance in managing supply chains because
data quality affects a company's ability to support
not only its own business processes, but also those
of its network, with reliable, usable data. Typically,
modern supply chains involve interactions between
not only companies whose operations are run by the
supply chain, but also partners, suppliers,
retailers, and warehouse operators. Good quality
data ensures the ability to track vendors, inventory
management, customer invoicing, business analytics,
and business effectiveness, based on more accurate
and timely information. Deduplication helps improve
the data quality in supply chains by standardizing
product information, unifying complex vendor views,
and improving contact information for increasing
efficiency of product delivery and services routing.
Enterprise resource planning. ERP systems are also
data centric and are normally designed around key
business processes like materials management,
financial planning, human resources, and inventory
control. Hence, data quality is of paramount
importance here, as these transaction systems are
also often the backbone of data flowing into
downstream data warehouses and data marts.
Business intelligence/data warehousing. BI
dashboards/portals and data warehouses are the
backbone of decision support systems used by
enterprises today. Hence, the quality of data in
these warehouses and marts is of vital importance
to accurate business information and business
strategy. Enterprises are increasingly trying to
integrate data across multiple source systems to get
a consistent view of business process data, and
deduplication plays a vital role in achieving this
goal.
Customer relationship management. As mentioned,
customer data is sensitive and holds key information
about customer preferences. This makes it imperative
for companies to enhance the quality of customer
data to build better relationships with customers.
The key to more efficient CRM is a unified view of
the customer across the enterprise.
Regulatory compliance. Local and global legislation
around financial asset control and privacy
supervision is forcing enterprises to reexamine the
accuracy and reliability of information.
Traditionally, enterprises have tackled compliance
projects serially, reacting to regulations with new
IT initiatives. However, growing regulatory
demands such as Basel II and Sarbanes-Oxley call
for a reconsideration of strategy. Multiple
data integration efforts can leverage common
metadata and business rules that allow better data
to benefit all business processes. In this manner,
compliance actually moves from a business cost to
become a competitive advantage.
---Source: Information Management
Jan. 19, 2010 newsletter (http://www.information-management.com).
Saumya Chaki is a principal consultant with the
business intelligence practice at PwC India. He can
be reached at saumya.darsan.chaki@in.pwc.com.