Are Duplicate Records A Blow to Your Bottom Line?
By Abby Garcia Telleria,
copywriter & editor of Melissa Data’s DQ Insider
newsletter
Is the problem of poor data quality an ongoing
challenge for your organization?
A recent TDWI study reported that organizations lose
more than $611 billion each year due to bad data.
Even more shocking – 42 percent of organizations
have made no effort to monitor the quality of their
data, according to a study by The Information
Difference.
One major root cause of poor data quality is
duplicate records. Duplicate records – in any
number – can degrade the performance of your
database and corrupt vital customer contact
information. Conflicting data prevents your
organization from gaining a single, accurate view of
your customer, which raises the risk of poor
business decisions.
Duplicate records also lead to more manual
labor, weaker results from customer response-driven
initiatives, and inefficient storage and
retrieval.
But there is a way you can turn chaotic data into
actionable information – by identifying and
eliminating duplicate records.
The Problem of Finding Non-Exact Matches
Deduplication of data through the use of merge/purge
solutions is one of the critical components in the
data cleansing process. But the biggest roadblock to
identifying duplicates lies in detecting non-exact
matching duplicate records – data that appears to be
multiple sets of unique information but actually
describes the same entity. These “non-exact
matching” duplicates are very difficult to identify.
For example, a ‘Beth Smith’ at ‘United Data
Machines’ can be recorded in the same or a different
database as ‘Smithe, Elizabeth’ at ‘UDM’. In
reality, Beth Smith and Elizabeth Smithe are the
same person, but your organization might identify
the data as two different contacts, with two
different order histories, two different buying
patterns, etc.
The most effective merge/purge applications use
fuzzy matching algorithms that identify these
hard-to-spot, non-exact matching duplicates.
Fuzzy matching is a mathematical process that
determines the similarity between pieces of data.
The outcome is not a simple true or false but a
degree of likeness, hence the word “fuzzy.” The
process compares any data type of any length and
from any place in a field to find non-exact
matches.
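As a rough illustration of a match score that is a degree rather than a
yes/no answer, the sketch below uses difflib.SequenceMatcher from Python's
standard library. The function names and the 0.85 threshold are assumptions
chosen for this example; they are not taken from any particular merge/purge
product.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0; 1.0 means identical after normalizing."""
    # Normalize case and collapse runs of whitespace before comparing.
    norm_a = " ".join(a.lower().split())
    norm_b = " ".join(b.lower().split())
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag a pair as a likely duplicate when the score reaches the threshold.

    The threshold is a hypothetical starting point and would need tuning
    against real data.
    """
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    for left, right in [("Smith", "Smithe"), ("Smith", "Jones")]:
        score = similarity(left, right)
        print(f"{left!r} vs {right!r}: score={score:.2f}, "
              f"duplicate={is_probable_duplicate(left, right)}")
```

An exact comparison would treat ‘Smith’ and ‘Smithe’ as different values,
while the score-based check flags them as a likely match and leaves the
decision threshold adjustable.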
Types of Fuzzy Matching Algorithms
There are several different algorithms that can be
implemented as part of the deduplication process.
Phonetic matching - Uses a phonetic
algorithm to detect “alike-sounding” relationships
between words. Phonetic matching allows you to
perform approximate searches, instead of just
‘exact’ matches – thus enabling your organization to
find variations of a name.
N-gram or Q-gram-based - N-gram (or q-gram)
models are primarily used in statistical natural
language processing. An n-gram is a subsequence of
n items from a given sequence – which can be
phonemes, syllables, letters, words or base pairs,
as defined by Wikipedia. For matching, two strings
are scored by how many n-grams they share.
Jaro-Winkler - The Jaro-Winkler distance is a
measure of similarity between two strings. It is
mainly used in the area of record linkage for
duplicate detection.
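The three approaches above can be sketched in a few lines of Python. This
is a minimal illustration under stated assumptions: Soundex stands in for
“a phonetic algorithm” (the article does not name one), the n-gram score
uses character bigrams with a Jaccard ratio, and Jaro-Winkler uses the
customary prefix scale of 0.1 over at most four characters. All function
names are hypothetical and do not reflect how any commercial product
implements matching.

```python
_SOUNDEX_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                   "4": "l", "5": "mn", "6": "r"}
_SOUNDEX_CODES = {ch: digit for digit, group in _SOUNDEX_GROUPS.items()
                  for ch in group}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero padded."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


def ngrams(text: str, n: int = 2) -> set:
    """Set of character n-grams of the lowercased text."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard ratio of shared n-grams: 0.0 (none) to 1.0 (all)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:  # a string shorter than n yields no n-grams
        return float(a.lower() == b.lower())
    return len(ga & gb) / len(ga | gb)


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, from 0.0 to 1.0."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    used1, used2 = [False] * len(s1), [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, u in zip(s1, used1) if u]
    seq2 = [c for c, u in zip(s2, used2) if u]
    # Matched characters that appear in a different order, counted in pairs.
    transpositions = sum(x != y for x, y in zip(seq1, seq2)) // 2
    return (matches / len(s1) + matches / len(s2)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted for a shared prefix of up to four characters."""
    base = jaro(s1, s2)
    prefix = 0
    for x, y in zip(s1[:4], s2[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


if __name__ == "__main__":
    print(soundex("Smith"), soundex("Smithe"))
    print(round(ngram_similarity("Smith", "Smithe"), 3))
    print(round(jaro_winkler("smith", "smithe"), 3))
```

In this sketch ‘Smith’ and ‘Smithe’ receive the same Soundex code, while
the n-gram and Jaro-Winkler functions return scores that can be compared
against a chosen threshold. Which measure suits a given field (names,
company names, addresses) is a tuning decision the sketch does not settle.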
Power and Performance in One Package
Most mailing software packages integrate a
deduplication function as part of the address
hygiene process, not only to eliminate bad addresses
from the mailing but also to find and eliminate
duplicate records. The combined process ultimately
saves the mailer money on both printing and postage.
Mailers needing a more potent merge/purge
deduping process can choose to incorporate a
more sophisticated program that identifies the most
difficult-to-detect duplicate records and also
enables USPS® CASS™ address verification for
superior list hygiene.
Melissa Data’s MatchUp software facilitates the most
sophisticated record matching process available with
its advanced fuzzy matching algorithms, along with
CASS address verification routines – for
organizations that are serious about cleaning up and
clearing out the waste.
Protecting the Value of Your Database
The true value of your database is determined by one
fundamental component – the quality of your data.
Without data that is reliable, accurate and updated,
your organization can’t deliver trusted customer
information throughout the enterprise. Finding even
the most difficult-to-detect duplicate records through
the use of deduplication solutions like MatchUp
helps streamline your database, improve marketing
efficiency, and achieve a unified view of your
customer.
To learn more about how to
implement deduplication technology in your
operations, download the white paper, “Are
Duplicate Records Eroding Your Bottom Line?”
Or call 1-800-MELISSA to get a free trial of MatchUp.