Intelligent fuzzy matching
Identify duplicate records and enable an accurate single customer view
Identify duplicate records in any data domain and gain an aggregate understanding of your
data by consolidating or linking data from all sources in your organization.
Miscellaneous data of any type, such as departments, titles,
companies, and products, or string data of any kind,
can be matched and its duplicates eliminated.
Fuzzy Match draws on
a toolbox of state-of-the-art fuzzy matching algorithms
and gives users granular control over match
thresholds, including fine-tuning of the algorithms themselves. Match
rules can be set for each column of data, and the rules can be
daisy-chained to cover catch-all situations. Fuzzy
Match streams its results to three
destinations: Matches, Possible Matches and Non-Matches.
Based on the percentage score between records, the
component directs each match result to the
appropriate stream as configured.

Fuzzy matching algorithms employed include:
| • Exact | • Jaccard |
| • Jaro | • Overlap |
| • Jaro-Winkler | • Longest Common String |
| • n-Gram | • Soundex |
| • Levenshtein | • Phonetex |
| • Needleman-Wunsch | • Frequency |
| • Smith-Waterman | • Frequency Near |
| • Smith-Waterman-Gotoh | • Containment |
| • Dice's Coefficient | • MD Keyboard |
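
To make the scoring and three-stream routing concrete, here is a minimal sketch in Python. It is not the component's implementation: the function names, the normalization of Levenshtein distance into a percentage, and the 90/70 thresholds are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts,
    deletes, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Percentage score in [0, 100]; 100 means the strings are identical.
    Assumption: distance is normalized by the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def route(score: float, match_at: float = 90.0, possible_at: float = 70.0) -> str:
    """Choose an output stream from a percentage score.
    The threshold defaults are hypothetical, not product defaults."""
    if score >= match_at:
        return "Matches"
    if score >= possible_at:
        return "Possible Matches"
    return "Non-Matches"
```

Under these assumed thresholds, `route(similarity("Jonathan", "Jonathon"))` returns `"Possible Matches"`: the two names differ by one substitution across eight characters, giving a score of 87.5.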
Regular Expression Builder
Fuzzy Match comes with a Regular Expression builder that helps users with RegEx syntax
and enables processing of multiple expressions in a
single pass. Expressions you build can be saved in a
library for reuse. Cleansing the data before matching
is crucial for getting the most accurate results.
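
As an illustration of pre-match cleansing, the sketch below keeps a small reusable list of regular expressions and applies them in sequence within one call, using Python's standard `re` module. The rule list, the company-suffix pattern, and the `cleanse` function are hypothetical; they do not reflect the builder's actual library format.

```python
import re

# Hypothetical saved library: (compiled pattern, replacement) pairs, applied in order.
CLEANSING_RULES = [
    (re.compile(r"[^\w\s]"), " "),  # replace punctuation with a space
    (re.compile(r"\b(incorporated|corporation|corp|inc|ltd|llc)\b",
                re.IGNORECASE), ""),  # drop common company suffixes
    (re.compile(r"\s+"), " "),      # collapse runs of whitespace
]


def cleanse(value: str) -> str:
    """Apply every rule in the library, then trim and lowercase."""
    for pattern, replacement in CLEANSING_RULES:
        value = pattern.sub(replacement, value)
    return value.strip().lower()
```

With these rules, `cleanse("Acme, Inc.")` and `cleanse("ACME Corporation")` both return `"acme"`, so the two values compare as equal before any fuzzy algorithm is applied.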
Maintain Full Lineage Through Pipeline Metadata
Fuzzy Match outputs full metadata to
provide lineage on which columns were compared and which match
rules were used.
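
One way to picture such lineage is a record that travels with each compared pair and lists the columns and rules involved. The sketch below is an assumption for illustration only; the class and field names are hypothetical and are not the component's actual metadata schema.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class ColumnComparison:
    column: str       # column that was compared
    algorithm: str    # match rule applied to that column
    threshold: float  # configured minimum percentage score
    score: float      # percentage score actually obtained


@dataclass
class MatchLineage:
    left_id: int      # identifier of the first record in the pair
    right_id: int     # identifier of the second record in the pair
    comparisons: List[ColumnComparison] = field(default_factory=list)

    def add(self, column: str, algorithm: str, threshold: float, score: float) -> None:
        """Record one column-level comparison."""
        self.comparisons.append(ColumnComparison(column, algorithm, threshold, score))

    def columns_compared(self) -> List[str]:
        """List the compared columns in the order they were recorded."""
        return [c.column for c in self.comparisons]
```

Calling `asdict()` on a `MatchLineage` instance yields a plain dictionary that could be written alongside the match output for auditing.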
System Requirements
Data Quality Components for SSIS are 32/64-bit tools available for Windows
XP/2003/2008/Vista/7 and Microsoft SQL Server 2005/2008.
Minimum Requirements (Not Recommended)
· 32-Bit Windows XP/2003/2008/Vista/7
· Microsoft SQL Server 2005/2008
· 6 GB hard-disk space
Recommended Requirements
· 64-Bit Windows XP/2003/2008/Vista/7
· Microsoft SQL Server 2008 R2
· 6 GB hard-disk space