Fuzzy Match Component for SQL Server
Intelligent fuzzy matching component identifies duplicate records and enables an accurate single customer view
Identify duplicate records in any data domain and gain an aggregate understanding of your data by consolidating or linking data from all sources in your organization. Miscellaneous data of any type (departments, titles, companies, products, and so on) and string data of any kind can be matched and its duplicates eliminated.
Fuzzy Match draws on a toolbox of state-of-the-art fuzzy matching algorithms and gives users granular control over match thresholds, including fine-tuning of the algorithms themselves. Match rules can be set for each column of data, and rules can be daisy-chained to cover catch-all situations. Fuzzy Match streams its results into three destinations: Matches, Possible Matches, and Non-Matches. Based on the percentage score between records, the component directs each match result to the appropriate stream as configured.
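The routing described above can be illustrated with a minimal Python sketch. This is not the component's code or configuration: the function name `route`, the threshold values 90 and 70, and the sample scores are all hypothetical, chosen only to show how a percentage score could be mapped to one of the three streams.

```python
def route(score, match_threshold=90.0, possible_threshold=70.0):
    """Return the output stream name for a percentage score (0-100).

    Both thresholds are assumed values for illustration only.
    """
    if score >= match_threshold:
        return "Matches"
    if score >= possible_threshold:
        return "Possible Matches"
    return "Non-Matches"


# Hypothetical scored record pairs: (left value, right value, score in percent).
scored_pairs = [
    ("Jon Smith", "John Smith", 94.0),
    ("Acme Corp", "Acme Corporation", 78.5),
    ("Widget", "Gadget", 41.0),
]

# Direct each pair to the stream selected by its score.
streams = {"Matches": [], "Possible Matches": [], "Non-Matches": []}
for left, right, score in scored_pairs:
    streams[route(score)].append((left, right, score))
```

Raising the lower threshold moves more pairs into Non-Matches, while lowering the upper threshold moves more pairs from Possible Matches into Matches.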
Fuzzy Matching algorithms employed include:
- Exact
- Jaro
- Jaro-Winkler
- n-Gram
- Levenshtein
- Needleman-Wunsch
- Smith-Waterman
- Smith-Waterman-Gotoh
- Dice’s Coefficient
- Jaccard
- Overlap
- Longest Common String
- Soundex
- Phonetex
- Frequency
- Frequency Near
- Containment
- MD Keyboard
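To give a sense of how two of the listed algorithms produce a percentage score, here is a short Python sketch of textbook Levenshtein distance and a set-based bigram variant of Dice's coefficient. These are generic definitions, not the component's implementations; the function names and the conversion to a 0-100 score are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_percent(a, b):
    """Scale edit distance to 0-100 by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def bigrams(s):
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_percent(a, b):
    """Dice's coefficient over bigram sets, as a percentage."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 100.0
    return 100.0 * 2 * len(x & y) / (len(x) + len(y))
```

For example, `levenshtein("kitten", "sitting")` is 3, and `dice_percent("night", "nacht")` is 25.0 because the two words share only the bigram `ht` out of eight in total.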
Regular Expression Builder
Fuzzy Match comes with a Regular Expression builder that helps users with RegEx syntax and enables processing of multiple expressions in a single pass. Regular expressions you build can be saved in a library for reuse. Cleansing the data prior to matching is a crucial step in getting the most accurate results.
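As a rough illustration of cleansing with several expressions in one pass, the Python sketch below combines a small set of rules into a single compiled pattern and applies them together. It does not reflect how the builder works internally; the rule names, patterns, and the `cleanse` function are hypothetical.

```python
import re

# Hypothetical cleansing rules: (group name, pattern, replacement).
RULES = [
    ("punct", r"[.,;:]+", ""),       # drop common punctuation
    ("amp", r"\s*&\s*", " and "),    # spell out ampersands
    ("spaces", r"\s{2,}", " "),      # collapse runs of whitespace
]

# One alternation of named groups, so the text is scanned only once.
COMBINED = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern, _ in RULES))
REPLACEMENTS = {name: replacement for name, _, replacement in RULES}


def cleanse(value):
    """Apply every rule in a single scan, then trim and lowercase the result."""
    def substitute(match):
        # lastgroup is the name of the rule that matched at this position.
        return REPLACEMENTS[match.lastgroup]
    return COMBINED.sub(substitute, value).strip().lower()
```

With these assumed rules, `cleanse("Smith & Sons,  Inc.")` yields `smith and sons inc`, which gives a later matching step a more uniform value to compare.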
Maintain Full Lineage Through Pipeline Metadata
Fuzzy Match outputs full metadata to provide lineage on which columns were compared and which match rules were used.
System Requirements
Data Quality Components for SSIS are 32/64-bit tools available for Windows XP/2003/2008/Vista/7 and Microsoft SQL Server 2005/2008.
Minimum Requirements (Not Recommended)
- 32-Bit Windows XP/2003/2008/Vista/7
- Microsoft SQL Server 2005/2008/2012
- 6 GB hard-disk space
Recommended Requirements
- 64-Bit Windows XP/2003/2008/Vista/7
- Microsoft SQL Server 2008 R2
- Microsoft SQL Server 2012
- 6 GB hard-disk space