MatchUp Component for Contact Zone
Advanced fuzzy matching for a single view of data assets
MatchUp combines a toolbox of advanced fuzzy matching algorithms, deep domain knowledge, and a custom lexicon to identify matches at a granular level between names and nicknames, street addresses and abbreviated addresses, companies, cities, states, postal codes, phones, emails, and other contact data components.
- Compare records in one or more databases at once and process entire lists
- Use two input pins (source and lookup) to suppress duplicates or list intersecting records
- Recognize any combination of over 35 data types, including ZIP Code, Address, Last Name, Email Address, Social Security Number, and more
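The source and lookup idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MatchUp's implementation: the `match_key` helper, the field names (`last_name`, `zip`), and the exact-key comparison are all hypothetical stand-ins for whatever matching rule is configured.

```python
def match_key(record):
    """Hypothetical match key: normalized last name plus ZIP Code."""
    return (record["last_name"].strip().lower(), record["zip"].strip())


def split_by_lookup(source, lookup):
    """Split source records by whether their key also appears in lookup.

    Returns (suppressed, intersecting):
      suppressed   - source records with no counterpart in lookup
      intersecting - source records that also appear in lookup
    """
    lookup_keys = {match_key(r) for r in lookup}
    suppressed = [r for r in source if match_key(r) not in lookup_keys]
    intersecting = [r for r in source if match_key(r) in lookup_keys]
    return suppressed, intersecting


source = [
    {"last_name": "Smith", "zip": "92688"},
    {"last_name": "Jones", "zip": "10001"},
]
lookup = [{"last_name": " SMITH", "zip": "92688"}]

suppressed, intersecting = split_by_lookup(source, lookup)
```

Here `suppressed` holds the Jones record and `intersecting` holds the Smith record, because the Smith key survives normalization of case and whitespace.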
MatchUp employs state-of-the-art fuzzy matching algorithms, including:
- Exact
- Jaro
- Jaro-Winkler
- n-Gram
- Levenshtein
- Needleman-Wunsch
- Smith-Waterman
- Smith-Waterman-Gotoh
- Dice’s Coefficient
- Jaccard
- Overlap
- Longest Common Substring
- Soundex
- Phonetex
- Frequency
- Frequency Near
- Containment
- MD Keyboard
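To give a feel for how two of the listed algorithms score a pair of strings, here is a minimal textbook sketch of Levenshtein edit distance and Jaccard similarity. It is not MatchUp's code: the function names, the normalization of distance into a 0 to 1 score, and the choice of character bigrams as the Jaccard token set are assumptions made for illustration.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def bigrams(s):
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Size of the bigram intersection divided by size of the union."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)
```

For example, `levenshtein("kitten", "sitting")` is 3, and `levenshtein_similarity("smith", "smyth")` is about 0.8 since one of five characters differs.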
Identify Obvious and Not So Obvious Duplicates
MatchUp’s advanced fuzzy matching algorithms can identify obvious duplicates like:
And not-so-obvious duplicates like:
Create Your Own MatchCodes
With MatchUp you can set up your own matching rules (called MatchCodes) using any combination of over 35 components, from common ones like ZIP Code, Address, and Last Name to less common elements like Email Address, Company, and Social Security Number. You can even specify your own proprietary data component, such as an account number, using the user-interface-driven MatchCode editor.
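Conceptually, a matching rule of this kind pairs each component with a comparison method, and two records match when every component agrees. The sketch below illustrates that idea only; it is not the MatchCode editor's format or MatchUp's API. The `MATCHCODE` structure, the field names, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` as a stand-in fuzzy comparison are all assumptions.

```python
from difflib import SequenceMatcher


def exact(a, b):
    """Component matches only when the normalized values are identical."""
    return a == b


def fuzzy(threshold):
    """Build a comparison that passes when similarity meets the threshold.
    SequenceMatcher is a stand-in for any fuzzy algorithm."""
    def compare(a, b):
        return SequenceMatcher(None, a, b).ratio() >= threshold
    return compare


# Hypothetical rule: exact ZIP Code, fuzzy Last Name, and an exact
# proprietary component (account number).
MATCHCODE = [
    ("zip", exact),
    ("last_name", fuzzy(0.8)),
    ("account_number", exact),
]


def normalize(value):
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(str(value).lower().split())


def records_match(a, b, matchcode):
    """True when every component in the rule matches."""
    return all(compare(normalize(a[field]), normalize(b[field]))
               for field, compare in matchcode)


rec_a = {"zip": "92688", "last_name": "Johnson", "account_number": "A-1001"}
rec_b = {"zip": "92688", "last_name": "Jonson", "account_number": "a-1001"}
rec_c = {"zip": "10001", "last_name": "Johnson", "account_number": "A-1001"}
```

With this rule, `records_match(rec_a, rec_b, MATCHCODE)` is true despite the spelling and case differences, while `rec_a` and `rec_c` fail on ZIP Code.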
Regular Expression Builder
MatchUp comes with a Regular Expression builder that helps users with RegEx syntax and enables processing of multiple expressions in a single pass. Regular expressions you build can be saved in a library for reuse. Cleansing the data prior to matching is a crucial step toward the most accurate results.
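As a rough illustration of applying several saved expressions in one cleansing pass before matching, consider the sketch below. It uses Python's standard `re` module rather than MatchUp's builder, and the `REGEX_LIBRARY` contents and `cleanse` function are hypothetical.

```python
import re

# Hypothetical reusable library: (compiled pattern, replacement) pairs,
# applied in order.
REGEX_LIBRARY = [
    (re.compile(r"[^\w\s@.-]"), ""),  # drop punctuation except @ . -
    (re.compile(r"\s+"), " "),        # collapse runs of whitespace
]


def cleanse(value):
    """Apply every expression in the library to one value, then trim."""
    for pattern, replacement in REGEX_LIBRARY:
        value = pattern.sub(replacement, value)
    return value.strip()
```

Calling `cleanse("  ACME,   Inc.!! ")` yields `"ACME Inc."`, so that stray punctuation and spacing no longer prevent two company names from comparing as equal.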
Maintain Full Lineage Through Pipeline Metadata
MatchUp outputs full metadata to provide lineage, showing which columns were compared and which match rules were used.