What is Fuzzy Matching?
Fuzzy matching is a technique for measuring how similar two pieces of data are
when they are not identical. Rather than returning a strict true or false, it
reports a degree of similarity, hence the word “fuzzy.” The process can compare
data of any type and any length, at any position in a field, to find non-exact
matches.
According to data mining consulting firm Two Crows Consulting, fuzzy matching or
fuzzy logic is “applied to fuzzy sets where membership in a fuzzy set is a
probability, not necessarily 0 or 1… Fuzzy logic needs to be able to manipulate
degrees of maybe, in addition to true and false.”
For every piece of data examined, the fuzzy matching process assigns a
similarity score that indicates how close the match is. For example, ‘Tomas
Jones’ might receive a similarity score of 90 percent, while ‘Tom Jones’ might
receive 75 percent, when each is compared with the actual name, Thomas Jones.
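As a rough illustration of this kind of scoring, the Python sketch below uses the standard library's difflib.SequenceMatcher. That choice is an assumption: the article does not say which algorithm produced the 90 and 75 percent figures, so the scores printed here will not match them exactly. The helper name similarity_percent is hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(candidate: str, reference: str) -> float:
    """Return a 0-100 similarity score between two strings.

    Assumption: SequenceMatcher's ratio stands in for whatever scoring
    method a real fuzzy matching tool would use.
    """
    # Normalize case and surrounding whitespace so that only the
    # spelling differences affect the score.
    a = candidate.strip().lower()
    b = reference.strip().lower()
    return SequenceMatcher(None, a, b).ratio() * 100


actual_name = "Thomas Jones"
for variant in ("Tomas Jones", "Tom Jones", "Thomas Jones"):
    score = similarity_percent(variant, actual_name)
    print(f"{variant!r} vs {actual_name!r}: {score:.0f} percent")
```

With this particular measure, ‘Tomas Jones’ still scores higher than ‘Tom Jones’, and an identical string scores 100 percent. A different algorithm would give different numbers but usually the same ordering.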
To demonstrate how duplicate records are identified through fuzzy matching, here
is a sample list of prospective customers. As you can see, there are duplicate
records due to misspellings or typos. Customer #11, Customer #111, and
Customer #1111 are most likely the same person.
Still, other methods, such as more advanced fuzzy matching algorithms, can
determine whether Customer #11 (Tomas Jones), Customer #111 (Tom Jones), and
Customer #1111 (Thomas Jones) are indeed the same person.
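The duplicate check described above can be sketched in Python as a pairwise comparison of customer names. This is a minimal illustration, not the method of any particular product. The use of difflib.SequenceMatcher, the 0.8 threshold, the function names, and Customer #2222 are all assumptions added for the example. Only the three Jones records come from the article.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def likely_duplicates(customers: dict, threshold: float = 0.8) -> list:
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    The threshold is a hypothetical cut-off; real systems tune it per data set.
    """
    pairs = []
    # Compare every customer with every other customer exactly once.
    for (id_a, name_a), (id_b, name_b) in combinations(customers.items(), 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs


customers = {
    11: "Tomas Jones",
    111: "Tom Jones",
    1111: "Thomas Jones",
    2222: "Maria Garcia",  # hypothetical unrelated record, not from the article
}

duplicates = likely_duplicates(customers)
for id_a, id_b, score in duplicates:
    print(f"Customer #{id_a} and Customer #{id_b} look alike (score {score})")
```

Comparing every pair is fine for a short list, but the number of comparisons grows quadratically. Larger data sets usually need a way to narrow down candidate pairs first.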