FAQs
1. MatchUp Content
1.1 MatchUp Basics
1.1.1 What is the MatchUp Object?
The MatchUp Object is
a professional programming library used to match
database records. Typical uses of the MatchUp Object
are:
- Real-time lookup of customer information during
data-entry.
- Processing files with custom file structures (i.e.,
files that cannot be exported into a format that the
MatchUp GUI version could use).
- Providing deduplication for a custom-written
application seamlessly (i.e., without shelling to
another application).
- Providing custom functionality that cannot be achieved
with MatchUp GUI.
It can match files with different structures, different
field names, different field types, and different field
lengths. You can process up to 16 matchcodes in a single
pass, each of which can be made of any part(s) of any
field(s) you designate.
1.1.2 What is a Matchcode?
A Matchcode is a set of rules which allow you to
determine if two records should be considered
duplicates. MatchUp uses a predefined Matchcode, or one
you have created using the Windows Matchcode Editor (or
programmatically using the Matchcode Interface), to
create a matchkey for each record.
1.1.3 What is a Matchcode Key?
A string of data, determined by your matchcode, that is
extracted from each record and used to compare records
when deduping.
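To make the idea concrete, here is a minimal sketch of building a key. It assumes a hypothetical matchcode with three components (first 5 characters of ZIP Code, first 4 of Last Name, first 3 of First Name), upper-cased and padded to a fixed width. The Record struct and buildMatchKey function are illustrative only and are not part of the MatchUp API; the Object builds keys internally from your matchcode.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical record layout, for illustration only.
struct Record {
    std::string zip;
    std::string lastName;
    std::string firstName;
};

// Take the first `width` characters of a field, upper-cased and padded with
// spaces so every component occupies a fixed position in the key.
std::string component(const std::string& field, std::size_t width) {
    std::string out;
    for (char c : field) {
        if (out.size() == width) break;
        out += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    out.resize(width, ' ');
    return out;
}

// Hypothetical matchcode: ZIP(5) + Last Name(4) + First Name(3).
std::string buildMatchKey(const Record& r) {
    return component(r.zip, 5) + component(r.lastName, 4) + component(r.firstName, 3);
}
```

With this matchcode, "Smith, John, 92688" and "Smithers, Johnny, 92688-1234" produce the same key, which shows how much the component lengths in a matchcode decide what counts as a duplicate.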
1.1.4 Can I create my own Matchcodes, or edit existing ones?
Yes. Using MatchUp's Matchcode Editor, you can create your own
matching criteria (your own Matchcode), or copy and edit
one of the basic matchcodes shipped with MatchUp, to
determine whether two records should match. This tool
lets you match on anything! The Matchcode Editor can be
opened from the MatchUp Object data folder by running
Matchcode Editor.exe. You can also create new matchcodes,
or edit or remove existing ones, programmatically using
the Matchcode Interface.
1.1.5 What if my source tables don’t have the same data MatchUp uses to match?
For known data types (most of the data you would match on
in a merge purge application), all you have to do is tell
MatchUp what type of data is in a field and the format of
that data. MatchUp will extract the relevant data needed
to build the matchcode keys. In the example function call
below, the developer has specified that the source data
mapped to one of the matchcode components contains Full
Names, even though the matchcode named in another function
call matches on last name. MatchUp will extract the last
name from the full name data to build the key.
mu.AddMapping(mdMUReadWrite.MatchcodeMapping.FullName);
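The following sketch illustrates the idea of deriving a Last Name component from Full Name data. It assumes simple "First [Middle] Last [Suffix]" ordering and a small hypothetical suffix list; MatchUp's own name handling covers far more cases (for example "Last, First" order), so treat lastNameFromFullName as an illustration only, not the Object's parser.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Small, hypothetical list of suffixes to skip when looking for the last name.
const std::set<std::string> kSuffixes = {"JR", "JR.", "SR", "SR.", "II", "III", "IV", "MD", "PHD"};

std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Assumption: names arrive as "First [Middle] Last [Suffix]".
std::string lastNameFromFullName(const std::string& fullName) {
    std::istringstream in(fullName);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) {
        if (!t.empty() && t.back() == ',') t.pop_back();  // "Smith," -> "Smith"
        if (!t.empty()) tokens.push_back(t);
    }
    // Walk backwards past any suffixes; the first other token is the last name.
    for (auto it = tokens.rbegin(); it != tokens.rend(); ++it) {
        if (kSuffixes.count(upper(*it)) == 0) return *it;
    }
    return "";
}
```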
1.1.6 Do I have any control over how MatchUp matches my file?
In addition to the Matchcode Editor, where you determine
the match rules, you also have complete control over
mdMatchUp.dat, a comprehensive list of keywords associated
with different data types.
1.1.7 What's the mdMatchUp.dat file?
This is a compiled
list of known keywords associated with a specific data
type. This helps MatchUp to process different data types
using advanced methods, because it recognizes keywords
and knows how they can be treated. Entries in this
table allow you to match ‘Charles’ and ‘Chuck’, ‘North
Main Street’ to ‘N Main St’, or ‘UDM’ to ‘United Data
Machines Inc.’, for example. You may edit or append to
this database using the mdMatchUp.cfg file to help
MatchUp process your data more accurately.
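A minimal sketch of how a keyword table lets two spellings compare equal: each word is replaced by its standard form before comparison. The entries come from the examples above, but the structure (a map from an uppercase word to its standard form) and the normalize function are assumptions for illustration; this is not the format of mdMatchUp.dat or mdMatchUp.cfg.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical keyword table: uppercase word -> standard form.
const std::map<std::string, std::string> kKeywords = {
    {"CHUCK", "CHARLES"},
    {"NORTH", "N"},
    {"STREET", "ST"},
};

// Upper-case each word, strip trailing periods, and replace known keywords
// with their standard form; unknown words pass through unchanged.
std::string normalize(const std::string& text) {
    std::istringstream in(text);
    std::string out;
    for (std::string word; in >> word;) {
        while (!word.empty() && word.back() == '.') word.pop_back();
        for (char& c : word) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        auto it = kKeywords.find(word);
        if (it != kKeywords.end()) word = it->second;
        if (!word.empty()) {
            if (!out.empty()) out += ' ';
            out += word;
        }
    }
    return out;
}
```

Both "North Main Street" and "N Main St." normalize to "N MAIN ST", so an exact comparison of the normalized values succeeds.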
1.1.8 What's the mdMatchUp.cfg file?
Occasionally, we have a customer who always
processes a database specific to an industry or
geographic area where proprietary keywords are not in
the .dat file, or have a different meaning. The
mdMatchUp.cfg file allows you to override the existing
behavior of the mdMatchUp.dat file, or add new entries.
Caution: When you edit this file, you may be overriding
years of programming experience on how best to handle
these common data types.
1.1.9 What are the different methods of deduping?
MatchUp Object has three different Interfaces, each
designed to match (dedupe) your data in a different way.
The ReadWrite Interface is most often used for matching
entire databases at one time. The Incremental Interface
enables real-time matching: an incoming record, such as
one from a web form or call center, can be compared to an
existing master database. The third, the Hybrid
Interface, combines the flexibility of the first two
methods, matching an incoming record against a small
cluster of potential matches. Hybrid deduping also
allows the developer to store the match keys in a
proprietary manner.
2. Technical Content
2.1 Installation
2.1.1 What do I need to do when installing the MatchUp Object DVD?
When you install the MatchUp Object DVD, please pay
attention to changes.txt in the root of the DVD. It
lists any changes made to the Object that may require
you to rewrite or recompile your application. Also, when
using the wrappers in the Interfaces folder, read the
readme for each wrapper you are using, as it may describe
changes you need to make to your build process. Running
setup.exe automatically registers the 32-bit COM
version, even on a 64-bit machine; you will have to
register the 64-bit COM version manually.
2.1.2 How do I locate the path to MatchUp Object?
If you get the Windows message "This application has
failed … mdMatchUp.Dll could not be found...", then you
will have to modify your path to include the location of
the API.
Windows NT, Windows 2000, Windows XP:
Go to the Control Panel, double-click System, then click
the Advanced tab. Click the Environment Variables
button. In System Variables, locate the "Path" variable.
Click the Edit button. At the end of the existing value,
add:
;c:\Program Files\Melissa Data\DQT\MatchUp
Where c:\Program Files\Melissa Data\DQT\MatchUp is where
you have installed the MatchUp Object. Note that the
first character is a semicolon, not a colon. Once you've
made this addition, click OK until you're back to the
Control Panel.
Alternatively, many IDEs require you to copy
mdMatchUp.dll into the current debug, release, or
working directory.
Check your project settings, or use a Registry viewer to
see where the MatchUp COM Object was installed and
registered.
2.1.3 How do I make sure that MatchUp Object is calling the correct DLL?
Although MatchUp Object uses a different naming
convention than the previous MatchUp API, subsequent
updates will replace the DLLs in the MatchUp
installation directory structure. It is a good idea to
perform a file search on the machine to ensure that old
versions of the DLLs are no longer present, have been
replaced in application folders, or are located where
Windows will not find them when it searches for the DLLs.
Often it’s easiest to eliminate all but a single copy of
the DLLs to ensure that your program is calling the
correct one.
Also, the Object always sends a message to DebugView
when the DLL is first loaded (“loading mdMatchUp:
C:\Program Files\MelissaData\DQT\MatchUp\mdMatchUp.dll”)
so that you can confirm that the correct DLL is being
loaded.
2.1.4 Why am I having difficulties with Visual Basic locating MatchUp DLLs?
One particularly bothersome problem with Visual Basic is
that it has difficulty locating DLLs. In most compiled
languages, the first place Windows looks for a called
DLL is the folder where the compiled .exe is located, so
most people install the MatchUp DLLs in the same folder
as their executable and never have a problem.
Visual Basic .NET seems to have alleviated most of the
past problems in locating the DLL. Check the reference
properties to find the path of the .NET DLL, and make
sure you have copied mdMatchUp.dll to that folder.
2.1.5 How do I install the COM version DLLs?
Running the DVD or demo installation will automatically
register the 32 bit COM Objects. On a 64 bit machine you
will have to manually register the 64 bit COM dll, as by
default, only the 32 bit version gets registered.
2.1.6 Can I cut and paste the activation code from MatchUp for Windows into the API, or vice versa?
The MatchUp Object license codes are NOT compatible with
either MatchUp for Windows or the previous version, the
MatchUp API. Do not try to cut and paste one of these
other license codes into the MatchUp Object; you will
get an invalid license error.
2.2 Basic Usage
2.2.1 How do I initialize properly?
To initialize the MatchUp Object you must specify a
valid license, the path to the MatchUp data files, the
matchcode to be used in the process, and the location
where each record's matchcode key is stored. If you do
not call these methods (or set the corresponding
properties in some languages), the subsequent call to
InitializeDataFiles will return the corresponding error.
matchup.SetLicenseString(license);
matchup.SetPathToMatchUpFiles(PATH);
matchup.SetMatchcodeName(MATCHCODENAME);
matchup.SetKeyFile(KEYFILE);
matchup.InitializeDataFiles();
For more information on these properties and methods,
please refer to your MatchUp Object manual.
2.2.2 What files are required in order to initialize the MatchUp Object?
The files that must be present to initialize a deduping
session are:
mdMatchUp.dat
mdMatchup.mc
The following optional files are used for editing
matchcodes or .dat entries:
mdMatchup.cfg
MatchcodeEditor.exe
2.2.3 What are the interfaces of MatchUp Object and what do they do?
MatchUp Object has five interfaces: two for handling
matchcodes and three that provide different methods of
deduping.
Matchcode Interface creates or references a Matchcode
Object. You can programmatically read or edit a
matchcode's properties.
Matchcode Component Interface allows you to
programmatically read the properties of a matchcode's
individual components, or edit, add, or remove
components.
ReadWrite Interface is used when deduping entire
databases. A matchkey is built for each record, then all
keys are compared against each other. When the
ReadRecord method is called, the disposition (unique,
record with duplicates, or duplicate record) is returned
for each record.
Incremental Interface is used when comparing an
individual incoming record to an existing master
database. The key is built for the incoming record and
compared against a historical (existing) key file.
Common usage is for a new record being entered on a web
form or in a call center.
Hybrid Interface is used when the developer requires
more control when comparing a single record to an
existing cluster of records. This method allows the keys
to be stored in a proprietary key file or even in the
actual database. Therefore only a group of potential
matches (for example, a cluster of records with the same
ZIP Code) needs to be compared, rather than the entire
database.
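The Hybrid approach can be pictured with the sketch below: keys are kept in your own structure, grouped by a cluster value (ZIP Code here), and an incoming key is compared only with the keys in its own cluster. The KeyStore class and its exact-key comparison are simplifying assumptions for illustration; with the real Object, the comparison itself is done by the Hybrid Interface.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical proprietary key store: cluster value (e.g. ZIP) -> stored keys.
class KeyStore {
public:
    // Compare `key` only against keys stored under `cluster`; return the
    // index of the first exact match within that cluster, or -1.
    int findMatch(const std::string& cluster, const std::string& key) const {
        auto it = clusters_.find(cluster);
        if (it == clusters_.end()) return -1;
        const std::vector<std::string>& keys = it->second;
        for (std::size_t i = 0; i < keys.size(); ++i) {
            if (keys[i] == key) return static_cast<int>(i);
        }
        return -1;
    }

    void add(const std::string& cluster, const std::string& key) {
        clusters_[cluster].push_back(key);
    }

    std::size_t clusterSize(const std::string& cluster) const {
        auto it = clusters_.find(cluster);
        return it == clusters_.end() ? 0 : it->second.size();
    }

private:
    std::unordered_map<std::string, std::vector<std::string>> clusters_;
};
```

An incoming record for ZIP 92688 is checked only against the keys stored under "92688", no matter how many keys the store holds for other ZIP Codes.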
2.2.4 How do I choose a Matchcode?
The matchcode you select for deduping has a great effect
on your results. A matchcode with a small number of
components may find a lot of duplicates, whereas a
matchcode with too many components may return too few.
A simple matchcode may process very fast, whereas an
advanced matchcode using fuzzy logic may take much
longer but catch more duplicates. A good rule of thumb
is to define the criteria and precision you require,
then test.
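To illustrate why fuzzy components cost more than exact ones, the sketch below uses Levenshtein edit distance, one common fuzzy comparison; it is a generic technique chosen for illustration, not a description of the algorithms MatchUp uses internally. An exact comparison is a single equality test, while edit distance fills a table proportional to the product of the two string lengths for every pair compared. The fuzzyEqual threshold rule is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: minimum number of single-character insertions,
// deletions, or substitutions that turn `a` into `b`. O(m*n) time.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t sub = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, sub});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical fuzzy rule: values match when they differ by at most `maxEdits`.
bool fuzzyEqual(const std::string& a, const std::string& b, std::size_t maxEdits) {
    return editDistance(a, b) <= maxEdits;
}
```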
2.2.5 Is passing in a different Matchcode name all that’s required to change the matching strategy?
No. Because the new matchcode will most likely use
different components, you will have to map them
differently: you must programmatically tell the Object
which component types you are using and then link each
source data type to its respective component. This is
accomplished via the AddMapping and AddComponent
function calls.
2.3 Matching Issues
2.3.1 Why do I get an unhandled exception error?
The matchcode name, the string passed to the
SetMatchcodeName method or MatchcodeName property, must
be spelled exactly as listed in the Matchcode Editor.
When using an existing matchcode, misspelling the name
will cause an initialization error. Whitespace must be
retained in matchcode names with multiple words.
2.3.2 Why do I get a Matchcode Mapping error?
Getting this error could mean that you did not call
AddMapping for the components in sequential order, or
that you have coded a data type (enum value or
MatchcodeMapping property) that is incompatible with the
respective matchcode component.
There are two easy ways to determine how to sequence
your mapping calls:
1 Use the MatchUp GUI’s “Matchcode Mapping” setup tab.
2 Call GetComponentType() and/or GetComponentLabel() to
determine what the Object is expecting.
These methods are discussed in detail in the
documentation under Matchcode Interface.
2.3.3 Why are the Address parts of my matchkeys not getting built correctly?
The MatchUp Object uses an address parser to match
inexact addresses, that is, addresses keyed in
differently. This allows us to match records like
‘12 North Main St.’ to ‘12 N Main Street’. It relies on
known address keywords and patterns in the mdMatchUp.dat
file. Therefore typos or unknown address words need to
be added (through the mdMatchUp.cfg file) or handled
with a fuzzy matchcode. Some words are problematic
because they can represent a street name, a PO box, a
directional, or a highway! This makes recognizing
patterns difficult, potentially causing duplicates to be
missed. A few examples:
6547 Box Elder Loop
821 Sixty Six Rd
431 Shelbourne Four Corners
If you find records whose address keys are not getting
built correctly (that is, addresses that are not getting
parsed correctly), let us know; we’re sure there are
still some obscure patterns out there.
2.3.4 Why do the Incremental and Hybrid Interfaces miss some inferred matches?
Take the following three records as an example:
A) 12 Main, B) PO Box 44, and C) 12 Main PO Box 44.
A matches C, and C matches B, so A matches B. MatchUp for
Windows and the ReadWrite Interface catch these through
inferred matching, but the Incremental and Hybrid
methods are a different story. Say Record A arrives on
Monday and Record B on Tuesday (Record C hasn't arrived
yet). Record A does not match Record B (they're just not
alike), so both get added to the historical key file.
Record C arrives on Wednesday. The Object reports that
Record C matches Record A and Record B, but it can't do
anything about the mistake that was made on Tuesday.
And, of course, on Tuesday there was no way of foreseeing
the arrival of Record C on Wednesday.
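The difference can be reproduced with a small sketch. It assumes the pairwise results from the example (A matches C, B matches C) and uses a union-find (disjoint-set) structure, a generic technique chosen for illustration rather than MatchUp's internal method. In the batch view every matching pair is linked, so A and B end up in one group; in the incremental view A and B each become a stored master before C arrives, so C can only report that it hit two masters.

```cpp
#include <algorithm>
#include <numeric>
#include <set>
#include <utility>
#include <vector>

// Disjoint-set (union-find) over record indices 0..n-1.
struct DisjointSet {
    std::vector<int> parent;
    explicit DisjointSet(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

using Pairs = std::set<std::pair<int, int>>;  // matching pairs, lower index first

bool matches(const Pairs& p, int a, int b) {
    return p.count({std::min(a, b), std::max(a, b)}) > 0;
}

// Batch (ReadWrite-style) view: all pairs are linked, so matches are inferred.
DisjointSet batchGroups(int n, const Pairs& p) {
    DisjointSet ds(n);
    for (const auto& [a, b] : p) ds.unite(a, b);
    return ds;
}

// Incremental view: records arrive in index order; a record that matches no
// stored master becomes a new master. Returns how many masters each record hit.
std::vector<int> incrementalHits(int n, const Pairs& p) {
    std::vector<int> masters, hits;
    for (int r = 0; r < n; ++r) {
        int count = 0;
        for (int m : masters) count += matches(p, r, m) ? 1 : 0;
        hits.push_back(count);
        if (count == 0) masters.push_back(r);
    }
    return hits;
}
```

With A, B, C as records 0, 1, 2, the batch groups put 0 and 1 together, while incrementalHits returns {0, 0, 2}: Record C hits both stored masters, which have already been kept as separate records.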
2.3.5 Why do I run into matchkey storage issues?
If you have one dedupe process (one merge purge session)
storing keys and adding records, and another developer
or end user writes to the same key file using a
different matchcode, you have, in short, changed the
matching rules midstream, no matter how briefly or how
long ago it was done. Take great care in naming your
.key files, and only synchronize them with the proper
matchcode.
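One simple way to follow this advice is to derive each key file's name from the matchcode that built it and check the name before writing. The naming scheme below (characters other than letters, digits, and '-' replaced with '_', plus a .key extension) and both helper functions are a hypothetical convention, not something MatchUp enforces.

```cpp
#include <cctype>
#include <filesystem>
#include <string>

// Hypothetical convention: one key file per matchcode, named after it.
std::string keyFileFor(const std::string& matchcodeName) {
    std::string name;
    for (unsigned char c : matchcodeName) {
        name += (std::isalnum(c) || c == '-') ? static_cast<char>(c) : '_';
    }
    return name + ".key";
}

// Refuse to write to a key file that was not named for this matchcode.
bool keyFileMatchesMatchcode(const std::string& keyFile, const std::string& matchcodeName) {
    return std::filesystem::path(keyFile).filename().string() == keyFileFor(matchcodeName);
}
```

Note that two matchcode names differing only in replaced characters (for example "A B" and "A_B") would map to the same file, so a real convention may need a stricter encoding.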
2.4 Environment Variables
2.4.1 What environment variables are available and why
use them?
Currently, you can set MDMATCHUP_LICENSE. This
environment variable is made available so that you can
set your license string without recompiling your code.
If MDMATCHUP_LICENSE is set, an application must still
call the SetLicenseString function, but it does not need
to pass the license string to the object. Instead,
simply call the function and pass an empty string as the
parameter.
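A minimal sketch of choosing the argument for SetLicenseString at run time. Only the MDMATCHUP_LICENSE variable and the empty-string convention come from the text above; licenseArgument and its fallback parameter are hypothetical, and treating an empty variable as unset is an assumption.

```cpp
#include <cstdlib>
#include <string>

// If MDMATCHUP_LICENSE is set, return an empty string (the Object then reads
// the license from the environment); otherwise return the license string the
// application was built or configured with.
std::string licenseArgument(const std::string& fallbackLicense) {
    const char* fromEnv = std::getenv("MDMATCHUP_LICENSE");
    return (fromEnv != nullptr && *fromEnv != '\0') ? std::string() : fallbackLicense;
}
```

The result is what you would pass to SetLicenseString before calling InitializeDataFiles.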
2.4.2 How do I use MDMATCHUP_LICENSE?
Windows
Windows users can set environment variables by doing the
following:
1 Select Start > Settings, and then click Control Panel.
2 Double-click System, and then click the Advanced tab.
3 Click Environment Variables, and then select either
System Variables or
Variables for the user X.
4 Click New.
5 Enter “MDMATCHUP_LICENSE” in the Variable Name box.
6 Enter the license string in the Variable Value box and
then click OK.
Please remember that these settings take effect only
when a program starts. You may need to quit and restart
your development environment to pick up the changes.
Linux/Solaris/HP-UX/AIX
Unix-based OS users can simply set the license string
via the following:
export MDMATCHUP_LICENSE=A1B2C3D4E5 (not the actual
license string).
After adding this setting to your .profile, remember to
restart the shell.
Then call the SetLicenseString method in your
application with an empty string (for example,
SetLicenseString("")).
2.5 Speed
2.5.1 What affects the speed of MatchUp Object?
Obviously, the hardware specifications of the machine
running MatchUp Object are a major determinant of how
fast it runs. However, other factors can also affect
the speed.
Accessing the Object's data files over the network can
increase processing time, especially during
initialization.
Database access: reading source databases over a
network, writing output over a network, the database
access engine, and the file type all contribute to your
environment, so test different access methods for
optimal speed.
If you index your data by ZIP Code you should see an
increase in performance, because most processes use a
ZIP Code in the matchcode.
2.5.2 Why is the Object taking too long to process?
Merge Purge is a memory-intensive, complex process, but
you can help speed it up by designing and using a
matchcode that takes advantage of MatchUp’s group
clustering. Simply put, a matchcode with many
components, all set to use fuzzy matching algorithms,
will take longer to process than a matchcode with a
small number of components using exact comparisons. See
Optimizing Matchcodes in the documentation for tips
that have turned 56-hour processes into 4-hour
processes.
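A worked example shows why clustering matters. Comparing every key with every other takes n(n-1)/2 comparisons, while comparing only within clusters takes the sum of k(k-1)/2 over the cluster sizes k. For 1,000,000 records spread evenly over 10,000 ZIP Codes (100 records each, a made-up distribution), that is 499,999,500,000 comparisons versus 10,000 × 4,950 = 49,500,000. The sketch assumes exact clustering on a single field; how MatchUp actually clusters depends on your matchcode.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Pairwise comparisons needed when every key is compared with every other.
std::uint64_t allPairs(std::uint64_t n) { return n * (n - 1) / 2; }

// Comparisons needed when keys are compared only within their cluster
// (e.g. records sharing the same ZIP Code).
std::uint64_t clusteredPairs(const std::vector<std::string>& clusterValues) {
    std::map<std::string, std::uint64_t> sizes;
    for (const auto& v : clusterValues) ++sizes[v];
    std::uint64_t total = 0;
    for (const auto& entry : sizes) total += allPairs(entry.second);
    return total;
}
```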
2.6 Advanced
2.6.1 What is the difference between the COM and Standard versions of MatchUp Object?
There are essentially no differences in the underlying
code for the COM version and the Standard version of
MatchUp Object. The COM version has a COM interface
layer, supported by many different languages, that
communicates between your code and MatchUp Object.
The Standard version of MatchUp Object is an unmanaged
DLL that must be included in your program, which
eliminates the extra latency created by the COM layer.
2.6.2 What is a wrapper and how do I use it?
A wrapper is an additional layer of code that acts as an
interface from the standard mdMatchUp DLL to the target
programming language. Currently, wrappers are available
for Java, PHP, Perl, Python, and Ruby. In order to use
a wrapper, both the underlying code and the wrapper
itself must be installed. When running the install for
MatchUp Object, make sure the respective Interface is
checked. After installation, navigate to the Interfaces
directory and follow the readme's instructions to set
up and run.
2.6.3 Is there a way to use the standard DLL in .NET without having to call DllImport?
Yes. MatchUp Object has a .NET DLL that enables you to
call the standard DLL instead of the COM object from C#
or VB.NET. mdMatchUpNET.dll wraps the Standard DLL in a
managed assembly, saving the developer from writing
P/Invoke calls to the MatchUp interfaces and providing
an easy reference to the standard DLL.
2.6.4 Can I transfer the Matchcodes from the DoubleTake API or MatchUp API over to the MatchUp Object?
There are a few slight differences in the Matchcode
Editor between the older DoubleTake API / MatchUp API
and the new MatchUp Object. The links to the old-style
Help file and Matchcode tutorial, and the DoubleTake 2
(the old, old, old GUI) import button, have been
removed. But the one difference that affects
functionality is that MatchUp Object no longer allows a
Custom Table component. Because this version works with
DBMSs and operating systems that may not be able to
shell out to external Windows executables, we have added
the ability for users to programmatically create or edit
their own matchcodes, so they can substitute for a
Custom Table programmatically.
Given those slight differences, you can copy the old API
matchcode file, DTake.mc, into the MatchUp Object data
directory and rename it mdMatchup.mc. There is no
utility for this, but it is very easy to do.
But caution, VERY IMPORTANT: check the matchcodes
thoroughly before using them!
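Alternatively, you can import a single matchcode from
your old DTake.mc and add it to mdMatchup.mc
programmatically:
// Include the MatchUp Object C++ header from your installation.
// SOURCE_MATCHUPLOC (folder holding the old matchcode file) and
// MATCHUPLOC (MatchUp Object data folder) are placeholders you define.
#include <iostream>
#include <string>

int main()
{
    mdMUMatchcode Source, Target;
    mdMUMatchcodeComponent *Component;
    char MC_NAME[32];   // matchcode name typed by the user

    Source.SetPathToMatchUpFiles(SOURCE_MATCHUPLOC);
    Source.InitializeDataFiles();
    Target.SetPathToMatchUpFiles(MATCHUPLOC);
    Target.InitializeDataFiles();

    std::cout << "Enter Existing Matchcode to Import: ";
    std::cin.getline(MC_NAME, sizeof(MC_NAME));
    Source.FindMatchcode(MC_NAME);       // look up the old matchcode
    Target.CreateNewMatchcode(MC_NAME);  // create it under the same name

    // Copy each component of the old matchcode into the new one
    for (int i = 1; i <= Source.GetMatchcodeItemCount(); i++)
    {
        Component = Source.GetMatchcodeItem(i);
        Target.AddMatchcodeItem(Component);
        delete Component;
    }
    Target.SaveToFile((std::string(MATCHUPLOC) + "\\mdMatchup.mc").c_str());
    return 0;
}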
2.7 Results
2.7.1 How can I determine which record in a group will be tagged as the Output record?
Unlike MatchUp for Windows, which lets you predetermine
the priority among matching records in a number of ways,
the Object requires the developer to use data returned
from the deduping methods (GetUserInfo and
GetStatusCode) to handle the output and duplicate
records programmatically.
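Because the Object leaves this decision to you, one approach is to collect the records of each duplicate group and apply your own priority rule. The sketch below assumes you have already grouped the records (for example, using what GetUserInfo and GetStatusCode return) into a hypothetical DupRecord struct, and picks the most recently updated record, breaking ties by the lowest record number. Both the struct and the rule are illustrative assumptions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-record data your application keeps for one duplicate group.
struct DupRecord {
    std::string source;      // input file or table the record came from
    long recordNumber;       // record number within that source
    std::string lastUpdate;  // ISO date "YYYY-MM-DD", so string order is date order
};

// Index of the record to keep as the output record: the most recently
// updated one, ties broken by the lowest record number. `group` must not be empty.
std::size_t chooseOutputRecord(const std::vector<DupRecord>& group) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < group.size(); ++i) {
        const DupRecord& r = group[i];
        const DupRecord& b = group[best];
        if (r.lastUpdate > b.lastUpdate ||
            (r.lastUpdate == b.lastUpdate && r.recordNumber < b.recordNumber)) {
            best = i;
        }
    }
    return best;
}
```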
2.7.2 Why is MatchUp Object taking too long to process?
Merge Purge is a memory intensive, complex process. But
you can help speed up the process by keeping data local,
designing and using a matchcode which takes advantage of
MatchUp’s group clustering, and most importantly,
developing your application with the most efficient file
handling, data storage, and read and write access
methods.
2.7.3 Why did my process crash?
A program crash could be caused by anything from a
corrupt data source, a read-only file you are attempting
to write to, a network connection drop-out, an operating
system error, or user error to, once in a blue moon, a
bug in the program. One of the advantages of the Object
is that you have more control over debugging and over
adding error handling and trapping to your code.
2.7.4 Why did MatchUp not catch some duplicates?
MatchUp can only use the match rules and settings that
the end user has provided, so first verify that the
matchkeys were built correctly. If that wasn’t the
source of your problem, check whether your matchcode
rules were satisfied: the keys may be the same, but you
may not have met the conditions of any matchcode column.
Because the Object also allows real-time comparison,
inferred matching cannot always be taken advantage of.
In other words, the order in which a linking record
arrives is more critical with the Object than with
MatchUp for Windows.
2.7.5 I got way too many duplicates!
Most likely, your matchcode rules were too loose;
possibly one column of your matchcode was a subset of a
valid column. Another source of too many duplicates may
be that you mapped in the wrong datatype in the
AddMapping method, or supplied the incorrect source data
in the AddField method. If you are using a Last Name as
part of your match, but you accidentally mapped in a
Full Name field and datatype, you will get too many
duplicates.
2.7.6 How can I tell which source file contributed to my Output table?
The original data source and record number can be
passed to the deduper through the SetUserInfo property;
after processing, call the ReadRecord method and they
are returned by the GetUserInfo property. In addition,
GetStatusCode(), GetCount(), GetEntry(), and
GetCombinations() give you post-processing information
about the output status of each record.
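One way to carry the source and record number through the deduper is to pack them into the user info string before processing and unpack them afterward. The "source|recordNumber" format and the helper names below are assumptions for illustration; check the MatchUp Object manual for any length limit on the user info value.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

struct SourceRef {
    std::string source;
    long recordNumber;
};

// Pack as "source|recordNumber" before passing the string to SetUserInfo.
std::string packUserInfo(const SourceRef& ref) {
    return ref.source + "|" + std::to_string(ref.recordNumber);
}

// Unpack a string read back with GetUserInfo; returns nullopt if malformed.
// rfind lets the source name itself contain '|'.
std::optional<SourceRef> unpackUserInfo(const std::string& info) {
    std::size_t bar = info.rfind('|');
    if (bar == std::string::npos || bar + 1 == info.size()) return std::nullopt;
    try {
        std::size_t used = 0;
        long n = std::stol(info.substr(bar + 1), &used);
        if (used != info.size() - bar - 1) return std::nullopt;  // trailing junk
        return SourceRef{info.substr(0, bar), n};
    } catch (const std::exception&) {
        return std::nullopt;
    }
}
```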
2.7.7 What reporting is available?
Since you do the file handling, you are also responsible
for coding the counting methods for inter-file and
intra-file counts, output totals, dupe counts, etc.
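As a starting point for your own reports, the sketch below counts output records and intra-file and inter-file duplicates. It assumes you have recorded a group identifier and source file for each record, and that the first record seen in each group is its output record. The Row and DupeCounts structs and the definition used (a duplicate is intra-file when it comes from the same source as its group's output record) are assumptions to adapt to your own rules.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical row your application keeps after processing.
struct Row {
    int groupId;         // records with the same groupId are duplicates
    std::string source;  // input file the record came from
};

struct DupeCounts {
    int outputRecords = 0;  // one per group (uniques included)
    int intraFile = 0;      // duplicates from the same file as their output record
    int interFile = 0;      // duplicates from a different file
};

// The first row seen for each group is treated as its output record.
DupeCounts countDupes(const std::vector<Row>& rows) {
    DupeCounts c;
    std::map<int, std::string> outputSource;  // groupId -> source of output record
    for (const Row& r : rows) {
        auto it = outputSource.find(r.groupId);
        if (it == outputSource.end()) {
            outputSource[r.groupId] = r.source;
            ++c.outputRecords;
        } else if (it->second == r.source) {
            ++c.intraFile;
        } else {
            ++c.interFile;
        }
    }
    return c;
}
```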
2.8 General
2.8.1 What type of hardware do I need?
Although you only need about 20MB of hard disk space to
install, we recommend more disk space, because key files
grow in proportion to the amount of data processed. Of
course, key files do not have to be stored in the same
location as the project application.
MatchUp is very memory intensive (millions and millions
of record matchkeys being compared to each other), so we
recommend at least 1GB of RAM, although you may process
with much less.
2.8.2 What platforms will MatchUp run on?
Windows:
2000, XP, 2003, Vista
Pentium Pro or higher processor (x86, x64)
Linux Distributions:
Red Hat 8.0 (gcc 3.3) and above 32/64 bit (x86, x64)
Solaris™ versions:
Solaris 8, 9, 10, SPARC platform, 32/64 bit
AIX™ versions:
5.2, 5.3, POWER, RS/6000, PPC, 32/64 bit
HP-UX™ versions:
11.11, 11.23, PA-RISC, Itanium, 32/64 bit
2.8.3 Are there any other versions, such as a standalone Windows interface?
Yes. If custom development or real-time deduping isn’t
what you need, MatchUp is also available as a standalone
Windows version with real-time analyzing, reporting, and
many other database tools.
2.8.4 How many users can use my license?
A single license generally allows a single computer to
run MatchUp. For questions regarding copyright,
licensing, and multiple licensing (or site licenses),
contact Melissa Data Sales. This is an important topic
beyond the scope of this FAQ.
2.8.5 What type of support do you offer?
Technical support is always free, as are the frequent
updates, and many online resources found on our website.
2.8.6 Can I process dual name fields?
When you want to match ‘John Smith’ to a record that
has ‘Mr. and Mrs. John and Mary Smith’, you may get
lucky and catch these as dupes, but if the dual name
has different last names, you may not be so lucky. The
real solution is Personator4 for Windows or the Name
Object API, which parse dual names into separate
components and give you the flexibility to either
remove the second name or create another record with the
second name.
2.8.7 Can I assign a confidence percentage to select duplicates?
MatchUp does not assign a confidence percentage, because
a fuzzy match on name and address may be a 40% match for
customer A but only 15% for customer B, putting MatchUp
in the precarious position of grading matchcodes.
Instead, we let you match on 16 matchcodes
simultaneously and return a status code stating which
matchcode combinations a record hit on. This lets you
evaluate the status string and decide for yourself that
a match on combinations 123458 is a 99% match, while a
match on combinations 78 warrants only 15% confidence.
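Here is a sketch of the user-side evaluation described above. It assumes you have already decoded the status code into the list of combination numbers a record hit (the decoding depends on the status format described in the MatchUp Object manual and is not shown), and that you assign your own confidence to each combination. Taking the highest confidence among the hit combinations is one possible policy, chosen for illustration; recordConfidence and ConfidenceTable are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Your own confidence (0-100) for each matchcode combination (1-16).
using ConfidenceTable = std::map<int, int>;

// Confidence for a record = highest confidence among the combinations it hit.
// Combinations missing from the table count as 0.
int recordConfidence(const std::vector<int>& hitCombinations, const ConfidenceTable& table) {
    int best = 0;
    for (int combo : hitCombinations) {
        auto it = table.find(combo);
        if (it != table.end()) best = std::max(best, it->second);
    }
    return best;
}
```

With a table that rates combination 1 at 99 and combinations 7 and 8 at 15 and 10, a record hitting 1, 2, 3, 4, 5, 8 scores 99, and a record hitting only 7 and 8 scores 15.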
2.8.8 Does it process International Data?
MatchUp processes US, Canadian, and UK addresses. Other
international data can be matched using a combination of
full address lines (we don’t know how to parse addresses
from all of those other countries) and the MatchUp
Object's best guess on how to process unrecognized
international address patterns. Of course, if your names
are formatted in the same order as domestic data, or you
treat other data types as a general data type, you
should be OK. Take MatchUp for a test drive with a free
demo version if you need to make sure.
2.8.9 Is there any sample code to get me started?
Our demo download provides working examples for many
languages and platforms. See the MatchUp Object page on
our website for the demo download link. In addition,
check our support pages for newly added sample code. If
you cannot find the samples required by your
development environment, please contact us.