MatchUp Object:FAQ

From Melissa Data Wiki


General

What is the MatchUp Object?

The MatchUp Object is a professional programming library used to match database records. Typical uses of the MatchUp Object are:
  • Real-time lookup of customer information during data-entry.
  • Processing files with custom file structures (i.e., files that cannot be exported into a format that the MatchUp GUI version could use).
  • Providing deduplication for a custom-written application seamlessly (i.e., without shelling to another application).
  • Providing custom functionality that cannot be achieved with MatchUp GUI.
It can match files with different structures, different field names, different field types, and different field lengths. You can process up to 16 matchcodes in a single pass, each of which can be made of any part(s) of any field(s) you designate.


What do I need to do when installing the MatchUp Object DVD?

When you install the MatchUp Object DVD, please read changes.txt in the root of the DVD. It lists any changes to the object that may require you to rewrite or recompile your application. Also, when using the wrappers in the Interfaces folder, read the readme for each wrapper you use, as it may describe changes you need to make to the build process. You will have to manually register the 64-bit COM version.


What type of hardware do I need?

Although you only need about 20MB of hard disk space to install, we recommend allowing more drive space because the program's key files grow in proportion to the amount of data processed. Key files do not have to be stored in the same location as the project application.
MatchUp is very memory intensive (millions of record matchkeys are compared to each other), so we recommend at least 1GB of RAM, although you may process with much less.


Are there any different versions, such as a standalone Windows interface?

Yes. If custom development or real-time deduping isn’t what you need, MatchUp is also available as a standalone Windows version with real-time analysis, reporting, and many other database tools. It is also available as an SSIS or Pentaho Contact Zone component for enterprise-level needs.


What platforms will MatchUp run on?

Windows
  • 2000, XP, 2003, Vista
  • Pentium Pro or higher processor (x86, x64)
Linux Distributions
  • Red Hat 8.0 (gcc 3.3) and above, 32/64 bit (x86, x64)


How often is MatchUp updated?

MatchUp Object is updated on a quarterly basis. Updates include additional country support and occasional changes to the object.


What is the expected throughput?

The expected throughput is approximately 3,000,000 – 40,000,000 records per hour.
Throughput by matchcode (records / hour):
  • Address, Last Name, First Name: 5,000,000 – 40,000,000
  • Custom Matchcode using Fuzzy: 200,000 – 5,000,000
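As a rough planning aid, the quoted ranges can be turned into an estimated run time. The sketch below is plain C++ arithmetic and is not part of the MatchUp Object API; the function name estimateHours is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper (not part of MatchUp Object): estimated hours needed
// to process `records` at a throughput of `recordsPerHour`.
double estimateHours(double records, double recordsPerHour)
{
    assert(recordsPerHour > 0.0); // throughput must be positive
    return records / recordsPerHour;
}
```

For example, 10,000,000 records with a fuzzy custom matchcode would take about 2 hours at the top of the quoted range (5,000,000 records per hour) and about 50 hours at the bottom (200,000 records per hour). Actual times depend on the factors listed in this FAQ.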


Factors affecting throughput
  • Distribution of the data.
If inputs are randomly distributed over a wide range of values in the Matchcode components used, MatchUp cannot take full advantage of optimized clustering.
  • Matchcode used.
The more advanced the matchcode (number of components, combinations used, etc.), the more comparisons the object has to make for each cluster (group of potentially matching records).
  • Fuzzy Logic used.
Each comparison the object makes between two potential duplicates takes longer with a fuzzy algorithm than with a straight comparison of two records.


Setting Up

What is the difference between the COM and Standard version of MatchUp Object? (Deprecated 2628)

There are essentially no differences in the underlying code between the COM version and the Standard version of MatchUp Object. The COM version has a COM interface layer used to communicate between your code and MatchUp Object, and it is supported by many different languages. The Standard version is an unmanaged DLL that must be included in your program, which eliminates the extra latency created by the COM layer.


How do I install the COM version DLLs? (Deprecated 2628)

On a 64 bit machine you will have to manually register the 64 bit COM dll.


Is there a way to use the standard dll in .NET without having to call Dll Import?

Yes, MatchUp Object has a .NET DLL which enables you to call the standard DLL instead of the COM object from C# or VB.NET. mdMatchUpNET.dll provides a managed assembly around the Standard DLL, saving the developer from writing the P/Invoke calls to the MatchUp interfaces and providing an easy reference to the standard DLL.


How do I make sure that MatchUp Object is calling the correct DLL?

While the naming convention of MatchUp Object is different from that of the previous MatchUp API, subsequent updates will replace the DLLs in the MatchUp installation directory structure. It is a good idea to perform a file search on the machine to ensure that the old version's DLLs are no longer present, have been replaced in application folders, or are located where they won't be found when Windows looks for the DLLs.
Often it’s easiest to eliminate all but a single copy of the DLLs to ensure that your program is calling the correct one.
Also, the Object will always send a message to DebugView when the DLL is first loaded (“loading mdMatchUp: C:\Program Files\MelissaData\DQT\MatchUp\mdMatchUp.dll”) so that you can confirm that the correct DLL is being loaded.


Why am I having difficulties with Visual Basic trying to locate MatchUp DLLs?

One particularly bothersome problem with Visual Basic is that it has difficulty locating DLLs. In most compiled languages, the first place that Windows will look for a called DLL is the folder where the compiled .exe is located. So most people will install the MatchUp DLLs in the same folder as their executable and never have a problem.
Visual Basic.NET seems to have alleviated most of the past problems in locating the dll. Check the reference properties to find the path of the .NET dll. Make sure you have copied mdMatchUp.dll to that folder.


How do I locate the path to MatchUp Object?

If you get the Windows message "This application has failed … mdMatchUp.Dll could not be found...", then you will have to modify your path to include the location of the API.
Windows NT, Windows 2000, Windows XP
Go to the Control Panel, double-click System, then click the Advanced tab. Click the Environment Variables button. In System Variables, locate the "Path" variable. Click the Edit button. At the end of the existing value, add:
;c:\Program Files\Melissa Data\DQT\MatchUp
where c:\Program Files\Melissa Data\DQT\MatchUp is the folder in which you installed the MatchUp Object. Note that the first character is a semicolon, not a colon. Once you've made this addition, click OK until you're back at the Control Panel.
Alternatively, many IDEs require you to copy mdMatchUp.dll into the debug, release, or working directory.
Check your project settings, or use a Registry viewer to see where the MatchUp COM Object was installed and registered.
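If you want to verify the Path value from code, a check along the following lines may help. It is a minimal sketch in standard C++ and does not call MatchUp Object; the helper names normalizeDir and pathContains are hypothetical, and the comparison is deliberately simple (case-insensitive, ignoring one trailing backslash).

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Lowercases the entry and strips one trailing backslash so that
// equivalent Windows folder names compare equal.
std::string normalizeDir(std::string dir)
{
    std::transform(dir.begin(), dir.end(), dir.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (!dir.empty() && dir.back() == '\\') dir.pop_back();
    return dir;
}

// Returns true if `dir` is one of the entries of the semicolon-separated
// Windows-style `pathValue`.
bool pathContains(const std::string& pathValue, const std::string& dir)
{
    std::istringstream entries(pathValue);
    std::string entry;
    while (std::getline(entries, entry, ';')) {
        if (normalizeDir(entry) == normalizeDir(dir)) return true;
    }
    return false;
}
```

Because entries are split on semicolons only, a value that was joined with a colon by mistake is treated as one long entry and the folder is reported as missing, which is the same symptom Windows would show.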


What is a wrapper and how do I use it?

A wrapper is an additional layer of code that acts as an interface from the standard mdMatchUp DLL to the target programming language. Currently, wrappers are available for .NET, Java, PHP, Perl, Python and Ruby. In order to use the wrapper, both the underlying code and the wrapper itself must be installed. When running the install for MatchUp Object, make sure the respective Interface is checked. After installation, navigate to the interfaces directory and follow the readme for instructions to set up and run.


Initialize and Configure

How do I initialize properly?

To initialize the MatchUp Object you must specify a valid License Key, the path to the MatchUp data files, the matchcode to be used in the process, and the location where each record's matchcode key is stored. If you skip any of these methods (or properties, in some languages), the subsequent initialization call will return the respective error.
matchup.SetLicenseString(LICENSE_KEY);
matchup.SetPathToMatchUpFiles(PATH);
matchup.SetMatchcodeName(MATCHCODENAME);
matchup.SetKeyFile(KEYFILE);
matchup.InitializeDataFiles();
For more information on these properties and methods, please refer to your MatchUp Object manual.


What files are required in order to initialize the MatchUp Object?

The files that must be present to initialize a deduping session are:
  • mdMatchUp.dat
  • mdMatchup.mc
Optionally, the files for editing matchcodes or editing .dat entries are:
  • mdMatchup.cfg
  • MatchcodeEditor.exe


Can I override default Matching logic?

Yes. In addition to the Matchcode Editor, where you determine the match rules, you also have complete control over mdMatchUp.dat, a comprehensive list of keywords associated with different data types.


What's the mdMatchUp.dat file?

This is a compiled list of known keywords associated with a specific data type. It helps MatchUp process different data types using advanced methods, because it recognizes keywords and knows how they can be treated. Entries in this table allow you to match ‘Charles’ and ‘Chuck’, ‘North Main Street’ to ‘N Main St’, or ‘UDM’ to ‘United Data Machines Inc.’, for example. The user may edit or append to this database using the mdMatchUp.cfg file to help MatchUp decide how to process your data more accurately.
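The general idea of a keyword table can be illustrated with a small lookup. This is only a sketch of the concept: the function name normalizeWords is hypothetical, the table is hand-built, and neither the actual layout of mdMatchUp.dat nor MatchUp's own lookup logic is shown.

```cpp
#include <map>
#include <sstream>
#include <string>

// Replaces each whitespace-separated word found in `table` with its
// standard form; words not in the table pass through unchanged.
// Matching here is exact and case-sensitive.
std::string normalizeWords(const std::string& text,
                           const std::map<std::string, std::string>& table)
{
    std::istringstream in(text);
    std::string word, out;
    while (in >> word) {
        auto it = table.find(word);
        if (!out.empty()) out += ' ';
        out += (it != table.end()) ? it->second : word;
    }
    return out;
}
```

With a table that maps North to N, Street to St, and Chuck to Charles, the inputs "12 North Main Street" and "12 N Main St" both normalize to "12 N Main St", so they can then be compared as equal.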

What's the mdMatchUp.cfg file?

Occasionally, we have a customer who always processes a database specific to an industry or geographic area where proprietary keywords are not in the .dat file, or have a different meaning. The mdMatchUp.cfg file allows you to override the existing behavior of the mdMatchUp.dat file, or add new entries. Caution: When you edit this file, you may be overwriting years of programming experience on how to best handle these common data types.

What environment variables are available and why?

Currently, you can set MD_LICENSE. This environment variable is made available so that you can set your License Key without recompiling your code.


How do I use MD_LICENSE in Windows?

Windows users can set environment variables by doing the following:
  1. Select Start > Settings, and then click Control Panel.
  2. Double-click System, and then click the Advanced tab.
  3. Click Environment Variables, and then select either System Variables or Variables for the user X.
  4. Click New.
  5. Enter “MD_LICENSE” in the Variable Name box.
  6. Enter the License Key in the Variable Value box and then click OK.
Please remember that these settings take effect only upon start of the program. It may be necessary to quit and restart the development environment to incorporate the changes.

How do I use MD_LICENSE in Linux?

Unix-based OS users can simply set the License Key via the following:
export MD_LICENSE=A1B2C3D4E5
(A1B2C3D4E5 is a placeholder, not an actual License Key.)
After putting this setting in your .profile, remember to restart the shell.
Remember to call the SetLicenseString method in your application with an empty string (ex: SetLicenseString("")).
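To confirm that the variable is actually visible to your process before initializing, a check along these lines can help. It uses only the standard std::getenv; the helper name licenseFromEnvironment is hypothetical, and how the Object itself reads MD_LICENSE is not shown.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of MD_LICENSE as seen by this
// process, or an empty string if the variable is not set.
std::string licenseFromEnvironment()
{
    const char* value = std::getenv("MD_LICENSE");
    return value ? std::string(value) : std::string();
}
```

An empty result usually means the shell or development environment was started before the variable was set and needs to be restarted.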


How many users can use my License Key?

A single License Key generally allows a single computer to run MatchUp. For questions regarding copyright, licensing, and multiple licensing (or site licenses), contact Melissa Data Sales. This is an important topic beyond the scope of this FAQ.

Can I cut and paste the activation code from MatchUp for Windows into the API, or the opposite?

The MatchUp Object License Key codes are NOT compatible with either MatchUp for Windows or the previous version, the MatchUp API. Do not try to cut and paste one of these other License Key codes into the MatchUp Object; you will get an invalid License Key error.


Deduping

What are the different methods of deduping?

MatchUp Object has three different Interfaces, each designed to match (dedupe) your data in a different way.
ReadWrite Interface
  • The ReadWrite Interface is most used for matching entire databases at one time.
Incremental Interface
  • The Incremental Interface enables real-time matching – like an incoming record from a web form or call center – which can be compared to an existing master database.
Hybrid Interface
  • The Hybrid Interface provides a combination of the first two methods and their flexibility, matching an incoming record against a small cluster of potential matches. Hybrid deduping also allows the developer to store the match keys in a proprietary manner.


What are the interfaces of MatchUp Object and what do they do?

MatchUp Object has 5 interfaces, two for handling matchcodes and three for providing different methods of deduping.
  • Matchcode Interface creates or references a Matchcode Object. You can programmatically read or edit a matchcode's properties.
  • Matchcode Component Interface allows you to programmatically read the properties of a matchcode's individual components, or edit, add or remove a component.
  • ReadWrite Interface is used when deduping entire databases. A matchkey is built for each record, then all keys are compared against each other. When the ReadRecord method is called, the disposition (unique, record with duplicates, or duplicate record) is returned for each record.
  • Incremental Interface is used when comparing an individual incoming record to an existing master database. The key is built for the incoming record and compared against a historical (existing) key file. Common usage is for a new record being entered on a web form or at a call center.
  • Hybrid Interface is used when the developer requires more control when comparing a single record to an existing cluster of records. This method allows the keys to be stored in a proprietary keyfile or even the actual database. Therefore a group of potential matches (a cluster of records with the same zip code for example) can be compared, preventing the entire database from being compared.
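The clustering idea behind the Hybrid Interface can be sketched as follows. This is an illustration of how an application might store keys and select a cluster by ZIP code, assuming your own storage; StoredKey and ClusterIndex are hypothetical types, and the comparison performed by the Hybrid Interface itself is not shown.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record type for illustration; not a MatchUp Object type.
struct StoredKey {
    int recordId;
    std::string zip;      // field used to cluster records
    std::string matchKey; // key previously built for this record
};

// Groups stored keys by ZIP so that an incoming record is compared only
// against the cluster sharing its ZIP, not against the whole database.
class ClusterIndex {
public:
    void add(const StoredKey& key) { clusters_[key.zip].push_back(key); }

    // Returns the candidate records for the given ZIP (empty if none).
    std::vector<StoredKey> candidates(const std::string& zip) const {
        auto it = clusters_.find(zip);
        return it == clusters_.end() ? std::vector<StoredKey>() : it->second;
    }

private:
    std::unordered_map<std::string, std::vector<StoredKey>> clusters_;
};
```

Only the keys returned by candidates() would then be handed to the Object for comparison against the incoming record's key.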

What if my source tables don’t have the same data MatchUp uses to match?

For known data types, which cover most of the data you would match on in a merge purge application, all you have to do is tell MatchUp what type of data is in a field and the format of that data. MatchUp will extract the relevant data needed to build the matchcode keys. In the example function call below, the developer has specified that the source data mapped to one of the matchcode components contains full names, even though the matchcode (set in another function call) matches on last name. MatchUp will extract the last name from the full name data to build the key.
mu.AddMapping(mdMUReadWrite.MatchcodeMapping.FullName);

Can I process dual name fields?

When you want to match ‘John Smith’ to a record which has ‘Mr. and Mrs. John and Mary Smith’, you may get lucky, and catch these as dupes, but if the dual name has different last names, you may not be so lucky. The real solution is Personator4 for Windows, or the Name Object API, which does parse dual names into separate components, and gives you the flexibility to either remove the second name or create another record with the second name.

Does it process International Data?

MatchUp processes US, Canadian, and UK addresses. Other international data can be matched using a combination of Full Address lines (MatchUp does not parse addresses for all other countries) and the MatchUp Object's best guess on how to process unrecognized international address patterns. If your names are formatted in the same order as domestic data, or you map other data types as a general data type, you should be OK. Try MatchUp with a free demo version if you need to make sure.

Matchcodes

What is a Matchcode?

A Matchcode is a set of rules which allow you to determine if two records should be considered duplicates. MatchUp uses a predefined Matchcode, or one you have created using the Windows Matchcode Editor (or programmatically using the Matchcode Interface), to create a matchkey for each record.


What is a Matchcode Key?

A Matchcode Key is a string of data, determined by your matchcode, that is extracted from each record and used to compare records when deduping.
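To make the idea concrete, here is a purely illustrative key builder. The layout (5 characters of ZIP, 8 of last name, 4 of first name) and the function names are assumptions chosen for the example; the real key layout is determined by your matchcode and built by the Object.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Uppercases `value`, then truncates or space-pads it to exactly `width`.
std::string fixedWidth(std::string value, std::size_t width)
{
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    value.resize(width, ' ');
    return value;
}

// Illustrative key: 5 characters of ZIP, 8 of last name, 4 of first name.
std::string buildIllustrativeKey(const std::string& zip,
                                 const std::string& lastName,
                                 const std::string& firstName)
{
    return fixedWidth(zip, 5) + fixedWidth(lastName, 8) + fixedWidth(firstName, 4);
}
```

Under this made-up layout, "John Smith" and "Johnathan smith" in the same ZIP code produce the same key, which is how two differently keyed records can end up being compared as equal.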


Can I create my own Matchcodes, or edit existing ones?

Yes. Using MatchUp's Matchcode Editor, you can create your own matching criteria (your own Matchcode), or copy and edit one of the basic matchcodes shipped with MatchUp, to determine whether two records should match. This tool lets you match on anything! The Matchcode Editor can be accessed from the MatchUp Object data folder by running Matchcode Editor.exe. You can also create new matchcodes, or edit or remove existing ones, programmatically using the Matchcode Interface.


How do I choose a Matchcode?

The matchcode you select for deduping has a great effect on your returned results. A matchcode with a small number of components may find a lot of duplicates, whereas a matchcode with too many may return too few duplicates. A simple matchcode may process very fast, whereas an advanced matchcode using fuzzy logic may take much longer but catch more duplicates. A good rule of thumb is to define the criteria and precision you require, and then test.


Why do I get matchKeys whose Address parts are not getting built correctly?

The MatchUp Object uses an address parser to match inexact addresses, that is, addresses keyed in differently. This allows us to match records like ’12 North Main St.’ to ’12 N Main Street’. The parser relies on known address keywords and patterns in the mdMatchUp.dat file. Therefore typos or unknown address words need to be added to the table or processed with a fuzzy matchcode. Some words are problematic, as they can represent a street name, a PO box, a directional, or a highway. This makes recognizing patterns difficult, potentially causing records to be missed as duplicates. A few examples:
  • 6547 Box Elder Loop
  • 821 Sixty Six Rd
  • 431 Shelbourne Four Corners
If you find records whose address keys are not getting built correctly, that is, whose addresses are not getting parsed correctly, let us know. We’re sure there are still some obscure patterns out there.


Can I transfer the Matchcodes from the DoubleTake API or MatchUp API over to the MatchUp Object?

There are a few slight differences in the Matchcode Editor between the older DoubleTake API / MatchUp API and the new MatchUp Object. Links to the old-style Help file and a Matchcode tutorial, and the import button for the much older DoubleTake 2 GUI, have been removed. The one difference which affects functionality is that MatchUp Object no longer allows a Custom Table component. Since this version works with DBMSs and OSs which may not shell out to external Windows executables, we have added the ability for users to programmatically create or edit their own matchcodes, so a Custom Table substitution can be done programmatically.
Given those slight differences, you can copy the old API matchcode file, DTake.mc, into the MatchUp Object data directory and rename it mdMatchup.mc. There is no utility for this, but it is very easy to do.
Caution: it is VERY IMPORTANT to check the matchcodes thoroughly before using them.
Alternatively, you can import a single matchcode from your old DTake.mc and add it to mdMatchup.mc programmatically:
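// SOURCE_MATCHUPLOC and MATCHUPLOC hold the old and new data folder paths.
mdMUMatchcode Source, Target;
mdMUMatchcodeComponent *Component;
char MC_NAME[32];  // name of the matchcode to import

// Open the old matchcode file.
Source.SetPathToMatchUpFiles(SOURCE_MATCHUPLOC);
Source.InitializeDataFiles();

// Open the MatchUp Object matchcode file.
Target.SetPathToMatchUpFiles(MATCHUPLOC);
Target.InitializeDataFiles();

cout << "Enter Existing Matchcode to Import: ";
cin.getline(MC_NAME, 32);
Source.FindMatchcode(MC_NAME);
Target.CreateNewMatchcode(MC_NAME);

// Copy each component of the source matchcode into the new matchcode.
for (int i = 1; i <= Source.GetMatchcodeItemCount(); i++)
{
    Component = Source.GetMatchcodeItem(i);
    Target.AddMatchcodeItem(Component);
    delete Component;
}

// MATCHUPLOC is assumed to be a std::string here.
Target.SaveToFile(MATCHUPLOC + "\\mdMatchup.mc");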


Is passing in a different Matchcode name all that’s required to change the matching strategy?

No, since you most likely will be using different components, you will have to map them differently. Meaning, you will have to programmatically tell the Object what component types you are using and then link your source datatype to the respective component. This is accomplished via the AddMapping and AddComponent function calls.


Output

I got way too many duplicates!

Most likely, your matchcode rules were too loose; possibly one column of your matchcode was a subset of a valid column. Another source of too many duplicates may be that you mapped in the wrong data type in the AddMapping method, or supplied the incorrect source data in the AddField method. If you are using a Last Name as part of your match, but you accidentally mapped in a Full Name field and data type, you will get too many duplicates.


How can I tell which source file contributed to my Output table?

The original data source and record number can be passed to the deduper through the SetUserInfo property. After processing, they are returned by the GetUserInfo property once the ReadRecord method has been called. In addition, GetResults(), GetCount(), and GetEntry() give you post-processing information about the output status of each record.


Why did MatchUp not catch some duplicates?

MatchUp can only use the match rules and settings which the end user has provided, so first verify that the matchkeys were built correctly. If this wasn’t the source of your problem, check whether your matchcode rules were satisfied: the keys may be the same, but you may not have met the conditions of any matchcode column. Because the Object also allows for real-time comparison, inferred matching cannot always be taken advantage of. In other words, the sequence of a linking record is more critical with the API.


Which record in a group will be tagged as the Output record?

Unlike the Windows and SSIS versions of MatchUp, which let you predetermine a priority between matching records in a number of ways, the Object leaves this to the developer, who must use the data returned from the deduping methods (GetUserInfo and GetResults) to programmatically determine the output and duplicate records.


What type of matchkey storage issues can arise?

If you have one dedupe process (one merge purge session) storing keys and adding records, and another developer or end user writes to the key file using a different matchcode, you will have, in short, changed the matching rules midstream, regardless of how briefly or long ago it was done. Take great care in naming your .key files and only synching with the proper matchcode.


Why does the Incremental Interface occasionally store duplicates?

Take the following 3 records as an example: A) 12 Main, B) PO Box 44, and C) 12 Main PO Box 44. A matches C, and C matches B, so by inference A matches B. The Windows version and the ReadWrite Interface catch these by inferred matching. But the Incremental and Hybrid methods are a different story. Say Record A arrives on Monday and Record B on Tuesday (Record C hasn't arrived yet). Record A would not match Record B (they're just not alike), so they both get added to the historical keyfile. Record C arrives on Wednesday. The Object reports that Record C matches Record A and Record B, but it can't do anything about the mistake that was made on Tuesday. And, of course, on Tuesday there was no way of foreseeing the arrival of Record C on Wednesday.
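The transitive (inferred) grouping in this example can be illustrated with a union-find structure. This is a generic sketch of the idea and not MatchUp's internal algorithm; the class MatchGroups and its methods are hypothetical.

```cpp
#include <numeric>
#include <vector>

// Minimal union-find (disjoint set) used to group records transitively.
class MatchGroups {
public:
    explicit MatchGroups(int recordCount) : parent_(recordCount) {
        std::iota(parent_.begin(), parent_.end(), 0); // each record starts alone
    }

    // Returns the representative record of the group containing `record`.
    int find(int record) {
        while (parent_[record] != record) {
            parent_[record] = parent_[parent_[record]]; // path halving
            record = parent_[record];
        }
        return record;
    }

    // Registers that `a` and `b` were reported as a direct match.
    void link(int a, int b) { parent_[find(a)] = find(b); }

    // True if `a` and `b` are connected directly or through other records.
    bool sameGroup(int a, int b) { return find(a) == find(b); }

private:
    std::vector<int> parent_;
};
```

Numbering the records A = 0, B = 1, C = 2: before C arrives, sameGroup(0, 1) is false. After link(0, 2) and link(2, 1) it becomes true, which is the inference a batch pass can make but a record-at-a-time pass could not have made on Tuesday. An application using the Incremental or Hybrid methods could, as one option, keep its own grouping like this to reconcile such records after the fact.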


What reporting is available?

Since you do the file handling, you are also responsible for coding the counting methods: inter-file and intra-file counts, output totals, dupe counts, etc.
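A minimal tally along these lines might look like the sketch below. The three dispositions follow the ones described for ReadRecord in this FAQ, but the enum, the struct, and the way a disposition is obtained from the Object are assumptions for illustration.

```cpp
// Hypothetical enum mirroring the three dispositions described for ReadRecord.
enum class Disposition { Unique, HasDuplicates, Duplicate };

// Running counts for a simple dedupe report.
struct DedupeReport {
    long unique = 0;         // records with no duplicates
    long withDuplicates = 0; // records that head a group of duplicates
    long duplicates = 0;     // records that duplicate another record

    void tally(Disposition d) {
        switch (d) {
            case Disposition::Unique:        ++unique;         break;
            case Disposition::HasDuplicates: ++withDuplicates; break;
            case Disposition::Duplicate:     ++duplicates;     break;
        }
    }

    long total() const { return unique + withDuplicates + duplicates; }

    // Records written out if one record per duplicate group is kept.
    long outputTotal() const { return unique + withDuplicates; }
};
```

In practice you would call tally() once per record as each disposition is returned, and extend the struct with per-source counters if you need inter-file and intra-file counts.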


Can I assign a confidence percent to select Duplicates?

MatchUp does not assign a confidence percentage, because a fuzzy match on name and address may be a 40% match for customer A but only 15% for customer B, which would put MatchUp in the precarious position of grading matchcodes. Instead, we let you simultaneously match on 16 matchcodes and return a status code stating which matchcode combinations a record hit on. This lets users evaluate the status string and decide for themselves that a match on combinations 123458 is a 99% match, and a match on combinations 78 is only a 15% match.
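One way to apply your own grading is a lookup from the reported combinations to a confidence you have chosen. In this sketch the combination string is treated as an opaque key such as "123458"; the exact format of the status string returned by the Object is not shown here, and the function name confidenceFor is hypothetical.

```cpp
#include <map>
#include <string>

// Looks up a user-assigned confidence (in percent) for the combination
// string reported for a record; returns `fallbackPercent` if the
// combination has not been graded.
int confidenceFor(const std::string& combinations,
                  const std::map<std::string, int>& table,
                  int fallbackPercent)
{
    auto it = table.find(combinations);
    return it != table.end() ? it->second : fallbackPercent;
}
```

With a table containing {"123458", 99} and {"78", 15}, a record that hit on combinations 123458 is graded 99 and one that hit on 78 is graded 15. The percentages are entirely yours to define.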


Support Questions

What type of support do you offer?

Technical support is always free, as are the frequent updates, and many online resources found on our website.


Is there any sample code to get me started?

Our demo download provides working examples for many languages and platforms. See the MatchUp Object page on our website for the demo download link. In addition, check our support pages for newly added sample code. If you cannot find the samples required by your development environment, please contact us.


Why is the Object taking too long to process?

Merge Purge is a memory intensive, complex process. But you can help speed up the process by designing and using a matchcode which takes advantage of MatchUp’s group clustering. Simply put, a matchcode with a lot of components, all set using fuzzy matching algorithms will take longer to process than a matchcode using a small number of components using exact comparisons. See Optimizing Matchcodes in the documentation, for tips which have turned 56 hour processes into 4 hour processes.


Why do I get an unhandled exception error?

The matchcode name, the string passed to the SetMatchcodeName method or MatchcodeName property, must be spelled exactly as listed in the Matchcode Editor. When using an existing matchcode, misspelling the name will cause an initialization error. Whitespace needs to be retained in matchcode names with multiple words.


Why is MatchUp Object taking too long to process?

Merge Purge is a memory intensive, complex process. But you can help speed up the process by keeping data local, designing and using a matchcode which takes advantage of MatchUp’s group clustering, and most importantly, developing your application with the most efficient file handling, data storage, and read and write access methods.


Why did my process crash?

A program crash could be caused by anything from a corrupt data source, a read-only file you are attempting to write to, a dropped network connection, an operating system error, or user error to, once in a blue moon, a bug in the program. One of the advantages of the Object is that you have more control over debugging and over adding error handling and trapping to your code.


What affects the performance of MatchUp Object?

Obviously, the hardware specifications of the machine that is running MatchUp Object are a major determinant of how it will run. However, there are also other factors that can affect the speed:
  • Accessing the Object's data files over the network can increase the processing time, especially during initialization.
  • Database access – reading source databases over a network, writing out over a network, the database access engine, and file type all contribute to your environment. So test different access methods for optimal speed.
  • If you index your data by ZIP Code you should see an increase of performance, as most processes use a zip code in the matchcode.
  • The used Matchcode - the criteria used to determine if records are duplicates has a great effect on processing speed. The more complex the Matchcode, the more comparisons are needed to be evaluated.

Why do I get a Matchcode Mapping error?

Getting this error could mean you did not sequence the components in linear order when calling the AddMapping method, or you have coded a data type (enum value or MatchcodeMapping property) which is incompatible with the respective matchcode component.
There are two easy ways to determine how to sequence your AddMapping calls:
  1. Use the MatchUp GUI’s “Matchcode Mapping” setup tab.
  2. Call GetComponentType() and/or GetComponentLabel() to determine what the Object is expecting.
These methods are discussed in detail in the documentation under Matchcode Interface.