MatchUp Object:Matchcode Editor:Interface

From Melissa Data Wiki
Revision as of 21:12, 28 October 2016 by Admin (talk | contribs)

The Matchcode Editor screen is divided into three sections: a list of the matchcodes available in the matchcode database; the properties of the selected matchcode; and a description of the Matching Rules for the selected matchcode.

[Image: MU MatchcodeEditor Interface.png]

Matchcode Name

The top portion of the screen contains a drop-down menu of all the matchcodes found in the current matchcode file.

Below this is a Description: section that contains the description for the currently selected matchcode.

To the right are the Create Matchcode, Remove Matchcode, Copy Matchcode, and Rename Matchcode buttons with which you can create and modify matchcodes. Copying a current matchcode is often the best starting point for creating new matchcodes.

Create Matchcode

To create a matchcode:

  1. Click the Create Matchcode button.
  2. Type a name for the new matchcode in the Matchcode Name dialog box and click OK.
  3. The Matchcode Editor presents a blank matchcode screen with no components.
  4. Begin adding components. After selecting a Data Type, click anywhere in the window or press the Enter key. This enters the data type and adds a new editable row below it.

Remove Matchcode

To remove a matchcode:

  1. Select the matchcode to be deleted in the Matchcode Name: drop-down menu.
  2. Click the Remove Matchcode button.
  3. Click Yes in the Remove Matchcode dialog box to confirm the deletion.

Copy Matchcode

To make a copy of a matchcode:

  1. Select the matchcode to be copied in the Matchcode Name: drop-down menu.
  2. Click the Copy Matchcode button.
  3. Type a name for the new matchcode in the Matchcode Name dialog box and click OK.

Rename Matchcode

To rename a matchcode:

  1. Select the matchcode to be renamed in the Matchcode Name: drop-down menu.
  2. Click the Rename Matchcode button.
  3. Type a new name for the matchcode in the Matchcode Name dialog box and click OK.

Matchcode List

Below the matchcode name is the Matchcode List section, a list of components used by the currently selected matchcode.

This list shows the basic settings for each component.

Field          Description
Data Type      The type of data used by this component. See Matchcode Components for a list of all available types.
Label          (Optional) A description of the data found in this component. Not all component types use this field. Max size of description is 20 characters.
Size           The maximum number of characters from this component to be used by this matchcode. If the data has fewer characters, it will be padded with spaces.
Start          Sets where the current matchcode starts counting when selecting characters to use: the left (beginning); the right (end); a specific character position; or a specific word.
Fuzzy          The type of matching to be used on the selected data type.
Distance       Context sensitive, sets a range for specific data types or fuzzy matching.
Short/Empty    These settings control matching between incomplete or empty fields.
Swap           Swap matching is the ability to compare one component to another component.

For more information on these settings, see Component Properties.

Following these fields, on the right side of the list, is a grid of editable check boxes that shows the combinations in which each component is used.
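The Size and Start settings described above can be pictured with a short Python sketch. It is only an illustration of the documented behavior (take up to Size characters, pad short values with spaces), not MatchUp's implementation. The function name `key_segment` is hypothetical, only the left and right Start options are modeled, and padding on the right is an assumption.

```python
def key_segment(value: str, size: int, start: str = "left") -> str:
    """Return up to `size` characters of `value`, padded with spaces.

    `start` is "left" (count from the beginning) or "right" (count from
    the end). Character-position and word-position starts are not modeled.
    """
    if start == "left":
        taken = value[:size]
    elif start == "right":
        taken = value[-size:] if size > 0 else ""
    else:
        raise ValueError("only 'left' and 'right' are modeled in this sketch")
    # Assumption: short values are padded on the right with spaces.
    return taken.ljust(size)


print(repr(key_segment("Johnson", 5)))           # 'Johns'
print(repr(key_segment("Johnson", 5, "right")))  # 'hnson'
print(repr(key_segment("Lee", 5)))               # 'Lee  '
```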

Add Component

To add a new component to the matchcode:

  1. Click the down arrow to open the drop-down menu named [Select Data Type]. (There will always be a [Select Data Type] below the last defined matchcode component.)
  2. Select the desired data type from the drop-down menu.
  3. The new component is added as the last component in the matchcode.
  4. Select the settings for the new component by clicking the field you want to change. See the sections below for more information on the controls within this dialog.

Remove Component

To remove a component from a matchcode:

  1. Click the down arrow to open the drop-down menu of the component to be deleted.
  2. Select [Remove Component] from the top of the list in the drop-down menu.
  3. Once selected, click anywhere in the window, or press the Enter key. This will confirm the removal, and remove the component from the matchcode list.

Change Component Order

To change the order of components in a matchcode:

  1. Click and hold the name of the component.
  2. Drag the component to the new position.

For more information on how combinations of components are used, see Component Combinations.

Matching Strategies

This setting controls the criteria the matchcode uses to compare this component in one match key with the same component in another match key.

Fuzzy Matching Strategies

  • Phonetex
  • Soundex
  • Containment
  • Frequency
  • Fast Near
  • Accurate Near
  • Frequency Near
  • Vowels Only
  • Consonants Only
  • Alphas Only
  • Numerics Only
  • Jaro
  • Jaro-Winkler
  • n-Gram
  • Needleman-Wunsch
  • Smith-Waterman-Gotoh
  • Dice’s Coefficient
  • Jaccard Similarity Coefficient
  • Overlap Coefficient
  • Longest Common Substring
  • Double MetaPhone


Distance

This property sets the range within which two values are still considered a match. The field is context sensitive: its meaning depends on the Data Type and Fuzzy algorithm.

Data Type    Description
Proximity    Distance in miles. Range: 0-4000
Numeric      Integer number.
Date         Number of days.

For example, if the Distance is set to 60:

  • Two records with dates 20161225 and 20161031 will match. (They are within 60 days of each other.)
  • Two records with dates 20161225 and 20160430 will not match. (They are more than 60 days apart.)
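The date comparison can be checked with a short Python sketch. This is a minimal illustration rather than MatchUp's implementation; the function name `dates_within` is hypothetical, and treating the boundary as inclusive (exactly 60 days still matches) is an assumption.

```python
from datetime import datetime


def dates_within(a: str, b: str, distance_days: int) -> bool:
    """True if two YYYYMMDD dates are at most `distance_days` apart."""
    fmt = "%Y%m%d"
    delta = datetime.strptime(a, fmt) - datetime.strptime(b, fmt)
    # Assumption: the Distance boundary is inclusive.
    return abs(delta.days) <= distance_days


print(dates_within("20161225", "20161031", 60))  # True  (55 days apart)
print(dates_within("20161225", "20160430", 60))  # False (239 days apart)
```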


Algorithm        Description
Fast Near        Number of typographical errors. Range: Tight (1) - Loose (4)
Accurate Near    Number of typographical errors. Range: Tight (1) - Loose (4)


NOTE: Because these algorithms are not published and the range was originally designed as a general sliding scale (a narrow choice of precision), we recommend starting with Near:1 and testing carefully before using higher settings in production, as higher settings can quickly return false duplicates.


The following algorithms use a percentage range of 0-100%, indicating the minimum percentage of similarity which will return a match between two strings.

  • n-Gram
  • Jaro
  • Jaro-Winkler
  • Longest Common Substring (LCS)
  • Needleman-Wunsch
  • MD Keyboard
  • Smith-Waterman-Gotoh
  • Dice’s Coefficient
  • Jaccard Similarity Coefficient
  • Overlap Coefficient
  • Double MetaPhone

More information on the publicly available algorithms can be found here: Advanced Algorithms.
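To show how a minimum-percentage setting works, the sketch below computes a Jaccard similarity over character bigrams and compares it with a threshold. It is a generic textbook version for illustration only; how MatchUp tokenizes and scores strings may differ, and the function names here are hypothetical.

```python
def bigrams(text: str) -> set:
    """Set of adjacent character pairs, ignoring case."""
    text = text.upper()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard_percent(a: str, b: str) -> float:
    """Jaccard similarity of the two bigram sets, as a percentage."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings too short to form bigrams: fall back to equality.
        return 100.0 if a.upper() == b.upper() else 0.0
    return 100.0 * len(x & y) / len(x | y)


def is_match(a: str, b: str, min_percent: float) -> bool:
    """Match when similarity meets the minimum percentage."""
    return jaccard_percent(a, b) >= min_percent


print(jaccard_percent("Johnson", "Johnston"))  # 62.5
print(is_match("Johnson", "Johnston", 60))     # True
print(is_match("Johnson", "Johnston", 80))     # False
```

Raising the percentage makes matching stricter: the same pair that matches at 60% is rejected at 80%.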


Short-Empty Settings

This setting controls whether blank or incomplete fields are considered matches to populated fields or other blank fields. These settings are not exclusive, so two or all three may be selected at one time.

Match if both fields are blank
If two records have the same empty component, that component will be counted as matching.
Match if one field is blank
Allows matching missing data with the full data. For example, “Smith” matches “John Smith.” However, two records with the same component missing will not match.
Match initial to full field
Allows matching abbreviated data with the full data. For example, “J Smith” matches “John Smith.”
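The three settings can be modeled as independent flags. The Python sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, it compares a single component value from each record, and plain case-insensitive equality stands in for whatever matching strategy the component uses.

```python
def component_matches(a: str, b: str, *, both_blank: bool = False,
                      one_blank: bool = False,
                      initial_to_full: bool = False) -> bool:
    """Compare one component from two records under Short/Empty flags."""
    a, b = a.strip().upper(), b.strip().upper()
    if not a and not b:
        return both_blank            # "Match if both fields are blank"
    if not a or not b:
        return one_blank             # "Match if one field is blank"
    if a == b:
        return True                  # stand-in for the real strategy
    if initial_to_full and (len(a) == 1 or len(b) == 1):
        return a[0] == b[0]          # "Match initial to full field"
    return False


print(component_matches("", "", both_blank=True))            # True
print(component_matches("", "John", one_blank=True))         # True
print(component_matches("", "", one_blank=True))             # False
print(component_matches("J", "John", initial_to_full=True))  # True
print(component_matches("J", "John"))                        # False
```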

Swap Match Pairs

The Swap Match section selects which components belong to which swap pairs.

Swap Matching allows matching “John Smith” with “Smith John.”

The components must be the same size and should use the same set of matching options (for example, one can’t use Phonetex while the other uses Soundex). Up to eight pairs, A through H, can be defined.

For more information on using swap pairs, see Swap Matching Uses.

Swap Pair Configuration

To configure a swap pair:

  1. Click the Swapping... button.
  2. The Matchcode Swap Pairs dialog will open.
  3. First select the pair tab you desire to edit. Pair A is selected by default.
  4. Select the two components that will be used for this swap pair by selecting them in their respective drop down menus.
  5. Then select the swapping rule:
    Both components must match
    The contents of both components must match according to the fuzzy matching strategy in use for both components. “John Smith” matches “Smith John” but not “Smith <blank>.”
    Either component can match
    At least one of the components must match. “John Smith” matches both “Smith John” and “Smith <blank>.”
  6. Click OK.
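The difference between the two swapping rules can be sketched in Python. This is an illustration, not MatchUp's implementation: the names are hypothetical, case-insensitive equality stands in for the fuzzy strategy, and checking both the straight and the swapped orientation is an assumption.

```python
def same(a: str, b: str) -> bool:
    """Stand-in comparison: non-blank and equal, ignoring case."""
    return bool(a) and a.upper() == b.upper()


def swap_pair_matches(rec_a, rec_b, rule: str = "both") -> bool:
    """Compare two (component1, component2) tuples under a swap rule."""
    if rule not in ("both", "either"):
        raise ValueError("rule must be 'both' or 'either'")
    a1, a2 = rec_a
    b1, b2 = rec_b
    straight = [same(a1, b1), same(a2, b2)]
    swapped = [same(a1, b2), same(a2, b1)]
    combine = all if rule == "both" else any
    return combine(straight) or combine(swapped)


john_smith = ("John", "Smith")
print(swap_pair_matches(john_smith, ("Smith", "John"), "both"))    # True
print(swap_pair_matches(john_smith, ("Smith", ""), "both"))        # False
print(swap_pair_matches(john_smith, ("Smith", "John"), "either"))  # True
print(swap_pair_matches(john_smith, ("Smith", ""), "either"))      # True
```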

Combinations

Use these check boxes to select which of the 16 possible combinations will use this component.

It is easier to visualize the effects of these boxes if you look at the list of matchcode components as well:

[Image: MU MatchcodeEditor Combinations.png]

It is important to note that each VERTICAL column of check marks designates one combination. For example, the screen shot above shows a matchcode that is made up of 4 combinations:

  1. Zip5, Last Name, First Name, Street Number, Street Name
  2. Zip5, Last Name, First Name, PO Box
  3. Zip5, Company, Street Number, Street Name
  4. Zip5, Company, PO Box
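The four columns listed above can be modeled as sets of component names. The Python sketch below assumes that two records are treated as matching when every component in at least one column matches; that rule, the function names, and the sample records are assumptions for illustration, and non-blank case-insensitive equality stands in for each component's real matching settings.

```python
# One set per vertical column of check marks in the example above.
COMBINATIONS = [
    {"Zip5", "Last Name", "First Name", "Street Number", "Street Name"},
    {"Zip5", "Last Name", "First Name", "PO Box"},
    {"Zip5", "Company", "Street Number", "Street Name"},
    {"Zip5", "Company", "PO Box"},
]


def component_equal(a: str, b: str) -> bool:
    """Stand-in comparison: both non-blank and equal, ignoring case."""
    return bool(a) and bool(b) and a.upper() == b.upper()


def matching_combinations(rec_a: dict, rec_b: dict) -> list:
    """Return the 1-based numbers of the columns on which both records agree."""
    return [
        number
        for number, combo in enumerate(COMBINATIONS, start=1)
        if all(component_equal(rec_a.get(c, ""), rec_b.get(c, "")) for c in combo)
    ]


# Hypothetical sample records.
rec_a = {"Zip5": "92688", "Last Name": "Smith", "First Name": "John",
         "Company": "Acme", "PO Box": "12"}
rec_b = {"Zip5": "92688", "Company": "ACME", "PO Box": "12"}

print(matching_combinations(rec_a, rec_b))  # [4]
```

Here the records agree only on the fourth column (Zip5, Company, PO Box), which under the stated assumption is enough to treat them as a match.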

Matching Rules

This section details the matching rules, depending on your selections under the Matchcode List.