Matchcode Optimization:Swap Matching

From Melissa Data Wiki

Swap Matching

Specifics

Summary

Swap matching catches matches when two field values are transposed between records. The most common case is matching a “John Smith” record to a “Smith John” record. It is also useful when the database contains multiple phone or email fields, where the same value may sit in a different field on each record.

Returns

A match, depending on whether the swap components are configured as ‘Both’ or ‘Either’.

‘Both’ returns a match when both values match before being flipped, or when both values match after the second record has its field values flipped.

‘Either’ returns a match when either of the two values matches before being flipped, or when either of the two values matches after the second record has its field values flipped.
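
The ‘Both’ and ‘Either’ definitions above can be sketched in a few lines of Python. This is only an illustration of the logic as described, not MatchUp's implementation: the names `values_match` and `swap_match` are hypothetical, the comparison is a plain case-insensitive exact match, and treating blank values as never matching is an assumption made for this sketch.

```python
def values_match(a: str, b: str) -> bool:
    """Case-insensitive exact comparison. Blanks never match (assumption)."""
    a, b = a.strip().lower(), b.strip().lower()
    return bool(a) and a == b


def swap_match(rec1, rec2, mode="both"):
    """Compare two 2-field records as given, then with rec2's fields flipped.

    mode="both":   both fields must match, either as given or after flipping.
    mode="either": at least one field must match, as given or after flipping.
    """
    if mode not in ("both", "either"):
        raise ValueError("mode must be 'both' or 'either'")
    a1, a2 = rec1
    b1, b2 = rec2
    straight = (values_match(a1, b1), values_match(a2, b2))
    swapped = (values_match(a1, b2), values_match(a2, b1))  # rec2 flipped
    combine = all if mode == "both" else any
    return combine(straight) or combine(swapped)


# Name fields transposed between two records
print(swap_match(("John", "Smith"), ("Smith", "John"), mode="both"))  # True

# Phone fields, assuming the first record has only one phone populated
rec_a = ("781-660-0004", "")
rec_b = ("781-640-7777", "781-660-0004")
print(swap_match(rec_a, rec_b, mode="either"))  # True
print(swap_match(rec_a, rec_b, mode="both"))    # False
```

In this sketch the phone records match only under ‘Either’, because a single shared number is found after flipping while the remaining pair of fields does not match.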

Example Matchcode Usage 1

MCO Algorithm Swap Half.png

Example Data 1

STRING1    STRING2    RESULT
John       Smith      Match Found
Smith      John       Match Found


Example Matchcode Usage 2

MCO Algorithm Swap Full.png

Example Data 2

STRING1 STRING2 RESULT
781-660-0004 Match Found
781-640-7777 781-660-0004 Match Found



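Performance: rated on a scale from Slower to Faster (rating graphic not shown)
Matches: rated on a scale from More Matches to Greater Accuracy (rating graphic not shown)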


Recommended Usage

Hybrid deduper, where a single incoming record can quickly be evaluated independently against each record in an existing large master database.

Small batch runs, or larger batch runs in which higher-listed matchcode components have already grouped the records efficiently by clustering, reducing the number of records on which swapping must be attempted.
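
To illustrate why clustering matters here, the sketch below counts how many record-to-record comparisons (and therefore swap attempts) a batch would need with and without a grouping key. It is a rough illustration only: `count_pairs` is a hypothetical helper, the ZIP-code grouping key and the evenly distributed sample data are assumptions, and MatchUp's actual clustering is not modeled.

```python
from collections import defaultdict


def count_pairs(records, key=None):
    """Count pairwise comparisons, optionally only within clusters."""
    if key is None:
        n = len(records)
        return n * (n - 1) // 2
    clusters = defaultdict(list)
    for record in records:
        clusters[key(record)].append(record)
    # Records are only compared with others in the same cluster
    return sum(len(c) * (len(c) - 1) // 2 for c in clusters.values())


# Hypothetical batch: 1,000 records spread evenly over 100 ZIP codes
records = [{"id": i, "zip": f"{i % 100:05d}"} for i in range(1000)]

print(count_pairs(records))                           # 499500
print(count_pairs(records, key=lambda r: r["zip"]))   # 4500
```

Under these assumptions, grouping first cuts the number of comparisons that need swapping attempted by roughly a factor of one hundred.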

Not Recommended For

Large or enterprise-level batch runs. Because swapping must be evaluated for every record comparison, throughput will be very slow. Each swap attempt incurs a speed hit similar to that of using a fuzzy algorithm.