Matchcode Optimization:Advanced Component Types


Advanced Matchcode Data Type

Specifics

Matchcode Components

Summary

Most matchcode component data types only specify the format of the source data; any advanced operations to be performed on the component are set in its properties. Three data types are exceptions. Each also requires a range of variance, expressed in the unit shown, within which two values still constitute a match:

Date (days)
Numeric (units)
Proximity (miles)

Returns

A match if the difference between the two values being compared (in days, units, or miles) is within the configured range.

Example Matchcode Usage 1

Example Data 1

NAME	DATE	RESULT
John	19980422	Match Found
John	19980426	Match Found
John	20181107	Unique
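The date comparison in Example Data 1 can be sketched as follows. This is only an illustration, not the product's implementation: the function names, the YYYYMMDD parsing, the 5-day range, and the inclusive boundary are all assumptions, since the configured range is not shown here.

```python
from datetime import datetime

DATE_FORMAT = "%Y%m%d"  # assumed from the sample data (e.g. 19980422)

def days_apart(a: str, b: str) -> int:
    """Absolute difference in whole days between two YYYYMMDD strings."""
    first = datetime.strptime(a, DATE_FORMAT)
    second = datetime.strptime(b, DATE_FORMAT)
    return abs((first - second).days)

def dates_match(a: str, b: str, max_days: int) -> bool:
    """True when the two dates fall within max_days of each other.

    Treating the boundary as inclusive is an assumption.
    """
    return days_apart(a, b) <= max_days

ASSUMED_RANGE_DAYS = 5  # hypothetical setting, not taken from the source

near = dates_match("19980422", "19980426", ASSUMED_RANGE_DAYS)
far = dates_match("19980426", "20181107", ASSUMED_RANGE_DAYS)
```

Under that assumed range, the first two rows are 4 days apart and match each other, while the third row is more than 20 years away from both and stays unique.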

Example Matchcode Usage 2

Example Data 2

COMPANY	EMPLOYEES	RESULT
Wilson Elec	640	Match Found
Wilsons	15	Match Found
Wilson Corp	623	Match Found
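A numeric component reduces to a simple difference test. The sketch below is hypothetical: the helper names, the handling of non-numeric input, and the range values are assumptions. Whether a given pair matches depends entirely on the range configured in the matchcode, which is not shown here.

```python
def to_number(value):
    """Parse a field as a number; return None when it is not numeric."""
    try:
        return float(str(value).strip())
    except ValueError:
        return None

def numeric_match(a, b, max_units):
    """True when both fields are numeric and differ by at most max_units.

    Treating non-numeric fields as a non-match is an assumption.
    """
    x, y = to_number(a), to_number(b)
    if x is None or y is None:
        return False
    return abs(x - y) <= max_units

# 640 and 623 differ by 17 units.
wide = numeric_match("640", "623", 25)    # within a hypothetical range of 25
narrow = numeric_match("640", "623", 10)  # outside a hypothetical range of 10
```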

Example Matchcode Usage 3

Example Data 3

LATITUDE	LONGITUDE	RESULT
33.63757	-117.6073	Match Found
33.637466	-117.609415	Match Found
33.650388	-117.837956	Unique
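To see why the first two rows of Example Data 3 match and the third does not, the distances can be estimated with the haversine great-circle formula. The source does not say which distance formula the product uses, so haversine, the function names, and the 1-mile range are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def proximity_match(p, q, max_miles):
    """True when points p and q, given as (lat, lon), are within max_miles."""
    return miles_between(p[0], p[1], q[0], q[1]) <= max_miles

row1 = (33.63757, -117.6073)
row2 = (33.637466, -117.609415)
row3 = (33.650388, -117.837956)

ASSUMED_RANGE_MILES = 1.0  # hypothetical setting, not taken from the source

close = proximity_match(row1, row2, ASSUMED_RANGE_MILES)
distant = proximity_match(row1, row3, ASSUMED_RANGE_MILES)
```

The first two points are roughly a tenth of a mile apart, while the third is more than ten miles from either, so any range between those two figures reproduces the results in the table.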

Performance: scale from Slower to Faster
Matches: scale from More Matches to Greater Accuracy

Recommended Usage

Hybrid deduper, where a single incoming record can quickly be evaluated independently against each record in an existing large master database.

Small batch runs, or larger batch runs in which components listed higher in the matchcode have already grouped the records efficiently by clustering, reducing the number of records that need the unit-difference math performed.
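The effect of clustering on the amount of unit-difference math can be illustrated with a small sketch. It assumes a hypothetical exact-match key (a ZIP code field) listed ahead of a numeric component. The record layout, field names, and range are invented for the example and do not reflect the product's internals.

```python
from collections import defaultdict
from itertools import combinations

def candidate_pairs(records, key):
    """Yield only the pairs of records that share the same cluster key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[key(record)].append(record)
    for members in clusters.values():
        yield from combinations(members, 2)

# Hypothetical records: a ZIP code to cluster on, then a numeric field.
records = [
    {"zip": "92688", "employees": 640},
    {"zip": "92688", "employees": 623},
    {"zip": "10001", "employees": 15},
    {"zip": "10001", "employees": 900},
    {"zip": "60601", "employees": 42},
    {"zip": "60601", "employees": 40},
]

all_pairs = list(combinations(records, 2))
clustered = list(candidate_pairs(records, key=lambda r: r["zip"]))

MAX_UNITS = 20  # hypothetical numeric range
matches = [
    (a, b) for a, b in clustered
    if abs(a["employees"] - b["employees"]) <= MAX_UNITS
]
```

With six records, comparing every pair takes 15 difference calculations, while clustering on the key first leaves only 3. The saving grows quickly as the record count rises, which is why the range-based types fit better behind an efficient leading component.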

Not Recommended For

Large or enterprise-level batch runs. Because the distance must be evaluated for every record comparison, throughput will be very slow. Each swap attempt also takes a speed hit, similar to using a fuzzy algorithm.
