- Winkler Distance
- Gathers common characters (in order) between the two strings, then counts transpositions between the two common strings.
- Percentage of similarity
- 1/3 * (common/len1 + common/len2 + (common-transpositions)/common)
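- Here, common is the number of character matches, where two characters count as a match only if their positions in the two strings are within the algorithm's defined range of each other; len1 and len2 are the lengths of the two strings. Transpositions is the number of matched characters that appear in a different sequence order, divided by 2.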
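The formula above can be sketched in Python as follows. This is a minimal illustration, not MatchUp's implementation: the function name `jaro_similarity` is hypothetical, and because the source does not state the algorithm's matching range, the sketch assumes the conventional Jaro window of `max(len1, len2) // 2 - 1` positions.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Return a similarity in [0, 1] using the formula given above."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Assumption: conventional Jaro matching range (not confirmed by the source).
    window = max(max(len1, len2) // 2 - 1, 0)

    used = [False] * len2   # characters of s2 already paired with a match
    common1 = []            # common characters, in the order they occur in s1
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not used[j] and s2[j] == ch:
                used[j] = True
                common1.append(ch)
                break

    common = len(common1)
    if common == 0:
        return 0.0

    # Common characters in the order they occur in s2.
    common2 = [s2[j] for j in range(len2) if used[j]]

    # A character match in a different sequence order, divided by 2.
    transpositions = sum(a != b for a, b in zip(common1, common2)) / 2

    return (common / len1 + common / len2
            + (common - transpositions) / common) / 3
```

For instance, `jaro_similarity("MARTHA", "MARHTA")` finds six common characters and one transposition, giving (1 + 1 + 5/6) / 3, or about 0.944. How MatchUp turns such a percentage into a Match Found or Unique result is not specified here, so the sketch stops at the raw score.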
Example Matchcode Component
|STRING1|STRING2|RESULT|
|Johnson|Jhnsn|Match Found|
|Maguire|Mcguire|Match Found|
|Beaumarchais|Bumarchay|Unique|
|Deanardo|Dinardio|Unique|
|More Matches||Greater Accuracy|
Recommended For
- Hybrid deduper, where a single incoming record can be quickly evaluated independently against each record in a large existing master database.
- Databases created with abbreviations or similar word substitutions.
Not Recommended For
- Large or enterprise-level batch runs. Because the algorithm must be evaluated for each record comparison, throughput will be very slow.
- Databases created via real-time data entry, where sound-alike (phonetic) errors are introduced.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
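The one-byte-per-character assumption can be illustrated with a short Python check. This is only a sketch of why multi-byte data skews the inputs to the formula; the helper name `char_and_byte_lengths` and the sample values are hypothetical and do not come from MatchUp.

```python
def char_and_byte_lengths(text: str) -> tuple[int, int]:
    """Return (number of characters, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


# "Müller" written with an escape so the u-umlaut is one code point (U+00FC).
name = "M\u00fcller"
chars, byte_count = char_and_byte_lengths(name)

# The u-umlaut occupies two bytes in UTF-8, so a byte-oriented comparison
# sees a longer string than a character-oriented one, which changes len1,
# len2, and the positions used for matching.
print(chars, byte_count)
```

A string made up only of single-byte characters, such as `"Johnson"`, yields the same value for both lengths, which is the case the algorithm was ported to handle.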