Matchcode Optimization:MD Keyboard
MD Keyboard
Specifics
An algorithm developed by Melissa Data which counts keyboarding mis-hits.
Summary
This is a typographical matching algorithm that counts keyboarding mis-hits, applies a weighted penalty based on the distance between the intended key and the key actually struck, and assigns a percentage of similarity between the compared strings. Thus two records that differ only by neighboring-key typos such as c > v or v > b are more likely to be treated as actual duplicates.
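A minimal sketch of the idea, assuming a US QWERTY layout with approximate key coordinates, a substitution cost that grows with the distance between keys (0.5 for adjacent keys, capped at 1), unit-cost insertions and deletions, and a similarity of 100 × (1 − distance ÷ length of the longer string). These weights, the scoring formula, and the names `substitution_cost` and `keyboard_similarity` are hypothetical; Melissa Data's actual weighting is not documented here, so this sketch will not reproduce MatchUp's scores.

```python
import math

# Approximate (x, y) positions of letter keys on a US QWERTY layout.
# The row stagger offsets (0, 0.25, 0.75) are an assumption for illustration.
_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")
_OFFSETS = (0.0, 0.25, 0.75)
KEY_POS = {
    ch: (col + _OFFSETS[row], float(row))
    for row, keys in enumerate(_ROWS)
    for col, ch in enumerate(keys)
}


def substitution_cost(a: str, b: str) -> float:
    """Hypothetical cost of typing b where a was intended: cheaper for nearby keys."""
    if a == b:
        return 0.0
    if a in KEY_POS and b in KEY_POS:
        (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
        # Adjacent keys (distance ~1) cost 0.5; keys two or more apart cost 1.
        return min(1.0, 0.5 * math.hypot(x1 - x2, y1 - y2))
    return 1.0  # characters not on the letter grid: plain mismatch


def keyboard_similarity(s1: str, s2: str) -> float:
    """Percentage similarity from a keyboard-weighted edit distance (a sketch)."""
    a, b = s1.lower(), s2.lower()
    if not a and not b:
        return 100.0
    # At the start of iteration i, prev[j] is the weighted distance a[:i-1] -> b[:j].
    prev = [float(j) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [float(i)] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(
                prev[j] + 1.0,  # deletion
                curr[j - 1] + 1.0,  # insertion
                prev[j - 1] + substitution_cost(a[i - 1], b[j - 1]),  # substitution
            )
        prev = curr
    distance = prev[len(b)]
    return 100.0 * (1.0 - distance / max(len(a), len(b)))
```

Under these assumptions, `keyboard_similarity("cat", "vat")` (adjacent c > v) scores higher than `keyboard_similarity("cat", "pat")`, because c and v sit next to each other while c and p are far apart on the keyboard.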
Returns
Percentage of similarity
Example Matchcode Component
Example Data
| STRING1 | STRING2 | RESULT |
|---|---|---|
| Johnson | Jhnsn | Unique |
| Neumon | Pneumon | Match Found |
| Hteberynost | Theverymost | Match Found |
| Covert | Coberh | Match Found |
| Performance | Slower | Faster |
|---|---|---|
| Matches | More Matches | Greater Accuracy |
Recommended Usage
Hybrid deduper, where a single incoming record can quickly be evaluated independently against each record in an existing large master database.
Batch processes where MDKEY is set on a single non-first matchcode component.
Databases populated in real time from call-center or other inputs, where keyboard mis-hits are more likely.
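The hybrid-deduper pattern can be sketched as follows: one incoming record is scored independently against each master record, and candidates at or above a threshold are returned best-first. The `find_matches` name, the 80 % default threshold, and the use of Python's `difflib.SequenceMatcher` ratio as a stand-in scorer are assumptions for illustration only; in MatchUp the score would come from the MD Keyboard component itself.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple

Similarity = Callable[[str, str], float]


def find_matches(
    incoming: str,
    master: Iterable[str],
    similarity: Similarity,
    threshold: float = 80.0,  # hypothetical cutoff, in percent
) -> List[Tuple[str, float]]:
    """Score one incoming record against every master record independently."""
    matches = []
    for record in master:
        score = similarity(incoming, record)
        if score >= threshold:
            matches.append((record, score))
    # Best candidates first.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches


def ratio_percent(a: str, b: str) -> float:
    """Stand-in scorer (NOT MD Keyboard): difflib's ratio as a percentage."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(find_matches("Coberh", ["Covert", "Johnson"], ratio_percent, threshold=60.0))
```

Passing the scorer in as a function keeps the loop independent of any particular matching algorithm, so a keyboard-weighted scorer could be swapped in without changing the dedupe logic.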
Not Recommended For
Databases where the number of errors relative to string length leaves only a small number of common substrings.
Do Not Use With
UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
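The byte-versus-character problem can be illustrated with a plain edit distance (not MD Keyboard itself; the `levenshtein` helper below is hypothetical and used only for illustration). An accented letter such as "ü" is one character but two bytes in UTF-8, so a comparison that treats each byte as a character counts a single typo as two edits.

```python
def levenshtein(a, b) -> int:
    """Plain edit distance over any two indexable sequences (str or bytes)."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            curr[j] = min(
                prev[j] + 1,  # deletion
                curr[j - 1] + 1,  # insertion
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitution
            )
        prev = curr
    return prev[len(b)]


s1, s2 = "Müller", "Muller"
print(len(s1), len(s1.encode("utf-8")))  # 6 7
print(levenshtein(s1, s2))  # 1: one character substituted
print(levenshtein(s1.encode("utf-8"), s2.encode("utf-8")))  # 2: two byte-level edits
```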