Matchcode Optimization:MD Keyboard
- An algorithm developed by Melissa Data which counts keyboarding mis-hits.
- This is a typographical matching algorithm which counts keyboarding mis-hits, applies a weighted penalty based on the keyboard distance of each mis-hit, and assigns a percentage of similarity between the compared strings. Thus two records that differ only by an adjacent-key typo, such as c typed as v or v typed as b, are more likely to be actual duplicates.
- Percentage of similarity
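The idea of a distance-weighted penalty can be illustrated with a minimal sketch. This is not Melissa Data's implementation: the QWERTY coordinates, the row offsets, the scaling constant of 4 key widths, and the names `sub_cost` and `keyboard_similarity` are all assumptions chosen for illustration. It uses an ordinary edit distance in which substituting a nearby key costs less than substituting a distant one.

```python
import math

# Assumed QWERTY layout; the offsets roughly approximate the physical row stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
OFFSETS = [0.0, 0.25, 0.75]
KEY_POS = {
    ch: (col + OFFSETS[r], float(r))
    for r, row in enumerate(ROWS)
    for col, ch in enumerate(row)
}


def sub_cost(a: str, b: str) -> float:
    """Penalty for typing b instead of a: small for nearby keys, capped at 1.0."""
    if a == b:
        return 0.0
    if a not in KEY_POS or b not in KEY_POS:
        return 1.0  # unknown characters get the full penalty
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    # Assumed scaling: keys 4 or more key widths apart cost a full edit.
    return min(1.0, math.hypot(x1 - x2, y1 - y2) / 4.0)


def keyboard_similarity(s1: str, s2: str) -> float:
    """Percentage similarity from a keyboard-weighted edit distance."""
    a, b = s1.lower(), s2.lower()
    if not a and not b:
        return 100.0
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1.0,                  # deletion
                cur[j - 1] + 1.0,               # insertion
                prev[j - 1] + sub_cost(ca, cb)  # weighted substitution
            ))
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(a), len(b)))
```

Under these assumptions, `keyboard_similarity("cat", "vat")` scores higher than `keyboard_similarity("cat", "pat")`, because c and v are adjacent keys while c and p are far apart. The scores from this sketch are not expected to reproduce MatchUp's results or match thresholds.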
Example Matchcode Component
|STRING1||STRING2||RESULT|
|Johnson||Jhnsn||Unique|
|Neumon||Pneumon||Match Found|
|Hteberynost||Theverymost||Match Found|
|Covert||Coberh||Match Found|
|More Matches||Greater Accuracy|
Recommended Usage
- Hybrid deduper, where a single incoming record can quickly be evaluated independently against each record in an existing large master database.
- Batch processes where MDKEY is set on a single non-first matchcode component.
- Databases where records are entered in real time by call center staff or through other manual input, where keyboard mis-hits are more likely.
Not Recommended For
- Databases where the number of errors, relative to the string length, leaves only a small number of common substrings.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
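The one-character-equals-one-byte assumption can be checked before data is sent through this algorithm. The following is a minimal sketch, not part of MatchUp; the helper names `byte_and_char_lengths` and `has_multibyte` are hypothetical. It flags any string whose UTF-8 encoding is longer than its character count.

```python
def byte_and_char_lengths(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, character count) for a string."""
    return len(text.encode("utf-8")), len(text)


def has_multibyte(text: str) -> bool:
    """True if any character needs more than one byte in UTF-8."""
    n_bytes, n_chars = byte_and_char_lengths(text)
    return n_bytes != n_chars


# "M\u00fcller" is "Müller": 6 characters, but 7 bytes in UTF-8,
# so a byte-by-byte comparison would see one more position than a reader does.
print(byte_and_char_lengths("M\u00fcller"))  # (7, 6)
print(has_multibyte("Muller"))               # False
```

A pre-check like this could be used to route records containing multi-byte characters to a different matchcode component.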