Matchcode Optimization:MD Keyboard
MD Keyboard
Specifics
- An algorithm developed by Melissa Data which counts keyboarding mis-hits.
Summary
- This is a typographical matching algorithm which counts keyboarding mis-hits, applies a weighted penalty based on the distance of each mis-hit, and assigns a percentage of similarity between the compared strings. Thus two records that differ only by near-key typos, such as c typed as v or v typed as b, are more likely to be actual duplicates.
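The idea of a distance-weighted mis-hit penalty can be illustrated with a small Python sketch. This is not Melissa Data's implementation, which is not documented here. It assumes a US QWERTY letter layout, a weighted edit distance, and an arbitrary cutoff of four key widths; the names `KEY_POS`, `substitution_cost`, and `keyboard_similarity` are hypothetical.

```python
# Illustrative sketch only: NOT the actual MD Keyboard algorithm.
# Assumes a US QWERTY letter layout and hypothetical weights.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSET = [0.0, 0.25, 0.75]  # assumed horizontal stagger of each row

# Map each letter to an approximate (x, y) position on the keyboard.
KEY_POS = {
    ch: (col + ROW_OFFSET[row], float(row))
    for row, keys in enumerate(ROWS)
    for col, ch in enumerate(keys)
}


def substitution_cost(a: str, b: str) -> float:
    """Penalty for typing b instead of a: 0 for the same key, larger for distant keys."""
    if a == b:
        return 0.0
    if a not in KEY_POS or b not in KEY_POS:
        return 1.0  # characters outside the layout count as a full error
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    dist = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return min(1.0, dist / 4.0)  # assumed cutoff: 4+ key widths is a full error


def keyboard_similarity(s1: str, s2: str) -> float:
    """Percentage similarity from a keyboard-weighted edit distance."""
    s1, s2 = s1.lower(), s2.lower()
    if not s1 and not s2:
        return 100.0
    prev = [float(j) for j in range(len(s2) + 1)]
    for i, a in enumerate(s1, 1):
        cur = [float(i)]
        for j, b in enumerate(s2, 1):
            cur.append(min(
                prev[j] + 1.0,                         # character missing from s2
                cur[j - 1] + 1.0,                      # extra character in s2
                prev[j - 1] + substitution_cost(a, b)  # mis-hit, weighted by key distance
            ))
        prev = cur
    return 100.0 * (1.0 - prev[-1] / max(len(s1), len(s2)))


# A neighbouring-key typo (c typed as v) scores higher than a distant one (c typed as p).
print(keyboard_similarity("cat", "vat") > keyboard_similarity("cat", "pat"))
```

Under these assumptions a neighbouring-key substitution costs a quarter of a full error, so near-key typos reduce the similarity percentage far less than arbitrary character changes. The scores this sketch produces will not match MatchUp's results.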
Returns
- Percentage of similarity
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT |
---|---|---|
Johnson | Jhnsn | Unique |
Neumon | Pneumon | Match Found |
Hteberynost | Theverymost | Match Found |
Covert | Coberh | Match Found |
Rating scale | Left end | Right end |
---|---|---|
Performance | Slower | Faster |
Matches | More Matches | Greater Accuracy |
Recommended Usage
- Hybrid deduper, where a single incoming record can quickly be evaluated independently against each record in an existing large master database.
- Batch processes where MDKEY is set on a single non-first matchcode component.
- Databases where data is entered in real time from call centers or other sources where keyboard mis-hits are more likely.
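The hybrid deduper pattern in the first item, one incoming record scored independently against each master record, can be sketched as follows. The function names and the threshold of 80 are hypothetical, and `difflib.SequenceMatcher` is only a stand-in scorer, since MD Keyboard itself is not available outside MatchUp.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def stand_in_similarity(a: str, b: str) -> float:
    """Placeholder scorer (difflib ratio as a percentage); NOT MD Keyboard."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_incoming(
    incoming: str,
    master: List[str],
    threshold: float = 80.0,  # hypothetical cutoff, chosen for illustration
    similarity: Callable[[str, str], float] = stand_in_similarity,
) -> List[Tuple[int, float]]:
    """Score one incoming value against every master value independently."""
    hits = []
    for index, candidate in enumerate(master):
        score = similarity(incoming, candidate)
        if score >= threshold:
            hits.append((index, score))
    return hits


# Each comparison depends only on the incoming value and one master value.
print(match_incoming("Covert", ["Smith", "Covert"]))
```

Because every comparison is independent of the others, the cost grows linearly with the size of the master database, which is why a relatively slow per-pair algorithm can still be practical in this setting.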
Not Recommended For
- Databases where the number of errors relative to the string length results in only a small number of common substrings.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
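One way to spot affected values before matching is to compare a string's character count with its UTF-8 byte count. The helper below is a hypothetical pre-check written for illustration; it is not part of MatchUp.

```python
def has_multibyte_utf8(text: str) -> bool:
    """Return True if any character in text needs more than one byte in UTF-8."""
    return len(text.encode("utf-8")) != len(text)


# Plain ASCII: one byte per character.
print(has_multibyte_utf8("Johnson"))
# "Mu\u00f1oz" contains n-with-tilde, which takes two bytes in UTF-8.
print(has_multibyte_utf8("Mu\u00f1oz"))
```

Values flagged by such a check break the one-character-per-byte assumption described above, so they could be routed to a different matchcode component or reviewed separately.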