Matchcode Optimization:MD Keyboard
Latest revision as of 14:25, 27 September 2018
MD Keyboard
Specifics
- An algorithm developed by Melissa Data which counts keyboarding mis-hits.
Summary
- This is a typographical matching algorithm that counts keyboarding mis-hits. It weights the penalty for each one by how far the struck key is from the intended key, then assigns a percentage of similarity between the compared strings. Thus, two records that differ only by near-miss typos on adjacent keys, such as c > v or v > b, are more likely to be actual duplicates.
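One way to picture this is as a keyboard-weighted edit distance. The Python below is a minimal sketch under stated assumptions and is not Melissa Data's implementation. The QWERTY coordinates, the row stagger and the 0.5-per-key-width substitution penalty are all hypothetical, as are the function names `substitution_cost`, `keyboard_distance` and `keyboard_similarity`. Insertions and deletions cost 1. A substitution costs between 0 and 1 depending on how far apart the keys are. The result is scaled to a percentage of the longer string's length.

```python
import math

# Hypothetical QWERTY geometry (an assumption, not Melissa Data's key table):
# each letter gets an (x, y) coordinate; lower rows are shifted right to mimic
# the physical stagger of a keyboard.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], float(row))
    for row, keys in enumerate(ROWS)
    for col, ch in enumerate(keys)
}


def substitution_cost(a: str, b: str) -> float:
    """Penalty for typing b where a was intended: 0 for the same key,
    0.5 for a horizontally adjacent key, capped at 1.0 for distant keys."""
    a, b = a.lower(), b.lower()
    if a == b:
        return 0.0
    if a not in KEY_POS or b not in KEY_POS:
        return 1.0  # non-letters get the full penalty in this sketch
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return min(1.0, 0.5 * math.hypot(x1 - x2, y1 - y2))


def keyboard_distance(s: str, t: str) -> float:
    """Weighted Levenshtein distance: insert/delete cost 1, substitution
    cost depends on how far apart the two keys are."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [float(i)] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            cur[j] = min(
                prev[j] + 1.0,      # delete s[i-1]
                cur[j - 1] + 1.0,   # insert t[j-1]
                prev[j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
        prev = cur
    return prev[-1]


def keyboard_similarity(s: str, t: str) -> float:
    """Percentage of similarity (0-100) relative to the longer string."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 100.0
    return 100.0 * (1.0 - keyboard_distance(s, t) / longest)
```

With these assumed weights, `keyboard_similarity("cat", "vat")` scores higher than `keyboard_similarity("cat", "pat")`. The reason is that v sits next to c, while p is across the keyboard.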
Returns
- Percentage of similarity
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT
---|---|---
Johnson | Jhnsn | Unique
Neumon | Pneumon | Match Found
Hteberynost | Theverymost | Match Found
Covert | Coberh | Match Found
Scale | From | To
---|---|---
Performance | Slower | Faster
Matches | More Matches | Greater Accuracy
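The hypothetical helper below helps show why the equal-length example pairs behave as they do. For two strings of the same length, it lists each position where the letters differ and whether those two letters sit on neighboring QWERTY keys. The key coordinates and the neighbor rule are assumptions made for this sketch. It does not reproduce MD Keyboard's scoring or thresholds.

```python
# Hypothetical QWERTY coordinates (assumption): lower rows shifted right
# to approximate the physical key stagger.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]
KEY_POS = {
    ch: (col + ROW_OFFSETS[row], row)
    for row, keys in enumerate(ROWS)
    for col, ch in enumerate(keys)
}


def are_neighbors(a: str, b: str) -> bool:
    """True if a and b are different letters on touching keys."""
    a, b = a.lower(), b.lower()
    if a == b or a not in KEY_POS or b not in KEY_POS:
        return False
    (x1, y1), (x2, y2) = KEY_POS[a], KEY_POS[b]
    return abs(y1 - y2) <= 1 and abs(x1 - x2) <= 1.0


def mismatches(s: str, t: str) -> list[tuple[int, str, str, bool]]:
    """(position, char in s, char in t, neighbor?) for every differing position."""
    if len(s) != len(t):
        raise ValueError("this sketch only compares equal-length strings")
    return [
        (i, a, b, are_neighbors(a, b))
        for i, (a, b) in enumerate(zip(s, t))
        if a.lower() != b.lower()
    ]


for s, t in [("Covert", "Coberh"), ("Hteberynost", "Theverymost")]:
    print(s, t, mismatches(s, t))
```

For Covert/Coberh, the sketch reports v/b as neighbors and t/h as not. For Hteberynost/Theverymost, it reports two neighbor substitutions (b/v and n/m). The transposed "Ht"/"Th" at the start appears as two non-neighbor mismatches, because a position-by-position check has no notion of transposition.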
Recommended Usage
- Hybrid deduper, where a single incoming record can quickly be evaluated independently against each record in an existing large master database.
- Batch processes where MDKEY is set on a single non-first matchcode component.
- Databases where data is entered in real time from a call center or other sources where keyboard mis-hits are more likely.
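The sketch below illustrates the hybrid-deduper pattern and the pattern of placing the fuzzy comparison on a non-first component. It is a hypothetical illustration, not MatchUp's API. The following are all assumptions: the two-component matchcode (ZIP code, then last name), the `find_duplicates` function, the made-up sample records and the threshold of 60. Python's `difflib.SequenceMatcher` stands in for a keyboard-weighted scorer such as MD Keyboard. Each master record is checked independently. A cheap exact comparison on the first component rejects non-candidates before the fuzzy comparison on the later component runs.

```python
from difflib import SequenceMatcher

# Hypothetical matchcode: component 1 = ZIP code (exact), component 2 = last name (fuzzy).
Record = tuple[str, str]


def similarity_pct(a: str, b: str) -> float:
    """Stand-in scorer: difflib ratio as a percentage (NOT the MD Keyboard algorithm)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(incoming: Record, master: list[Record], threshold: float = 60.0) -> list[Record]:
    """Evaluate one incoming record independently against each master record."""
    zip_code, last_name = incoming
    hits = []
    for record in master:
        master_zip, master_last = record
        if master_zip != zip_code:
            continue  # first component: cheap exact comparison filters records out
        if similarity_pct(last_name, master_last) >= threshold:
            hits.append(record)  # later component: fuzzy comparison
    return hits


master = [("11111", "Covert"), ("11111", "Johnson"), ("22222", "Coberh")]
print(find_duplicates(("11111", "Coberh"), master))
```

With this stand-in scorer, the incoming Coberh record matches only the master Covert record in the same ZIP code. The Coberh record in the other ZIP code is never scored.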
Not Recommended For
- Databases where errors are numerous relative to string length, so that the compared strings share only a small number of common substrings.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
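The one-byte-per-character assumption is easy to demonstrate in Python. The sketch below is illustrative only. It uses a plain unit-cost edit distance through a hypothetical `edit_distance` helper, not MD Keyboard. In UTF-8 a single accented letter occupies two bytes, so a byte-level comparison sees two edits where a character-level comparison sees one.

```python
def edit_distance(a, b) -> int:
    """Unit-cost Levenshtein distance over any two sequences (str or bytes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


s, t = "M\u00fcller", "Muller"  # "Müller" vs "Muller"
print(len(s), len(s.encode("utf-8")))  # 6 7
print(edit_distance(s, t), edit_distance(s.encode("utf-8"), t.encode("utf-8")))  # 1 2
```

Scaled to the longer input, that is roughly 83% similarity when counting characters versus roughly 71% when counting bytes.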