Matchcode Optimization:Frequency Near
Latest revision as of 23:20, 26 September 2018
Frequency Near
Specifics
- The Frequency Near algorithm matches the characters of one string to the characters of another without regard to their order, while allowing a set number of differences.
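The idea can be illustrated with a short Python sketch. It is not MatchUp's implementation: the function names, the `max_differences` parameter and its default, the case folding, and the rule for counting differences (the larger of the two one-sided character leftovers, sometimes called the bag distance) are all assumptions made for illustration.

```python
from collections import Counter


def bag_distance(a: str, b: str) -> int:
    """Count differences between two strings while ignoring character order.

    Each string is treated as a multiset of characters. Characters present in
    both are paired off; the result is the larger of the two leftover counts,
    so a substitution costs 1 and a single insertion or deletion costs 1.
    """
    count_a, count_b = Counter(a), Counter(b)
    only_in_a = sum((count_a - count_b).values())  # unmatched characters of a
    only_in_b = sum((count_b - count_a).values())  # unmatched characters of b
    return max(only_in_a, only_in_b)


def frequency_near(a: str, b: str, max_differences: int = 1) -> bool:
    """Hypothetical Frequency Near check: order-insensitive and case-insensitive."""
    return bag_distance(a.lower(), b.lower()) <= max_differences


if __name__ == "__main__":
    print(frequency_near("abcdef", "badcfe"))                     # True: same characters, reordered
    print(frequency_near("abcdef", "badcfx"))                     # True: one substitution (e -> x)
    print(frequency_near("Lynda", "Dylan"))                       # True: same letters once case is ignored
    print(frequency_near("Johnson", "Jhnsn"))                     # False: two 'o' characters are missing
    print(frequency_near("Johnson", "Jhnsn", max_differences=2))  # True
```

The threshold MatchUp actually applies is not stated on this page; in this sketch, the "Johnson"/"Jhnsn" pair only matches once two differences are allowed.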
Summary
- Frequency Near can be used when two strings are expected to contain the same characters, possibly transposed or with a character inserted or deleted. For example, "abcdef" would be considered a match to both "badcfe" (the same characters in a different order) and "badcfx" (one character different).
Returns
- A Boolean 'match' if the compared strings contain the same characters, within the allowed number of differences.
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT
---|---|---
Johnson | Jhnsn | Match
Lynda | Dylan | Match
A B D H T | A T H D X | Match
A B D H T | A T H D B | Match
Measure | Range
---|---
Performance | Slower to Faster
Matches | More Matches to Greater Accuracy
Recommended Usage
- Batch processing: this is a fast algorithm that identifies a greater percentage of duplicates, because it counts both exact matches and minor character transpositions as matches.
- This algorithm is also recommended when the data consists of single-character dictionary values, such as "A B C".
Not Recommended For
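- Short name data types, where a small change such as transposing characters would represent a different value (for example, "Lynda" and "Dylan" are different names). This algorithm is also not recommended for identifying differences in long strings, because it ignores the order of characters.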
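Because character order is ignored, any two strings that are anagrams of each other show zero differences. The sketch below uses a hypothetical helper, `same_characters` (not part of MatchUp), to show how that produces matches for distinct short names and for longer strings whose meaning differs only in word order.

```python
from collections import Counter


def same_characters(a: str, b: str) -> bool:
    """True when a and b contain exactly the same characters, in any order (case-insensitive)."""
    return Counter(a.lower()) == Counter(b.lower())


# Short names: different people, identical character counts.
print(same_characters("Lynda", "Dylan"))  # True
print(same_characters("Mary", "Myra"))    # True

# Longer strings: reordering words changes the meaning but not the counts.
print(same_characters("dog bites man", "man bites dog"))  # True
```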
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
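To see why the one-byte-per-character assumption matters, the sketch below compares a character-level count with a byte-level count of the same UTF-8 strings. It only illustrates the general effect; how MatchUp's port processes bytes internally is not described here.

```python
from collections import Counter

plain = "Jose"
accented = "Jos\u00e9"  # "José" with a precomposed é (U+00E9)

# Character view: "é" is a single character, so one character is left unmatched.
char_leftover = sum((Counter(accented) - Counter(plain)).values())

# Byte view (one character == one byte): UTF-8 encodes "é" as two bytes
# (0xC3 0xA9), so the same one-character change leaves two unmatched units.
plain_bytes = plain.encode("utf-8")
accented_bytes = accented.encode("utf-8")
byte_leftover = sum((Counter(accented_bytes) - Counter(plain_bytes)).values())

print(len(accented), len(accented_bytes))  # 4 5
print(char_leftover, byte_leftover)        # 1 2
```

A single accented letter is therefore counted as more differences than a human reader would see, which can push an otherwise acceptable pair over the allowed limit.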