Matchcode Optimization:Frequency Near
Frequency Near
Specifics
- The Frequency Near algorithm matches the characters of one string to the characters of another without regard to sequence, while allowing a set number of differences.
Summary
- Frequency Near can be used when two strings are expected to contain the same characters but may have transpositions, an insertion, or a deletion. For example, "abcdef" would be considered a 100% match to "badcfe" (same characters, transposed) or "badcfx" (one character differs).
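The behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not MatchUp's implementation: the function name `frequency_near`, the `max_differences` parameter, the case-insensitive comparison, and the way a difference is counted (the larger of the two unmatched-character totals, so one substituted character counts once) are all assumptions made for the example.

```python
from collections import Counter


def frequency_near(first: str, second: str, max_differences: int = 1) -> bool:
    """Hypothetical sketch: compare character frequencies, ignoring order."""
    # Assumption: the comparison is case-insensitive.
    counts_first = Counter(first.casefold())
    counts_second = Counter(second.casefold())

    # Counter subtraction keeps only positive counts, i.e. the characters
    # left unmatched on each side.
    unmatched_first = sum((counts_first - counts_second).values())
    unmatched_second = sum((counts_second - counts_first).values())

    # Assumption: a substituted character (one unmatched on each side)
    # counts as a single difference.
    differences = max(unmatched_first, unmatched_second)
    return differences <= max_differences


print(frequency_near("abcdef", "badcfe"))  # True: transposed characters only
print(frequency_near("abcdef", "badcfx"))  # True: one character differs
print(frequency_near("abcdef", "uvwxyz"))  # False: no characters in common
```

Under these assumptions, a pair such as "Johnson" and "Jhnsn" differs by two characters, so it would only match with `max_differences` set to 2 or higher.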
Returns
- Boolean ‘match’ if the compared data contains the same characters, in any order, within the allowed number of differences.
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT |
---|---|---|
Johnson | Jhnsn | Match |
Lynda | Dylan | Match |
A B D H T | A T H D X | Match |
A B D H T | A T H D B | Match |
Performance
- Speed scale: Slower to Faster
- Matches scale: More Matches to Greater Accuracy
Recommended Usage
- Batch processing. This is a fast algorithm that identifies a greater percentage of duplicates because it counts both exact matches and minor character transpositions.
- This algorithm is also recommended when the data consists of single-character dictionary values such as ‘A B C’.
Not Recommended For
- Short name data types where a simple character transformation would represent a different value. This algorithm is also not recommended when trying to identify differences in long strings.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
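The risk can be illustrated with a short Python sketch. It does not reproduce MatchUp's internals. It only assumes that a byte-oriented implementation counts the individual bytes of the UTF-8 encoding, and it uses hand-picked sample strings to show how that can diverge from counting characters.

```python
from collections import Counter

# Two strings with no characters in common (sample values chosen for illustration).
first = "\u00e9\u00a2"   # "é¢" encodes to bytes C3 A9 C2 A2
second = "\u00e2\u00a9"  # "â©" encodes to bytes C3 A2 C2 A9

# Character-level frequencies: nothing matches.
same_characters = Counter(first) == Counter(second)

# Byte-level frequencies: the same four bytes appear, only in a different order.
same_bytes = Counter(first.encode("utf-8")) == Counter(second.encode("utf-8"))

print(len(first), len(first.encode("utf-8")))  # 2 characters, 4 bytes
print(same_characters)  # False
print(same_bytes)       # True: a byte-based count would report a match
```

In this example a comparison that treats each byte as a character would see two identical sets of four "characters", even though the strings share no actual characters.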