Matchcode Optimization:Longest Common Substring (LCS)
Latest revision as of 14:25, 27 September 2018
Longest Common Substring (LCS)
Specifics
- Finds the longest common substring between the two strings.
Summary
- This algorithm finds the longest common substring between two values. For example, the longest common substring between “ABCDE” and “ABCEF” is “ABC”.
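As a rough illustration of the idea above, the following is a minimal sketch of a textbook dynamic-programming longest common substring search. The function name `longest_common_substring` is hypothetical, and this is not necessarily how MatchUp implements the algorithm internally.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest run of characters that appears contiguously in both a and b."""
    best_len = 0  # length of the longest common substring found so far
    best_end = 0  # index in a just past the end of that substring
    # prev[j] holds the length of the common suffix of the previous prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous character of both strings.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("ABCDE", "ABCEF"))  # prints ABC
```

The comparison here is case-sensitive; whether the production algorithm normalizes case first is not stated on this page.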
Returns
- lenLCS / maxLen
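The ratio above can be sketched as follows, assuming `maxLen` means the length of the longer of the two strings (the page does not define it explicitly). The function name `lcs_score` is hypothetical, and Python's standard `difflib.SequenceMatcher.find_longest_match` stands in for the product's own substring search.

```python
from difflib import SequenceMatcher


def lcs_score(s1: str, s2: str) -> float:
    """Length of the longest common substring divided by the longer string's length."""
    max_len = max(len(s1), len(s2))  # assumption: maxLen is the longer input's length
    if max_len == 0:
        return 0.0  # assumption: two empty values score 0
    # autojunk=False disables difflib's popularity heuristic so every character counts.
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    return match.size / max_len


for left, right in [("Abcd", "Abce"), ("Abcde", "Abcef"), ("Ron Doe", "Ron Doe67")]:
    print(left, right, round(lcs_score(left, right), 2))
```

Under these assumptions the pairs score 0.75, 0.6, and about 0.78. The cutoff that turns a score into Match or Unique is not given on this page.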
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT
---|---|---
Abcd | Abce | Match
Abcde | Abcef | Unique
Ron Doe | Ron Doe67 | Match
Al Doe Aerostructures Co | Ad Aerostructures Co | Match
Performance: Slower | Faster
Matches: More Matches | Greater Accuracy
Recommended Usage
- Hybrid Deduper - Where a single incoming record can quickly be evaluated independently against each record in an existing large master database.
- Batch or Enterprise runs where the first component allows efficient clustering.
- Databases where unrecognized keyword variations appear in some of the records.
- General or Company data that share a long similar string but vary slightly, and for which valid keywords and company acronyms cannot be built accurately.
Not Recommended
- Short name string comparison.
- Gather / scatter, Survivorship, or record consolidation of sensitive data.
- Quantifiable data, or records with proprietary keywords that are not in our knowledge base tables.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that one character equals one byte, so results may be inaccurate if the data contains multi-byte characters.
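To illustrate why the one-byte-per-character assumption matters, the sketch below scores the same pair of values twice: once as characters and once as UTF-8 bytes. The helper names `lcs_length` and `score` are hypothetical, and the sketch shows only the general effect of counting bytes, not MatchUp's actual internals.

```python
def lcs_length(a, b) -> int:
    """Length of the longest common contiguous run; works on str or bytes."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def score(a, b) -> float:
    """lenLCS / maxLen, assuming maxLen is the longer input's length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


s1, s2 = "José", "Jose"
char_score = score(s1, s2)                                  # 3 / 4 characters
byte_score = score(s1.encode("utf-8"), s2.encode("utf-8"))  # 3 / 5 bytes
print(char_score, byte_score)  # prints 0.75 0.6
```

Because “é” occupies two bytes in UTF-8, the byte-based length of the first value grows from 4 to 5 and the score drops, even though only one character differs.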