Matchcode Optimization:Longest Common Substring (LCS)
Longest Common Substring (LCS)
Specifics
- Finds the longest common substring between the two strings.
Summary
- This algorithm finds the longest run of consecutive characters that two values have in common. For example, the longest common substring between “ABCDE” and “ABCEF” is “ABC”.
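As an illustration of the idea, the following is a minimal Python sketch of a standard dynamic-programming longest common substring search. It is not the MatchUp implementation; the function name `longest_common_substring` is hypothetical, and the comparison is assumed to be exact and case-sensitive.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return one longest run of characters that appears contiguously in both a and b."""
    best_len = 0   # length of the best common run found so far
    best_end = 0   # index in a just past the end of that run
    # prev[j] holds the length of the common run ending at the previous
    # character of a and at b[j - 1].
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended one character earlier in both strings.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring("ABCDE", "ABCEF"))  # ABC
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.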
Returns
- lenLCS / maxLen
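The returned ratio can be sketched in Python as follows. This is an illustration only: it assumes that maxLen means the length of the longer of the two strings, that two empty values score 0, and that comparison is case-sensitive. The helper name `lcs_score` is hypothetical. It uses the standard library's `difflib.SequenceMatcher.find_longest_match` to get the length of the longest common substring.

```python
from difflib import SequenceMatcher


def lcs_score(a: str, b: str) -> float:
    """Hypothetical helper: longest common substring length / longer input length."""
    max_len = max(len(a), len(b))  # assumption: maxLen is the longer string's length
    if max_len == 0:
        return 0.0  # assumption: two empty values score 0
    # autojunk=False disables difflib's "popular element" heuristic so that
    # every character takes part in the comparison.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size / max_len


# Pairs taken from the Example Data on this page.
pairs = [
    ("Abcd", "Abce"),                                      # 3 / 4  -> 0.75
    ("Abcde", "Abcef"),                                    # 3 / 5  -> 0.60
    ("Ron Doe", "Ron Doe67"),                              # 7 / 9  -> 0.78
    ("Al Doe Aerostructures Co", "Ad Aerostructures Co"),  # 18 / 24 -> 0.75
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {lcs_score(a, b):.2f}")
```

Under these assumptions the pairs listed as Match score 0.75 or higher and the pair listed as Unique scores 0.60. The threshold that separates the two outcomes is not stated on this page.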
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT |
---|---|---|
Abcd | Abce | Match |
Abcde | Abcef | Unique |
Ron Doe | Ron Doe67 | Match |
Al Doe Aerostructures Co | Ad Aerostructures Co | Match |
Performance | |
---|---|
Slower | Faster |
Matches | |
More Matches | Greater Accuracy |
Recommended Usage
- Hybrid Deduper - Where a single incoming record can quickly be evaluated independently against each record in an existing large master database.
- Batch or Enterprise runs where the first component allows efficient clustering.
- Databases where unrecognized keyword variations appear in some of the records.
- General or Company data in which records share a long common string but vary slightly in valid keywords, and for which company acronyms cannot be built accurately.
Not Recommended
- Short name string comparison.
- Gather / scatter, Survivorship, or record consolidation of sensitive data.
- Quantifiable data or records with proprietary keywords not found in our knowledgebase tables.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp on the assumption that each character occupies exactly one byte, so results may be inaccurate if the data contains multi-byte characters.
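The effect of the one-byte-per-character assumption can be illustrated with a small Python sketch. It is not MatchUp code: the helpers `lcs_len` and `score` are hypothetical, and the score is assumed to be the longest common substring length divided by the longer input length. The same two values are compared once as characters and once as UTF-8 bytes.

```python
def lcs_len(a, b) -> int:
    """Length of the longest common substring of two sequences (str or bytes)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def score(a, b) -> float:
    """Assumed scoring: common substring length over the longer input length."""
    return lcs_len(a, b) / max(len(a), len(b))


s1 = "Jos\u00e9"  # "José", 4 characters, 5 bytes in UTF-8
s2 = "Jos\u00e8"  # "Josè", 4 characters, 5 bytes in UTF-8

by_char = score(s1, s2)                                  # 3 / 4
by_byte = score(s1.encode("utf-8"), s2.encode("utf-8"))  # 4 / 5

print(by_char)  # 0.75
print(by_byte)  # 0.8
```

The two accented letters share their first UTF-8 byte, so the byte-level comparison counts part of a mismatched character as a match and measures both lengths in bytes. The score shifts even though the text is the same.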