Matchcode Optimization:Phonetex
Latest revision as of 14:30, 27 September 2018
Phonetex
Specifics
- (pronounced "Fo-NEH-tex") An auditory matching algorithm developed by Melissa Data. It works best at matching words that sound alike but are spelled differently, and it is an improvement over the Soundex algorithm.
Summary
- A variation of the Soundex algorithm. Phonetex takes into account letter combinations that sound alike, particularly at the start of a word (such as 'PN' = 'N' and 'PH' = 'F').
Returns
- Phonetex is a string-transformation and comparison-based algorithm that is applied during keybuilding. For example, JOHNSON is transformed to "J565000000", and JHNSN is also transformed to "J565000000". Because the two keys are identical, the pair is considered a Phonetex match after evaluation.
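The transform-then-compare idea described above can be sketched as follows. This is a minimal illustration, not Melissa's implementation: the letter-to-digit groups are those of classic Soundex (an assumption, so the digits will not equal the Phonetex keys quoted above), the only start-of-word rules are the two named in the Summary, and the names `sketch_key` and `is_sketch_match` are hypothetical.

```python
# Illustrative only: a Soundex-style key, NOT the actual Phonetex tables.

# The two start-of-word equivalences mentioned in the Summary.
PREFIX_RULES = {"PN": "N", "PH": "F"}

# Classic Soundex consonant groups (an assumption; Phonetex uses its own codes).
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

KEY_WIDTH = 10  # same width as the "J565000000" example key


def sketch_key(word: str) -> str:
    """Build a fixed-width, zero-padded phonetic key for one word."""
    letters = "".join(ch for ch in word.upper() if "A" <= ch <= "Z")
    if not letters:
        return ""
    # Normalize sound-alike letter combinations at the start of the word.
    for prefix, replacement in PREFIX_RULES.items():
        if letters.startswith(prefix):
            letters = replacement + letters[len(prefix):]
            break
    key = letters[0]
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != previous:
            key += code
        if ch not in "HW":  # H and W do not separate repeated codes
            previous = code
    return key[:KEY_WIDTH].ljust(KEY_WIDTH, "0")


def is_sketch_match(a: str, b: str) -> bool:
    """Two strings match when their keys are non-empty and identical."""
    key_a = sketch_key(a)
    return bool(key_a) and key_a == sketch_key(b)


print(is_sketch_match("Johnson", "Jhnsn"))
print(is_sketch_match("Neumon", "Pneumon"))
```

Because the comparison is reduced to key equality, the key can be computed once per record during keybuilding and compared cheaply afterwards. The sketch is not expected to reproduce every row of the example data, since the real algorithm's rules are more extensive.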
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT
---|---|---
Johnson | Jhnsn | Match Found
Stephenz | Stevens | Match Found
Beaumarchais | Bumarchay | Match Found
Neumon | Pneumon | Match Found
Scale | From | To
---|---|---
Performance | Slower | Faster
Matches | More Matches | Greater Accuracy
Recommended Usage
- Large or enterprise-level batch runs. Using this algorithm will not prevent efficient clustering, and because it is performed during keybuilding, throughput will be fast.
- Databases created via real-time data entry, where sound-alike (phonetic) errors are introduced.
- Databases of US and English language origin.
Not Recommended For
- Databases of non-US and non-English language origin.
- Fields whose content data is of type Dictionary or Quantifiable.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
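To see why the one-character-equals-one-byte assumption matters, the following sketch compares character counts with UTF-8 byte counts. The helper name `has_multibyte_utf8` is hypothetical and is not part of MatchUp. It only shows one way to screen a field for multi-byte characters before applying a byte-oriented algorithm.

```python
def has_multibyte_utf8(text: str) -> bool:
    """Return True if any character needs more than one byte in UTF-8."""
    return len(text.encode("utf-8")) != len(text)


# "\u00f1" is n-with-tilde and "\u00e9" is e-with-acute; each takes two bytes in UTF-8.
for name in ["Johnson", "Mu\u00f1oz", "B\u00e9atrice"]:
    encoded = name.encode("utf-8")
    print(name, len(name), len(encoded), has_multibyte_utf8(name))
```

For plain ASCII names the two counts agree. For the accented names the byte count is one larger than the character count, so code that walks the data byte by byte would see two unrelated values where a reader sees one letter.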