Matchcode Optimization:Phonetex
Phonetex
Specifics
- (pronounced "Fo-NEH-tex") An auditory matching algorithm developed by Melissa Data. It works best in matching words that sound alike but are spelled differently. It is an improvement over the Soundex algorithm.
Summary
- A variation of the Soundex algorithm. Phonetex takes into account letter combinations that sound alike, particularly at the start of a word (such as 'PN' = 'N', 'PH' = 'F').
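The leading-combination rule can be illustrated with a minimal Python sketch. It covers only the two equivalences named above ('PN' = 'N', 'PH' = 'F'); the full Phonetex rule set is not given here, and the names `LEADING_EQUIVALENTS` and `normalize_leading` are hypothetical, not part of any Melissa Data API.

```python
# Hypothetical table: only the two start-of-word equivalences quoted in the text.
LEADING_EQUIVALENTS = {"PN": "N", "PH": "F"}


def normalize_leading(word):
    """Replace a sound-alike letter combination at the start of a word."""
    upper = word.upper()
    for combo, replacement in LEADING_EQUIVALENTS.items():
        if upper.startswith(combo):
            return replacement + upper[len(combo):]
    return upper


# Both spellings reduce to the same normalized form before any coding step.
print(normalize_leading("Pneumon") == normalize_leading("Neumon"))
```

Normalizing before coding is what allows 'Pneumon' and 'Neumon' to share a first character, which a plain Soundex-style key would otherwise keep distinct.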
Returns
- Phonetex is a string transformation and comparison algorithm that is applied during keybuilding. For example, JOHNSON is transformed to "J565000000" and JHNSN is also transformed to "J565000000". Because the two keys are identical, the pair is evaluated as a Phonetex match.
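The transform-then-compare idea can be sketched in Python as follows. This is not the Phonetex implementation: it uses the classic Soundex letter codes as a stand-in and assumes a 10-character, zero-padded key based on the example above. The digits it produces therefore differ from real Phonetex keys (this sketch yields "J525000000" for JOHNSON, not "J565000000"). The names `phonetic_key` and `is_match` are hypothetical.

```python
# Classic Soundex letter groups, used here only as a stand-in code table.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def phonetic_key(word, length=10):
    """Build a fixed-length key: first letter plus consonant codes, zero-padded."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = code
    return key[:length].ljust(length, "0")


def is_match(a, b):
    """Two strings match when their keys are identical."""
    return phonetic_key(a) == phonetic_key(b)


print(phonetic_key("JOHNSON"), phonetic_key("JHNSN"), is_match("JOHNSON", "JHNSN"))
```

Because each string is reduced to a key once, the comparison itself is a plain equality check on two short strings.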
Example Matchcode Component
Example Data
STRING1 | STRING2 | RESULT
---|---|---
Johnson | Jhnsn | Match Found
Stephenz | Stevens | Match Found
Beaumarchais | Bumarchay | Match Found
Neumon | Pneumon | Match Found
Scale | From | To
---|---|---
Performance | Slower | Faster
Matches | More Matches | Greater Accuracy
Recommended Usage
- Large or enterprise-level batch runs. Using this algorithm will not prevent efficient clustering, and because it is applied during keybuilding, throughput remains fast.
- Databases created via real-time data entry, where sound-alike spelling errors are introduced.
- Databases of US and English language origin.
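The reason a keybuilding-time algorithm suits large batch runs can be sketched in Python: each record is transformed once, and records are then grouped by key rather than compared pairwise. The sketch assumes clustering amounts to grouping on an exact key. `cluster_by_key` and `consonant_skeleton` are hypothetical names, and the consonant-skeleton key is a deliberately crude stand-in, not the Phonetex transformation.

```python
from collections import defaultdict


def cluster_by_key(records, key_func):
    """Group records by a precomputed key: one key computation per record."""
    clusters = defaultdict(list)
    for record in records:
        clusters[key_func(record)].append(record)
    return dict(clusters)


def consonant_skeleton(name):
    """Stand-in key: first letter plus the remaining consonants (not Phonetex)."""
    upper = name.upper()
    rest = "".join(ch for ch in upper[1:] if ch.isalpha() and ch not in "AEIOUYHW")
    return upper[:1] + rest


clusters = cluster_by_key(["Johnson", "Jhnsn", "Smith"], consonant_skeleton)
for key, members in clusters.items():
    print(key, members)
```

Grouping this way takes a single pass over the records, whereas an algorithm that must compare two raw strings directly cannot be reduced to a key lookup.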
Not Recommended For
- Databases of non-US and non-English language origin.
- Fields whose content data is of type Dictionary or Quantifiable.
Do Not Use With
- UTF-8 data. This algorithm was ported to MatchUp with the assumption that a character equals one byte, and therefore results may not be accurate if the data contains multi-byte characters.
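A quick pre-check for this condition can be sketched in Python. It flags values whose UTF-8 encoding uses more bytes than the string has characters, which is exactly the case where the one-character-equals-one-byte assumption fails. The function name `has_multibyte_chars` is hypothetical and not part of MatchUp.

```python
def has_multibyte_chars(text):
    """Return True if any character needs more than one byte in UTF-8."""
    return len(text.encode("utf-8")) != len(text)


for value in ("Johnson", "Jönsson"):
    print(value, has_multibyte_chars(value))
```

Fields that are flagged this way are candidates for a different algorithm or for transliteration before keybuilding.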