<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>http://wiki.melissadata.com/index.php?action=history&amp;feed=atom&amp;title=Pentaho%3AMatchUp%3AAlgorithms</id>
	<title>Pentaho:MatchUp:Algorithms - Revision history</title>
	<link rel="self" type="application/atom+xml" href="http://wiki.melissadata.com/index.php?action=history&amp;feed=atom&amp;title=Pentaho%3AMatchUp%3AAlgorithms"/>
	<link rel="alternate" type="text/html" href="http://wiki.melissadata.com/index.php?title=Pentaho:MatchUp:Algorithms&amp;action=history"/>
	<updated>2026-05-16T21:37:17Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.43.1</generator>
	<entry>
		<id>http://wiki.melissadata.com/index.php?title=Pentaho:MatchUp:Algorithms&amp;diff=9087&amp;oldid=prev</id>
		<title>Admin: Created page with &quot;{{PentahoMatchUpNav |MatchcodeEditorCollapse= }}  {{CustomTOC}}  The MatchUp Editor can use the following matching algorithms:  ===Exact Matching=== Determines whether two str...&quot;</title>
		<link rel="alternate" type="text/html" href="http://wiki.melissadata.com/index.php?title=Pentaho:MatchUp:Algorithms&amp;diff=9087&amp;oldid=prev"/>
		<updated>2015-06-17T17:42:44Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;{{PentahoMatchUpNav |MatchcodeEditorCollapse= }}  {{CustomTOC}}  The MatchUp Editor can use the following matching algorithms:  ===Exact Matching=== Determines whether two str...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{PentahoMatchUpNav&lt;br /&gt;
|MatchcodeEditorCollapse=&lt;br /&gt;
}}&lt;br /&gt;
&lt;br /&gt;
{{CustomTOC}}&lt;br /&gt;
&lt;br /&gt;
The MatchUp Editor can use the following matching algorithms:&lt;br /&gt;
&lt;br /&gt;
===Exact Matching===&lt;br /&gt;
Determines whether two strings are identical.&lt;br /&gt;
&lt;br /&gt;
===Soundex===&lt;br /&gt;
SoundEx is a string transformation and comparison algorithm. For example, JOHNSON is transformed to &amp;quot;J525&amp;quot; and JHNSN is also transformed to &amp;quot;J525&amp;quot;, so the two strings are considered a SoundEx match.&lt;br /&gt;
If the original strings are identical, SoundEx will return 100%. If the SoundEx&amp;#039;d strings are equal, the algorithm returns 99%. Otherwise, SoundEx will return 0%.&lt;br /&gt;
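&lt;br /&gt;
A minimal sketch of the scoring rule above, assuming the standard four-character American Soundex encoding. The function names are hypothetical, and the exact encoding rules MatchUp uses may differ.&lt;br /&gt;

```python
# Standard Soundex digit groups (assumed; MatchUp's table is not documented here).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the four-character Soundex code of word, or "" if it has no letters."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != previous:
            code += digit
        if c not in "HW":  # H and W do not separate letters with equal codes
            previous = digit
    return (code + "000")[:4]


def soundex_score(a, b):
    """Score two strings as described above: 100, 99, or 0 percent."""
    if a == b:
        return 100
    code_a, code_b = soundex(a), soundex(b)
    if code_a and code_a == code_b:
        return 99
    return 0
```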
&lt;br /&gt;
===Phonetex===&lt;br /&gt;
A variation of the SoundEx Algorithm. PhonetEx takes into account letter combinations that sound alike, particularly at the start of the word (such as &amp;#039;PN&amp;#039; = &amp;#039;N&amp;#039;, &amp;#039;PH&amp;#039; = &amp;#039;F&amp;#039;).&lt;br /&gt;
&lt;br /&gt;
===Containment===&lt;br /&gt;
Matches when one record&amp;#039;s component is contained in another record. For example, “Smith” is contained in “Smithfield.”&lt;br /&gt;
&lt;br /&gt;
===Frequency===&lt;br /&gt;
Matches the characters in one record’s component to the characters in another without regard to sequence. For example, “abcdef” would match “badcfe.”&lt;br /&gt;
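&lt;br /&gt;
One way to picture this is as a comparison of character counts. The sketch below is an illustration only; the function name is hypothetical and it treats the comparison as a strict yes or no.&lt;br /&gt;

```python
from collections import Counter


def frequency_match(a, b):
    """True when both strings contain the same characters the same number of times."""
    return Counter(a) == Counter(b)
```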
&lt;br /&gt;
===Fast Near===&lt;br /&gt;
A typographical matching algorithm. It works best on words that fail to match only because of a few typographical errors. The number of errors tolerated is set on a scale from 1 (Tight) to 4 (Loose). The Fast Near algorithm is a faster approximation of the Accurate Near algorithm described below. The tradeoff for speed is accuracy; Fast Near will sometimes find false matches or miss true matches.&lt;br /&gt;
&lt;br /&gt;
===Accurate Near===&lt;br /&gt;
An implementation of the Levenshtein algorithm. It is a typographical matching algorithm. The Accurate Near algorithm produces better results than the Fast Near algorithm, but is slower.&lt;br /&gt;
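&lt;br /&gt;
For reference, a textbook Levenshtein distance can be sketched as follows. This is the generic algorithm, not the MatchUp implementation, and how MatchUp converts the distance into a match decision is not covered here.&lt;br /&gt;

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```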
&lt;br /&gt;
===Frequency Near===&lt;br /&gt;
The Frequency algorithm will match the characters of one string to the characters of another without any regard to the sequence. For example &amp;quot;abcdef&amp;quot; would be considered a 100% match to &amp;quot;badcfe.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
===Vowels===&lt;br /&gt;
Only vowels will be compared. Consonants will be removed.&lt;br /&gt;
&lt;br /&gt;
===Consonants===&lt;br /&gt;
Only consonants will be compared. Vowels will be removed.&lt;br /&gt;
&lt;br /&gt;
===Alphabetic===&lt;br /&gt;
Only alphabetic characters will be compared.&lt;br /&gt;
&lt;br /&gt;
===Numeric===&lt;br /&gt;
Only numeric characters will be compared. Decimals and signs are considered numeric.&lt;br /&gt;
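&lt;br /&gt;
The four filters above can be pictured as simple character filters applied before comparison. In this sketch the helper names are hypothetical, and treating only A, E, I, O, and U as vowels is an assumption.&lt;br /&gt;

```python
VOWELS = set("AEIOUaeiou")  # assumption: Y is treated as a consonant


def keep_vowels(text):
    return "".join(c for c in text if c in VOWELS)


def keep_consonants(text):
    return "".join(c for c in text if c.isalpha() and c not in VOWELS)


def keep_alphabetic(text):
    return "".join(c for c in text if c.isalpha())


def keep_numeric(text):
    # Digits plus decimal points and signs, as described above.
    return "".join(c for c in text if c.isdigit() or c in "+-.")
```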
&lt;br /&gt;
===N-Gram===&lt;br /&gt;
Counts the number of common substrings (grams) between the two strings. The substring size N currently defaults to 2 in MatchUp.&lt;br /&gt;
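&lt;br /&gt;
A rough sketch of counting shared grams with N = 2. The function names are hypothetical, and details such as padding or how repeated grams are counted in MatchUp are assumptions.&lt;br /&gt;

```python
from collections import Counter


def ngrams(text, n=2):
    """All overlapping substrings of length n, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def common_ngrams(a, b, n=2):
    """Number of grams the two strings share, counting repeats at most as often as both have them."""
    count_a = Counter(ngrams(a, n))
    count_b = Counter(ngrams(b, n))
    return sum(min(count, count_b[gram]) for gram, count in count_a.items())
```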
&lt;br /&gt;
===Jaro===&lt;br /&gt;
Gathers common characters (in order) between the two strings, then counts transpositions between the two common strings.&lt;br /&gt;
&lt;br /&gt;
===Jaro-Winkler===&lt;br /&gt;
Just like Jaro, but gives added weight for matching characters at the start of the string (up to 4 characters).&lt;br /&gt;
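&lt;br /&gt;
The two measures can be sketched as below, using the commonly published formulas. The prefix scaling factor of 0.1 is the conventional default and is an assumption here, as are the function names; only the four-character prefix limit comes from the description above.&lt;br /&gt;

```python
def jaro(a, b):
    """Jaro similarity between 0.0 and 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    used = [False] * len(b)
    matched_a = []
    for i, ch in enumerate(a):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not used[j] and b[j] == ch:
                used[j] = True
                matched_a.append(ch)
                break
    matched_b = [b[j] for j in range(len(b)) if used[j]]
    m = len(matched_a)
    if m == 0:
        return 0.0
    # Half the number of positions where the two matched sequences disagree.
    t = sum(x != y for x, y in zip(matched_a, matched_b)) / 2
    return (m / len(a) + m / len(b) + (m - t) / m) / 3


def jaro_winkler(a, b, scale=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return base + prefix * scale * (1 - base)
```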
&lt;br /&gt;
===Longest Common Substring (LCS)===&lt;br /&gt;
Finds the longest common substring between the two strings.&lt;br /&gt;
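&lt;br /&gt;
A minimal dynamic-programming sketch of finding the longest common substring. The function name is hypothetical, and how MatchUp turns the result into a similarity score is not shown.&lt;br /&gt;

```python
def longest_common_substring(a, b):
    """Return one longest run of characters that appears contiguously in both strings."""
    best = (0, 0)  # (length, end index in a)
    previous = [0] * (len(b) + 1)
    for i, ca in enumerate(a, start=1):
        current = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the common run that ended at the previous characters.
                current[j] = previous[j - 1] + 1
                best = max(best, (current[j], i))
        previous = current
    length, end = best
    return a[end - length:end]
```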
&lt;br /&gt;
===Needleman-Wunsch===&lt;br /&gt;
A variation of the Levenshtein algorithm. Levenshtein and Needleman-Wunsch are identical except that character mistakes are given different weights depending on how far apart the two characters are on a standard keyboard layout. For example, A to S is given a mistake weight of 0.4, while A to D is 0.6 and A to P is 1.0.&lt;br /&gt;
&lt;br /&gt;
===MD Keyboard===&lt;br /&gt;
An algorithm developed by Melissa Data which counts keyboarding mis-hits with a weighted penalty based on the distance of the mis-hit and assigns a percentage of similarity between the compared strings.&lt;br /&gt;
This effectively adds the &amp;quot;understanding&amp;quot; that the keyboarder may have typed in one character before another.&lt;br /&gt;
&lt;br /&gt;
===Smith-Waterman-Gotoh===&lt;br /&gt;
A variation of the Needleman-Wunsch algorithm. Needleman-Wunsch and Smith-Waterman-Gotoh are identical except that character deletions are given a different weight.&lt;br /&gt;
This effectively adds the &amp;quot;understanding&amp;quot; that the keyboarder may have tried to abbreviate one of the words.&lt;br /&gt;
&lt;br /&gt;
===Dice’s Coefficient===&lt;br /&gt;
A variation of the N-Gram algorithm. Dice&amp;#039;s Coefficient counts matching n-Grams but does not count extra duplicate n-Grams.&lt;br /&gt;
&lt;br /&gt;
===Jaccard Similarity Coefficient===&lt;br /&gt;
A variation of the N-Gram algorithm. The Jaccard Similarity is identical to the N-Gram algorithm but uses a different formula for similarity computation.&lt;br /&gt;
&lt;br /&gt;
===Overlap Coefficient===&lt;br /&gt;
A variation of the N-Gram algorithm. The Overlap Coefficient is identical to the N-Gram algorithm but uses a different formula for similarity computation.&lt;br /&gt;
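&lt;br /&gt;
The three coefficients above are usually defined over sets of grams. The sketch below uses the textbook formulas with N = 2; whether MatchUp applies exactly these formulas is an assumption, and the function names are hypothetical.&lt;br /&gt;

```python
def gram_set(text, n=2):
    """Distinct substrings of length n (duplicates are not counted)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def dice(a, b):
    x, y = gram_set(a), gram_set(b)
    total = len(x) + len(y)
    return 2 * len(x.intersection(y)) / total if total else 0.0


def jaccard(a, b):
    x, y = gram_set(a), gram_set(b)
    union = x.union(y)
    return len(x.intersection(y)) / len(union) if union else 0.0


def overlap(a, b):
    x, y = gram_set(a), gram_set(b)
    smaller = min(len(x), len(y))
    return len(x.intersection(y)) / smaller if smaller else 0.0
```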
&lt;br /&gt;
===Double MetaPhone===&lt;br /&gt;
A variation of the PhonetEx algorithm. Double Metaphone performs two different PhonetEx-style transformations, producing two PhonetEx-like codes (primary and alternate) for each of the two strings being compared.&lt;br /&gt;
The logic used for Double Metaphone Similarity works as follows:&lt;br /&gt;
*If primary1 = primary2 and alternate1 = alternate2, then we have a very good match (99%).&lt;br /&gt;
*If either primary1 = alternate2 or alternate1 = primary2, and alternate1 = alternate2, then we have a good match (85%).&lt;br /&gt;
*If alternate1 = alternate2, we have an acceptable match (75%).&lt;br /&gt;
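&lt;br /&gt;
The rules above can be transcribed literally as follows. The sketch takes the four codes as inputs rather than computing them, the function name is hypothetical, and returning 0 when no rule applies is an assumption because the rules do not state that case.&lt;br /&gt;

```python
def double_metaphone_score(primary1, alternate1, primary2, alternate2):
    """Apply the three similarity rules in order and return a percentage."""
    if primary1 == primary2 and alternate1 == alternate2:
        return 99  # very good match
    if (primary1 == alternate2 or alternate1 == primary2) and alternate1 == alternate2:
        return 85  # good match
    if alternate1 == alternate2:
        return 75  # acceptable match
    return 0  # assumption: no rule applies, so no match
```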
&lt;br /&gt;
&lt;br /&gt;
[[Category:Pentaho]]&lt;/div&gt;</summary>
		<author><name>Admin</name></author>
	</entry>
</feed>