Issues:MatchUp Object: Difference between revisions

Revision as of 18:16, 12 June 2015

Fuzzy Algorithms: Accurate Near, Frequency, and other Levenshtein-based algorithms

When the Accurate Near fuzzy algorithm is used on one or more components, running the same data set repeatedly could sporadically return different dupe counts.

This may not be apparent from a single run, because the fuzzy algorithms do not return a percent difference for the algorithms used. They return only a status indicating whether the algorithm found a match between records.

We identified a new compiler issue and have also reviewed our other fuzzy algorithms, which use similar computations and compiler variable initialization.

If you use any of the fuzzy algorithms, please contact us and we will provide you with the available patch.
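To illustrate why a status-only result can hide this kind of inconsistency, here is a minimal Python sketch of a Levenshtein-style "near" check. It is not the MatchUp Object implementation: the function names `levenshtein` and `is_near_match` and the `max_distance` parameter are hypothetical and chosen only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def is_near_match(a: str, b: str, max_distance: int = 1) -> bool:
    """Hypothetical status-only check: the caller sees True/False, never the distance."""
    return levenshtein(a, b) <= max_distance
```

Because the caller only sees the boolean, a distance that is computed incorrectly on some runs would show up only as a changed dupe count, not as a visibly different score.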

Large KeyFile Size effect on Memory resources

By default, MatchUp Object allocates a large amount of space (1024 bytes) for SetUserInfo, the unique identifier attached to each built match key. See MatchUp Object Best Practices for override instructions.
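As a rough back-of-the-envelope sketch of why this matters, the following Python snippet estimates the space taken by user info alone. It assumes, purely for illustration, that every key carries the full allocation; the helper names and the record counts are hypothetical, and the actual storage layout is not described here.

```python
def user_info_bytes(record_count: int, user_info_size: int = 1024) -> int:
    """Estimated bytes spent on user info, assuming one full allocation per key."""
    return record_count * user_info_size


def to_gib(byte_count: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return byte_count / 2**30


# Hypothetical comparison: default allocation vs. a 16-byte identifier.
default_total = user_info_bytes(10_000_000)      # 1024 bytes per key
compact_total = user_info_bytes(10_000_000, 16)  # 16 bytes per key
print(f"default: {to_gib(default_total):.2f} GiB, compact: {to_gib(compact_total):.2f} GiB")
```

Under these assumptions, the identifier size dominates the total for large files, which is why overriding the default is worth considering when the identifier is short.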

Fuzzy: Legacy Matchcodes

Legacy Matchcodes, imported from previous versions, allowed a Fuzzy: Near setting of '0'. This is incompatible with the current version and can cause an error. Editing the matchcode in the interface and changing the Distance to 1 will resolve the problem.

Fuzzy: First component with set distance missing dupes

Setting a distance for the first component forces that component to use the Intersecting deduper. As a result, records within the set distance of each other may be put in different clusters and therefore may never get compared.
Workaround: Use an exact algorithm in the first component and keep a distance component, if required, further down the component list. This will prevent missed dupes (and give you better speed benchmarks).
Resolution: This may require an advanced change to the deduper. Development is aware of the issue and is exploring options.
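The effect of clustering on missed dupes, and the reason the workaround helps, can be sketched in Python as follows. This is a simplified illustration, not the Intersecting deduper: the cluster key based on the first three characters of the last name, the `cluster_pairs` helper, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via a rolling dynamic-programming row."""
    row = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        new_row = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            new_row.append(min(row[j] + 1, new_row[j - 1] + 1, row[j - 1] + cost))
        row = new_row
    return row[-1]


def cluster_pairs(records, cluster_key, is_match):
    """Compare records only within the same cluster; return the matched pairs."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[cluster_key(rec)].append(rec)
    pairs = []
    for members in clusters.values():
        for left, right in combinations(members, 2):
            if is_match(left, right):
                pairs.append((left, right))
    return pairs


def near_last_name(left, right) -> bool:
    """Fuzzy comparison on the last name with a distance of 1."""
    return edit_distance(left["last"], right["last"]) <= 1


# Hypothetical sample data: two records that differ by one character.
records = [
    {"last": "SMITH", "zip": "92688"},
    {"last": "SMYTH", "zip": "92688"},
]

# Clustering on the fuzzy field itself (assumed key: first three characters)
# separates the two records, so they are never compared.
missed = cluster_pairs(records, lambda r: r["last"][:3], near_last_name)

# Clustering on an exact field first keeps them together; the fuzzy
# comparison then runs inside the cluster and finds the pair.
found = cluster_pairs(records, lambda r: r["zip"], near_last_name)

print(len(missed), len(found))
```

In this sketch the exact first component also keeps clusters small, which is consistent with the speed benefit mentioned in the workaround.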

@@ Line 2: / Line 2: @@
 {{CustomTOC}}
-==Fuzzy: Accurate Near deduping==
+==Fuzzy Algorithms: Accurate Near , Frequency, and other Levenshtein based algorithms==
-Using the Accurate Near fuzzy algorithm on one or more components, running the same data set repeatedly may return different dupe counts.
+Using the Accurate Near fuzzy algorithm on one or more components, running the same data set repeatedly could sporadically return different dupe counts.
-This  may not be apparent for a single run because the fuzzy algorithms do not return a percent difference for used algorithms, only returning a status whether the algorithm found a match between records. The recommended workaround is to use FastNear1 (after establishing an Exact baseline).
+This may not be apparent for a single run because the fuzzy algorithms do not return a percent difference for used algorithms, only returning a status whether the algorithm found a match between records.
+We identified a new compiler issue and have also reviewed our other fuzzy algorithms which use similar computations and compiler variable initialization.
+If you use any of the Fuzzy algorithms, please contact us and we will provide you with the available patch.
