Issues:MatchUp Object: Difference between revisions

From Melissa Data Wiki
Tim (talk | contribs)
(16 intermediate revisions by 2 users not shown)
[[MatchUp Object|← MatchUp Object]]
{{CustomTOC}}


==MatchUp Editor==
''Build 5166''

The standalone matchcode editor, MatchUpEditor.exe, displays an unhandled exception error when you select the Swapping configuration button. Existing matchcodes with configured ‘Swapping’ settings still process correctly, but with the current library build you cannot create matchcodes with swapping in the GUI editor. This will be fixed in the next build.
 
 
==Matchcode Component using Start:Word:1==
''Build 5131''
 
Matchcodes that use the <code>Start:Word</code> setting on any component may experience slow processing times. Although this setting should add only an insignificant amount of key building time, it has been reported that processing slows as record counts grow. Development is looking into the issue; until it is resolved, this setting is not recommended when processing large amounts of data.
 
 
==Error Executing Linux Scripts==
''Build 5118''
 
When running some of the Linux scripts, you may receive the error:
 
<code>bad interpreter: No such file or directory</code>
 
As we migrate our ISO building process to a newer automated process, some of the files are still being built with Windows (CRLF) end-of-line formatting. The interpreter line at the top of the script then ends with a carriage return, so the shell cannot find the interpreter.


;Resolution
<code>Run … $ dos2unix <scriptname></code>


==Fuzzy: Containment returns false for empty values==
If you set a component's Short/Empty option to Both Fields, set its fuzzy setting to Containment, and then compare two records whose values for that component are both blank, the records do not match.


==Matchcode Edits Not Saving==
In the Matchcode Editor, after you edit a matchcode, select [OK], and then reopen the editor, the changes have not been saved.


;Resolution
Check the location of your Matchcode data files. This is usually C:\ProgramData\Melissa DATA\MatchUP unless you have overridden it in the installer or in your code. Right-click the MatchUp folder and give ‘Full Control’ permissions to the respective user group.


==Fuzzy: Jaro and Jaro-Winkler crashes on certain columns==
Under certain combinations of component size and component distance, Jaro settings can cause a crash.




==.NET Wrapper Namespace==
''Build 2628''


The namespace in the new mdMatchUp.cs classes is 'namespace MDClasses'. The previous wrapper namespace was 'namespace MelissaData', so the change breaks backwards compatibility. Updating to the new wrapper requires you to change the namespace and recompile. We will roll back the namespace name in the next build.




==Fuzzy Algorithms: Accurate Near, Frequency, and other Levenshtein based algorithms==
When the Accurate Near fuzzy algorithm is used on one or more components, running the same data set repeatedly can sporadically return different dupe counts.

This may not be apparent from a single run, because the fuzzy algorithms do not return a percent difference; they return only a status indicating whether a match was found between records.

We identified a new compiler issue and have also reviewed our other fuzzy algorithms that use similar computations and compiler variable initialization.

If you use any of the Fuzzy algorithms, please contact us and we will provide you with the available patch.


==Keybuilding==
Email Domain Update: If your source data contains the TLD of a domain you want to update, the key is not updated with the new domain. If the source data does not contain the TLD of the domain you want to update, the key is built correctly.

When parsing a FullName (keybuilding) datatype, keys for hyphenated last names with spaces around the hyphen are not built with both last name parts.

Keys for alphanumeric County Road patterns are not built correctly. <br>
ex: 23505 K49, Le Mars, IA, 51031


==Large KeyFile Size effect on Memory resources==
By default, MatchUp Object allocates a large SetUserInfo (the unique identifier attached to each built match key): 1024 bytes.

See [[MatchUp Object:Best Practices|MatchUp Object Best Practices]] for override instructions.

This applies to the ReadWrite Interface only.


==Data: highway pattern parsing==
The mdMatchUp.dat has many 'Highway type' indicators, but needs... <br>

[Street] <br>
COUNTY ROAD,Highway,Hwy,4 <br>


==Fuzzy: Legacy Matchcodes==
Legacy Matchcodes, imported from previous versions, allowed a Fuzzy: Near setting of '0'. This is incompatible with the current version and can cause an error. To resolve the problem, edit the matchcode in the interface and change the Distance to 1.




==Fuzzy: First component with set distance missing dupes==
Setting a distance for a first component forces the component to use the Intersecting deduper. This may cause records within the set distance to be placed in different clusters, so they may never be compared. <br>
Workaround: Use an exact algorithm in the first component and, if a distance component is required, place it further down the component list. This will prevent missed dupes (and give you better speed benchmarks). <br>
Resolution: This may require an advanced change to the deduper. Development is aware of the issue and is exploring options.


[[MatchUp Object|Back to MatchUp Object Main Page]]


[[Category:Issues]]
[[Category:MatchUp Object]]

Latest revision as of 21:55, 11 January 2024



MatchUp Editor

Build 5166

The standalone matchcode editor, MatchUpEditor.exe, displays an unhandled exception error when you select the Swapping configuration button. Existing matchcodes with configured ‘Swapping’ settings still process correctly, but with the current library build you cannot create matchcodes with swapping in the GUI editor. This will be fixed in the next build.


Matchcode Component using Start:Word:1

Build 5131

Matchcodes that use the Start:Word setting on any component may experience slow processing times. Although this setting should add only an insignificant amount of key building time, it has been reported that processing slows as record counts grow. Development is looking into the issue; until it is resolved, this setting is not recommended when processing large amounts of data.


Error Executing Linux Scripts

Build 5118

When running some of the Linux scripts, you may receive the error:

bad interpreter: No such file or directory

As we migrate our ISO building process to a newer automated process, some of the files are still being built with Windows (CRLF) end-of-line formatting. The interpreter line at the top of the script then ends with a carriage return, so the shell cannot find the interpreter.

Resolution

Run … $ dos2unix <scriptname>
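
If dos2unix is not installed, the same line-ending conversion can be sketched in Python. This is a minimal illustration rather than part of the MatchUp distribution; the function names crlf_to_lf and fix_script are hypothetical, and the sketch assumes the only problem is CRLF line endings.

```python
from pathlib import Path


def crlf_to_lf(data: bytes) -> bytes:
    """Return data with Windows (CRLF) line endings converted to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


def fix_script(path: str) -> bool:
    """Rewrite the script in place; return True if the file was changed."""
    script = Path(path)
    original = script.read_bytes()
    converted = crlf_to_lf(original)
    if converted == original:
        return False  # already uses LF line endings
    script.write_bytes(converted)
    return True
```

Calling fix_script on a script path converts it once; a second call returns False because nothing is left to change.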


Matchcode Edits Not Saving

In the Matchcode Editor, after you edit a matchcode, select [OK], and then reopen the editor, the changes have not been saved.

Resolution

Check the location of your Matchcode data files. This is usually C:\ProgramData\Melissa DATA\MatchUP unless you have overridden it in the installer or in your code. Right-click the MatchUp folder and give ‘Full Control’ permissions to the respective user group.
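
To confirm whether the current user can actually write to the data folder, a quick check along the following lines may help. It is only a sketch: the helper name can_write is hypothetical, and it tests write access by creating a temporary file, which is not necessarily how the Matchcode Editor saves its files.

```python
import tempfile


def can_write(folder: str) -> bool:
    """Return True if a temporary file can be created inside folder."""
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        # Covers missing folders and permission errors alike.
        return False
```

For example, can_write(r"C:\ProgramData\Melissa DATA\MatchUP") returning False for the account that runs the editor would be consistent with the symptom described above.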


.NET Wrapper Namespace

Build 2628

The namespace in the new mdMatchUp.cs classes is 'namespace MDClasses'. The previous wrapper namespace was 'namespace MelissaData', so the change breaks backwards compatibility. Updating to the new wrapper requires you to change the namespace and recompile. We will roll back the namespace name in the next build.


Fuzzy Algorithms: Accurate Near, Frequency, and other Levenshtein based algorithms

When the Accurate Near fuzzy algorithm is used on one or more components, running the same data set repeatedly can sporadically return different dupe counts.

This may not be apparent from a single run, because the fuzzy algorithms do not return a percent difference; they return only a status indicating whether a match was found between records.

We identified a new compiler issue and have also reviewed our other fuzzy algorithms that use similar computations and compiler variable initialization.

If you use any of the Fuzzy algorithms, please contact us and we will provide you with the available patch.


Large KeyFile Size effect on Memory resources

By default, MatchUp Object allocates a large SetUserInfo (the unique identifier attached to each built match key): 1024 bytes.

See MatchUp Object Best Practices for override instructions.

This applies to the ReadWrite Interface only.


Fuzzy: Legacy Matchcodes

Legacy Matchcodes, imported from previous versions, allowed a Fuzzy: Near setting of '0'. This is incompatible with the current version and can cause an error. To resolve the problem, edit the matchcode in the interface and change the Distance to 1.


Fuzzy: First component with set distance missing dupes

Setting a distance for a first component forces the component to use the Intersecting deduper. This may cause records within the set distance to be placed in different clusters, so they may never be compared.
Workaround: Use an exact algorithm in the first component and, if a distance component is required, place it further down the component list. This will prevent missed dupes (and give you better speed benchmarks).
Resolution: This may require an advanced change to the deduper. Development is aware of the issue and is exploring options.
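
The general effect can be illustrated with a toy model in Python. This is not MatchUp's Intersecting deduper; it assumes, purely for illustration, that records are grouped into clusters by a key and that only records in the same cluster are compared. The records, the first-letter cluster key, and the helper names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def find_dupes(records, cluster_key, is_match):
    """Compare only records that share a cluster; return matching pairs."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[cluster_key(rec)].append(rec)
    pairs = set()
    for members in clusters.values():
        for a, b in combinations(members, 2):
            if is_match(a, b):
                pairs.add((a, b))
    return pairs


def near(a, b):
    """Toy matchcode: same ZIP and last names within edit distance 1."""
    return a[0] == b[0] and edit_distance(a[1], b[1]) <= 1


# Hypothetical records: (zip, last_name). The names differ by one character.
records = [("02134", "SMITH"), ("02134", "XMITH")]

# Clustering on the fuzzy component (toy stand-in: first letter of the name)
# separates the two records, so they are never compared.
fuzzy_first = find_dupes(records, lambda r: r[1][0], near)

# Clustering on an exact component (ZIP) keeps them together, so the
# distance comparison on the name still runs.
exact_first = find_dupes(records, lambda r: r[0], near)
```

In this toy model fuzzy_first is empty while exact_first contains the SMITH/XMITH pair, which mirrors the reasoning behind the workaround.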