Difference between revisions of "MatchUp Object:Best Practices"

From Melissa Data Wiki

Revision as of 15:09, 26 February 2014

Back to MatchUp Object Main Page

Running SSIS packages from the Command Line

BP_XXSS_001
For more efficient memory usage, run your saved SSIS package from the command line instead of running it directly from Visual Studio. Example:

dtexec.exe /F "c:\fullpath\Package.dtsx" /Rep EWPD > "OptionalCreateLog.log"

Our tests have shown memory usage to be up to 66% lower. Running from Visual Studio starts the devenv and DtsDebugHost processes, which you can see in Task Manager, whereas command-line processing starts DTEXEC.


Optimizing Speed: General

BP_MUXX_001
1. Network data traffic

We recommend that the source data to be processed be local with respect to the installed Melissa Data program. Network permissions, throughput, and, in some cases, MatchUp's need to access record 'x' to complete consolidation with record 'y' are all potential causes of a slower process.

2. Source datatype

Some database or file types can be read by the calling language or IDE more efficiently than others. Matching your environment to the most efficient file type requires trial-and-error testing by the developer.

3. Hardware

In general, the more hardware you dedicate to a process, the faster it will run. However, many processes cannot take advantage of additional hardware, or show diminishing returns. For example, a database with varied ZIP Codes may be able to use multiple processors to process individual clusters of records, but a database in which every record has the same ZIP Code may not. Additionally, given the factors above, hardware may not be the overriding factor governing a fast process; a good matchcode may be the most important factor.


Optimizing Speed: Matchcodes

BP_MUXX_002
1. Matchcode components: Before MatchUp dedupes, it clusters records into groups of possible matches. If your matchcode has no component that is used in every combination, MatchUp cannot place records into those subgroup clusters. In general, the more components that are used in every combination, the faster the process will be.

2. Matchcode fuzzy options: MatchUp has an extensive list of fuzzy options. Some are performed during the key-building process (e.g., Soundex) and do not slow the process down. Others are performed on the constructed matchkeys (e.g., Near, Jaro) and therefore slow the process down. If your process requires the latter types, place them in the component order below an exact component that is also used in every combination, if possible.
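
The effect of clustering described above can be illustrated with a rough count of pairwise comparisons. This is only a sketch, not MatchUp's internal algorithm: the Record struct, the choice of ZIP as the exact component used in every combination, and the function names are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: only the fields needed for this illustration.
struct Record {
    std::string zip;   // exact component assumed to be used in every combination
    std::string name;  // component that would be compared with a fuzzy option
};

// Number of distinct pairs among n records: n * (n - 1) / 2.
std::size_t pairCount(std::size_t n) {
    return n < 2 ? 0 : n * (n - 1) / 2;
}

// Comparisons needed when no component is common to every combination,
// so every record has to be compared with every other record.
std::size_t comparisonsWithoutClustering(const std::vector<Record>& records) {
    return pairCount(records.size());
}

// Comparisons needed when records are first grouped by the exact component
// and only records inside the same group are compared.
std::size_t comparisonsWithClustering(const std::vector<Record>& records) {
    std::map<std::string, std::size_t> clusterSizes;
    for (const Record& r : records) {
        ++clusterSizes[r.zip];
    }
    std::size_t total = 0;
    for (const auto& entry : clusterSizes) {
        total += pairCount(entry.second);
    }
    // Grouping can never require more comparisons than the full pairwise pass.
    assert(total <= pairCount(records.size()));
    return total;
}
```

With six records spread over three ZIP values in groups of three, two, and one, the full pairwise pass needs 15 comparisons while the clustered pass needs 4. If every record shares the same ZIP, both counts are equal, which is why a single large cluster gains nothing from this grouping.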

Order of Components in Matchcode

BP_MUXX_002
Although the Matchcode Editor interface lets you place the components in any order, the Object has a few restrictions when calling the AddMapping methods. Namely, Address Line AddMappings must be called last, even if you have added another component after the address matchcode components. Calling AddMappings in the wrong order will throw an error, so when using the Matchcode Editor, place your address components last. The exception would be rare cases where address components are used in every combination (specified column), but a different component is not used in all combinations.
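
A simple pre-flight check can catch a misordered matchcode before any AddMapping call is made. The sketch below is not part of the MatchUp API: the Component struct, its isAddress flag, and the function name are hypothetical, and the check does not model the rare exception described above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical description of one matchcode component, in matchcode order.
struct Component {
    std::string name;
    bool isAddress;  // true for address line components
};

// Returns true when every non-address component comes before every
// address component, i.e. the address components are placed last.
bool addressComponentsAreLast(const std::vector<Component>& components) {
    return std::is_partitioned(
        components.begin(), components.end(),
        [](const Component& c) { return !c.isAddress; });
}
```

For example, a list ordered as ZIP, last name, street passes the check, while street followed by ZIP does not.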


Using Efficient SetUserInfo

BP_MUOB_001
By default, the UserInfo value set with SetUserInfo (the unique identifier attached to each built match key) is 1024 bytes, allowing the developer to pass an advanced custom identifier, or even source data, to the key file. While this can have data-handling advantages, it causes the key file and temporary sort files to grow much larger than most jobs need, and slows down the process. A new reserved function has been added that allows the user to override the default UserInfo size. Example:

ReadWrite->SetReserved("UserInfoSize","12");

Our tests have shown this to reduce key and temporary disk storage usage by a factor of 10 and processing time by as much as 60%.
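
When the UserInfo size is reduced, each identifier passed to SetUserInfo has to fit in the smaller space. One way to guarantee this is to build a fixed-width identifier from a record number, as in the sketch below. The makeUserInfo helper is hypothetical and not part of the MatchUp API. It also assumes the reserved size counts only the identifier's characters; whether a terminating byte is included is not stated here, so verify that against your own installation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns a record number into a zero-padded identifier
// that is exactly reservedSize characters long.
// Throws std::length_error if the number needs more digits than that.
std::string makeUserInfo(unsigned long long recordNumber, std::size_t reservedSize) {
    std::string id = std::to_string(recordNumber);
    if (id.size() > reservedSize) {
        throw std::length_error("identifier does not fit in the reserved UserInfo size");
    }
    // Left-pad with zeros so every identifier has the same width.
    return std::string(reservedSize - id.size(), '0') + id;
}
```

With a reserved size of 12, record number 42 becomes 000000000042, and any record number with more than 12 digits is rejected instead of being silently truncated.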

