MatchUp Hub:Data Considerations

From Melissa Data Wiki
Revision as of 22:18, 25 September 2018 by Admin

Data Considerations

After making sure your environment is set up correctly to run MatchUp (Environment Evaluation Areas) and your matchcode has been evaluated and optimized (Environment), users can still experience slow processing speeds due to bad data.

In MatchUp, clustering is possible when at least one component is common to every matchcode combination in use. Since ZIP5 is used in all matchcode combinations, built keys will be grouped into clusters based on that datatype. Therefore, if your database contains an uneven distribution of ZIP codes, as in Table 1 below where every record shares the same ZIP, most keys land in a single oversized cluster. Changing your matchcode to include LAST NAME, for example, would create better clustering.

Table 1

RECID  LAST NAME  ADDRESS           CITY    ZIP
1      Jones      12 Main Street    Boston  02125
2      Smith      57 Maple Lane     Boston  02125
3      Connor     34 Summer Street  Boston  02125
4      Williams   1 Oak Drive       Boston  02125
n      ***        ***               ***     02125
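
To see why a single shared ZIP hurts, the sketch below counts how many record pairs would need comparing if records are only compared within their own cluster. It is a plain Python illustration of the idea, not MatchUp's API: the function name `comparisons_within_clusters`, the field names, and the use of the full LAST NAME as part of the cluster key are all assumptions made for the example.

```python
from collections import Counter

def comparisons_within_clusters(records, key_fields):
    """Count record pairs, assuming pairs are only formed inside a cluster.

    A cluster is the set of records sharing the same values for key_fields.
    A cluster of n records yields n * (n - 1) / 2 pairs.
    """
    sizes = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return sum(n * (n - 1) // 2 for n in sizes.values())

# The four named rows of Table 1 (hypothetical field names).
table_1 = [
    {"last_name": "Jones", "zip": "02125"},
    {"last_name": "Smith", "zip": "02125"},
    {"last_name": "Connor", "zip": "02125"},
    {"last_name": "Williams", "zip": "02125"},
]

# Clustering on ZIP alone puts every record in one cluster.
zip_only = comparisons_within_clusters(table_1, ["zip"])

# Adding LAST NAME to the cluster key splits that cluster apart.
zip_and_name = comparisons_within_clusters(table_1, ["zip", "last_name"])

print(zip_only, zip_and_name)
```

Because the pair count grows with the square of the cluster size, one large cluster costs far more than many small ones holding the same total number of records.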


An extensive number of NULL values, as in Table 2, can also be a source of clustering issues. You can check for this by using one of our Profiler products to find NULL/empty values, as well as incorrect data types in columns. Passing your data through an address verification service to correct empty field values can help fix bad ZIP codes and addresses. For other NULL data types, we suggest using an alternate matchcode before deduping, or splitting the data into multiple threads.

Table 2

RECID  LAST NAME  ADDRESS           CITY    ZIP
1      Jones      12 Main Street    Boston  NULL
2      Smith      57 Maple Lane     Boston  NULL
3      Connor     34 Summer Street  Boston  NULL
4      Williams   1 Oak Drive       Boston  NULL
n      ***        ***               ***     NULL
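
A quick pre-check for this situation can be sketched in a few lines of Python. This is not the Profiler product, only a rough stand-in: the helper names `is_empty` and `empty_ratio`, the field names, and the choice to treat the literal string "NULL" as empty are assumptions for the example.

```python
def is_empty(value):
    """Treat None, blank strings, and the literal text 'NULL' as empty."""
    if value is None:
        return True
    text = str(value).strip()
    return text == "" or text.upper() == "NULL"

def empty_ratio(records, field):
    """Return the fraction of records whose field is empty (0.0 to 1.0)."""
    if not records:
        return 0.0
    empties = sum(1 for rec in records if is_empty(rec.get(field)))
    return empties / len(records)

# The four named rows of Table 2 (hypothetical field names).
table_2 = [
    {"last_name": "Jones", "zip": None},
    {"last_name": "Smith", "zip": "NULL"},
    {"last_name": "Connor", "zip": ""},
    {"last_name": "Williams", "zip": None},
]

zip_ratio = empty_ratio(table_2, "zip")
name_ratio = empty_ratio(table_2, "last_name")
print(f"zip empty: {zip_ratio:.0%}, last_name empty: {name_ratio:.0%}")
```

A high ratio on a field that every matchcode combination relies on suggests cleaning that field, or choosing a different common component, before deduping.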


Table 3 shows a good example of data that has been standardized and verified before processing with MatchUp. Because clustering is based on ZIP codes, the data below will be grouped into two clusters, which will provide much better processing speeds.

Table 3

RECID  LAST NAME  ADDRESS          CITY    ZIP
1      Jones      12 Main Street   Boston  02125
2      Smith      57 Maple Road    Boston  02125
3      Connors    3 Summer Circle  Boston  02121
4      Williams   17 Oak Drive     Boston  02121
n      ***        ***              ***     ***
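
One way to judge whether a file looks more like Table 3 than Table 1 is to measure how records are spread across ZIP codes before running the dedupe. The following is a minimal sketch under assumed field names; `cluster_sizes` and `largest_cluster_share` are hypothetical helpers, not part of MatchUp.

```python
from collections import Counter

def cluster_sizes(records, field="zip"):
    """Count how many records fall under each value of the cluster field."""
    return Counter(rec[field] for rec in records)

def largest_cluster_share(records, field="zip"):
    """Return the fraction of all records held by the biggest cluster."""
    if not records:
        return 0.0
    sizes = cluster_sizes(records, field)
    return max(sizes.values()) / len(records)

# The four named rows of Table 3 (hypothetical field names).
table_3 = [
    {"last_name": "Jones", "zip": "02125"},
    {"last_name": "Smith", "zip": "02125"},
    {"last_name": "Connors", "zip": "02121"},
    {"last_name": "Williams", "zip": "02121"},
]

sizes = cluster_sizes(table_3)
share = largest_cluster_share(table_3)
print(dict(sizes), share)
```

A share close to 1.0 means nearly everything sits in one cluster, which is the slow case; the four rows here split evenly into two clusters.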