SSIS:MatchUp:Golden Record
The Golden Record Selection Option is used for selecting the best record amongst a group of duplicate records as your remaining Master Record. This allows for having a single accurate representation of each entity in your data. Thus in a group of duplicate records for example:
Name | Address | Last Update |
---|---|---|
John Doe | 123 Main St | 10/25/2003 |
John Doe | 123 Main St | 4/16/2012 |
John Doe | 123 Main St | 8/6/2008 |
We can prioritize the selection of the Golden Record by latest update. The selected remaining record will be:
Name | Address | Last Update |
---|---|---|
John Doe | 123 Main St | 4/16/2012 |
The component allows for using pre-defined algorithms that are commonly used to select the Golden Record, or you can also write your own custom expressions for flexible rule generation.
There are four algorithms that can be modified by clicking the ".." button to the right of its respective algorithm.
Multiple algorithms can be selected to evaluate the Golden Record. The algorithms are ordered from 1-4 in order of priority. Select the rule and click either the up arrow or down arrow to change this order. Golden Record selection will first be evaluated by the first selected algorithm in the list. If a Golden Record cannot be determined using the first selected algorithm, it will automatically cascade down to the second selected algorithm to re-evaluate the ties. The component will continue to cascade through all selected algorithms until a single golden record can be evaluated or until all the selected algorithms have been applied.
The four algorithms are Last Updated, Most Complete, Data Quality Score, and Custom Expression.
Last Updated
This algorithm allows for selecting the Golden Record based on the newest or oldest Date/Time.
- "Last Updated" Column
- Input requires a valid column of type Date/Time (DT_Date).
- Oldest/Newest Date
- Select whether the Golden Record should be selected based on which one has the Oldest/Newest Date.
- Example
- For the following data:
Name Address Last Update John Doe 123 Main St 10/25/2003 John Doe 123 Main St 4/16/2012 John Doe 123 Main St 8/6/2008
- Selecting the Newest Date for the “Last Update” column will yield the following results:
Name Address Last Update Results John Doe 123 Main St 4/16/2012 MS02 John Doe 123 Main St 8/6/2008 MS03 John Doe 123 Main St 10/25/2003 MS03
- The record containing MS02 is the selected Golden Record.
Most Complete
This algorithm allows for selecting the Golden Record based on the completeness/length of information.
The options allows for selecting multiple columns for evaluation. In a group of duplicate records, the Golden Record will be selected based on the selected columns has the longest concatenated string.
Example:
For the following data:
Name | Address |
---|---|
Joseph Doe 123 Main St | |
Joe Doe 123 Main St | |
J. Doe 123 Main St |
Selecting the Name and Address columns will yield the following results:
Name | Address | Results |
---|---|---|
Joseph Doe 123 Main St MS02 | ||
Joe Doe 123 Main St MS03 | ||
J. Doe 123 Main St MS03 |
The record containing MS02 is the selected Golden Record.
Data Quality Score
This algorithm allows for selecting the Golden Record based on the Quality of Your Data. The Data Quality Score is required to be used in tandem with the Melissa Data Contact Verify Component (CVC).
For more information on the Contact Verify Component, click here: http://www.melissadata.com/webhelp/ssis/cvc/index.htm
- Column Containing Result Codes
- Input requires a valid column containing the Melissa Data Returned Result Codes returned by the CVC.
- For more information on the CVC Results, click here
- http://www.melissadata.com/webhelp/ssis/cvc/index.htm#output/customoutputfilters.htm
Rule
There are a total of 6 rules that can be utilized.
- Check the box next to the rule(s) that you wish to use.
- Rules are also ordered from 1-6 in order of priority. Select the rule and click either the up arrow or down arrow to change this order.
- Address Quality Score
- The Address Quality Score evaluates your Address Results and selects which one has the best address composition and most deliverable.
- Name Quality Score
- The Name Quality Score evaluates your Name Results and selects which one has the most probable real person’s name.
- Phone Quality Score
- The Phone Quality Score evaluates your Phone Results and selects which one has the most valid 10-digit phone number.
- Email Quality Score
- The Email Quality Score evaluates your Email Results and selects which one has the most correct and deliverable email address.
- GeoCode Quality Score
- The GeoCode Quality Score evaluates your GeoCode Results and selects which one has the highest level of GeoCoding accuracy.
- Data Quality Score
- The Data Quality Score evaluates all possible results combined from Address, Email, Name, Phone and GeoCode in a single column.