SSIS:MatchUp:Golden Record

From Melissa Data Wiki
Jump to navigation Jump to search

← SSIS:Data Quality Components

MatchUp Navigation
Overview
Editions
Tutorial
Advanced Configuration
On-Premise
MatchUp Tabs
Matchcode
Field Mapping
Options
Golden Record
Custom Expression Elements
Survivorship/Pass-Through
Lookup Pass-Through Columns
Output Filter
Matchcode Editor
Matchcode Evaluation
Matchcode List
Component List
Component Properties
Algorithms
Matchcodes Overview
Component Combinations
Blank Field Mapping
Mapping
Optimization
Swap Matching
Result Codes
Result Codes



The Golden Record Selection Option is used for selecting the best record amongst a group of duplicate records as your remaining Master Record. This allows for having a single accurate representation of each entity in your data. Thus in a group of duplicate records for example:


Name Address Last Update
John Doe 123 Main St 10/25/2003
John Doe 123 Main St 4/16/2012
John Doe 123 Main St 8/6/2008

We can prioritize the selection of the Golden Record by latest update. The selected remaining record will be:


Name Address Last Update
John Doe 123 Main St 4/16/2012

The component allows for using pre-defined algorithms that are commonly used to select the Golden Record, or you can also write your own custom expressions for flexible rule generation.



There are four algorithms that can be modified by clicking the ".." button to the right of its respective algorithm.

Multiple algorithms can be selected to evaluate the Golden Record. The algorithms are ordered from 1-4 in order of priority. Select the rule and click either the up arrow or down arrow to change this order. Golden Record selection will first be evaluated by the first selected algorithm in the list. If a Golden Record cannot be determined using the first selected algorithm, it will automatically cascade down to the second selected algorithm to re-evaluate the ties. The component will continue to cascade through all selected algorithms until a single golden record can be evaluated or until all the selected algorithms have been applied.

The four algorithms are Last Updated, Most Complete, Data Quality Score, and Custom Expression.


Last Updated

This algorithm allows for selecting the Golden Record based on the newest or oldest Date/Time.



"Last Updated" Column
Input requires a valid column of type Date/Time (DT_Date).
Oldest/Newest Date
Select whether the Golden Record should be selected based on which one has the Oldest/Newest Date.
Example
For the following data:
Name Address Last Update
John Doe 123 Main St 10/25/2003
John Doe 123 Main St 4/16/2012
John Doe 123 Main St 8/6/2008
Selecting the Newest Date for the “Last Update” column will yield the following results:


Name Address Last Update Results
John Doe 123 Main St 4/16/2012 MS02
John Doe 123 Main St 8/6/2008 MS03
John Doe 123 Main St 10/25/2003 MS03


The record containing MS02 is the selected Golden Record.


Most Complete

This algorithm allows for selecting the Golden Record based on the completeness/length of information.



The options allows for selecting multiple columns for evaluation. In a group of duplicate records, the Golden Record will be selected based on the selected columns has the longest concatenated string.

Example:

For the following data:

Name Address
Joseph Doe 123 Main St
Joe Doe 123 Main St
J. Doe 123 Main St


Selecting the Name and Address columns will yield the following results:


Name Address Results
Joseph Doe 123 Main St MS02
Joe Doe 123 Main St MS03
J. Doe 123 Main St MS03

The record containing MS02 is the selected Golden Record.


Data Quality Score

This algorithm allows for selecting the Golden Record based on the Quality of Your Data. The Data Quality Score is required to be used in tandem with the Melissa Data Contact Verify Component (CVC).

For more information on the Contact Verify Component, see CVC Overview.



Column Containing Result Codes
Input requires a valid column containing the Melissa Data Returned Result Codes returned by the CVC.
For more information on the CVC Results, see Custom Output Filters.


Rule

There are a total of 6 rules that can be utilized.

  • Check the box next to the rule(s) that you wish to use.
  • Rules are also ordered from 1-6 in order of priority. Select the rule and click either the up arrow or down arrow to change this order.
Address Quality Score
The Address Quality Score evaluates your Address Results and selects which one has the best address composition and most deliverable.
Name Quality Score
The Name Quality Score evaluates your Name Results and selects which one has the most probable real person’s name.
Phone Quality Score
The Phone Quality Score evaluates your Phone Results and selects which one has the most valid 10-digit phone number.
Email Quality Score
The Email Quality Score evaluates your Email Results and selects which one has the most correct and deliverable email address.
GeoCode Quality Score
The GeoCode Quality Score evaluates your GeoCode Results and selects which one has the highest level of GeoCoding accuracy.
Data Quality Score
The Data Quality Score evaluates all possible results combined from Address, Email, Name, Phone and GeoCode in a single column.


Custom Expression

The Custom Expression Algorithm allows you to set up a custom expression of your own design.



Expression

Use a pre-built expression
To use a pre-built expression, select one from the drop-down list. You may also remove the selected pre-built expression by clicking the self-titled button.
Use the specified expression
Alternatively, you may create your own expression in this field.
  • Clicking the text button will check your custom expression for correct syntax.
  • To save your custom built expression, click the "Save the above expression as a new Pre-Built Expression..." button.

Precedence

Choose one of the two radio buttons here to designate which result takes precendence over another when comparing two results.

  • The result with the Lowest Value
  • The result with the Highest Value

Expression Elements

Double-clicking any element in the expression elements section will insert it into the "Use the specified expression:" field.

For more information, see Custom Expression Elements.

The following elements are available for your use:

For further information on SSIS Operators, see http://msdn.microsoft.com/en-us/library/ms137538(v=sql.100).aspx.

For further information on SSIS Functions, see http://msdn.microsoft.com/en-us/library/ms141671(v=sql.100).aspx.