Pentaho/Contact Zone:MatchUp Tutorial
The following steps will guide you in the basic usage of MatchUp.
Add Component
To add MatchUp to your project, drag the MatchUp Component onto the Data Flow screen, where it snaps into your workflow space.
Connect Input
Select a data flow source for your input data. Many formats can serve as sources, including Excel files, flat files, and Access Input data sources. Connect this source to the MatchUp Component by dragging the arrow from the source to the MatchUp Component.
Configure Component
Double-click the MatchUp Component to open its interface.
Advanced Configuration
Click the Advanced Configuration button at the bottom of the window.
Set up the MatchUp advanced options. For details, see Advanced Configuration.
Matchcode Tab
Select the matchcode you want to use from the drop-down menu. If none of the available matchcodes fits your data, create a custom one with the Matchcode Editor.
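To make the idea of a matchcode concrete, here is a minimal sketch that treats a matchcode as an ordered list of components, each comparing a fixed number of leading characters from one normalized field. This is a deliberate simplification: the `Component` class, the `build_key` function, the field names, and the component sizes are all hypothetical, and the real MatchUp engine supports far richer matching (fuzzy algorithms, component combinations) than exact prefix comparison.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One hypothetical matchcode component: a field and how many leading characters to compare."""
    field: str
    size: int


def build_key(record: dict, components: list[Component]) -> str:
    """Build a match key by concatenating the normalized prefix of each component."""
    parts = []
    for comp in components:
        # Normalize case and surrounding whitespace before truncating.
        value = str(record.get(comp.field, "")).strip().upper()
        parts.append(value[: comp.size])
    return "|".join(parts)


# Hypothetical matchcode: ZIP (5) + last name (4) + start of address (5).
MATCHCODE = [Component("zip", 5), Component("last_name", 4), Component("address", 5)]

a = {"zip": "92688-1234", "last_name": "Smithson", "address": "22382 Avenida Empresa"}
b = {"zip": "92688", "last_name": "smith", "address": "22382 Ave Empresa"}
print(build_key(a, MATCHCODE))  # 92688|SMIT|22382
print(build_key(a, MATCHCODE) == build_key(b, MATCHCODE))  # True
```

Two records are treated as matches here when their keys are equal, which is why the differing ZIP+4 suffix and name spelling above do not prevent a match.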
Field Mapping Tab
Designate the input columns to be matched against the lookup columns.
Options Tab
Specify any custom names for the output columns, select the lookup options (for example, whether to suppress or intersect), and choose any Golden Record algorithms.
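The field mapping and the suppress/intersect lookup options can be sketched together. This assumes that "intersect" keeps source records that match at least one lookup record and "suppress" keeps source records that match none (as with a do-not-mail list); check the MatchUp documentation for the exact semantics. The `match_key` and `lookup` functions, the mapping dictionaries, and all column names are hypothetical illustrations, and matching is reduced to exact comparison of truncated, upper-cased values.

```python
from __future__ import annotations


def match_key(record: dict, mapping: dict[str, str], matchcode: dict[str, int]) -> tuple:
    """Key a record by its mapped columns, in matchcode component order."""
    return tuple(
        str(record.get(mapping[comp], "")).strip().upper()[:size]
        for comp, size in matchcode.items()
    )


def lookup(source, lookup_rows, source_map, lookup_map, matchcode, mode):
    """Return source rows that match ('intersect') or do not match ('suppress') any lookup row."""
    lookup_keys = {match_key(r, lookup_map, matchcode) for r in lookup_rows}
    if mode == "intersect":
        return [r for r in source if match_key(r, source_map, matchcode) in lookup_keys]
    if mode == "suppress":
        return [r for r in source if match_key(r, source_map, matchcode) not in lookup_keys]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical matchcode and field mappings (matchcode component -> column name).
MATCHCODE = {"zip": 5, "last_name": 4}
source_map = {"zip": "PostalCode", "last_name": "LName"}
lookup_map = {"zip": "ZIP", "last_name": "Last"}

source = [{"PostalCode": "92688", "LName": "Smith"}, {"PostalCode": "10001", "LName": "Jones"}]
do_not_mail = [{"ZIP": "92688", "Last": "SMITHERS"}]

print(lookup(source, do_not_mail, source_map, lookup_map, MATCHCODE, "intersect"))
# [{'PostalCode': '92688', 'LName': 'Smith'}]
print(lookup(source, do_not_mail, source_map, lookup_map, MATCHCODE, "suppress"))
# [{'PostalCode': '10001', 'LName': 'Jones'}]
```

The separate mappings are what let input and lookup streams with different column names be compared on the same matchcode components.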
Source Pass-Through Columns Tab
Select any columns to pass through to the output table unprocessed. If you use Survivor Pass, you can consolidate column data from matching records into a single output record.
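Conceptually, a survivor pass groups matching records and builds one output record by applying a per-column rule to each group. The sketch below is a minimal illustration under that assumption; the `survivor_pass` function, the `longest` and `first_non_empty` rules, and the last-name grouping are hypothetical and are not MatchUp's actual rule set.

```python
from __future__ import annotations

from collections import defaultdict


def longest(values: list[str]) -> str:
    """Keep the longest value (hypothetical rule)."""
    return max(values, key=len, default="")


def first_non_empty(values: list[str]) -> str:
    """Keep the first non-empty value (hypothetical rule)."""
    return next((v for v in values if v), "")


def survivor_pass(records, key_func, rules):
    """Collapse each group of matching records into one survivor record."""
    groups = defaultdict(list)
    for r in records:
        groups[key_func(r)].append(r)
    survivors = []
    for members in groups.values():  # dicts keep groups in first-seen order
        survivors.append({
            column: rule([str(m.get(column) or "") for m in members])
            for column, rule in rules.items()
        })
    return survivors


def by_last_name(record: dict) -> str:
    """Hypothetical grouping: records match when their last names match."""
    return record["name"].split()[-1].upper()


records = [
    {"name": "Bob Smith", "phone": ""},
    {"name": "Robert Smith", "phone": "555-0100"},
    {"name": "Ann Lee", "phone": "555-0199"},
]
print(survivor_pass(records, by_last_name, {"name": longest, "phone": first_non_empty}))
# [{'name': 'Robert Smith', 'phone': '555-0100'}, {'name': 'Ann Lee', 'phone': '555-0199'}]
```

Because each column has its own rule, the survivor record can combine the fullest name from one duplicate with a phone number taken from another.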
Lookup Pass-Through Columns Tab
Select any columns that should appear in the Lookup output stream.
Output Filter Tab
Select a filter from the drop-down menu, or create your own custom filter.
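An output filter can be pictured as a predicate on each record's match status, with every filter feeding its own output stream (pin). This is a minimal sketch assuming three hypothetical statuses: `unique` (no matches), `has_dupes` (first record of a duplicate group), and `is_dupe` (later records in that group). The status names, the `classify` and `route` functions, and the filter names are illustrative only, not MatchUp's own filter definitions.

```python
from __future__ import annotations

from collections import Counter


def classify(records, key_func):
    """Yield (status, record): 'unique', 'has_dupes' (first of a group), or 'is_dupe'."""
    counts = Counter(key_func(r) for r in records)
    seen = set()
    for r in records:
        k = key_func(r)
        if counts[k] == 1:
            status = "unique"
        elif k not in seen:
            status = "has_dupes"
        else:
            status = "is_dupe"
        seen.add(k)
        yield status, r


def route(records, key_func, filters):
    """Send each record to every output stream whose filter accepts its status."""
    streams = {name: [] for name in filters}
    for status, r in classify(records, key_func):
        for name, accepts in filters.items():
            if accepts(status):
                streams[name].append(r)
    return streams


# Hypothetical filters: two simple ones plus a custom "one record per group" filter.
FILTERS = {
    "uniques": lambda s: s == "unique",
    "dupes": lambda s: s == "is_dupe",
    "deduped": lambda s: s in ("unique", "has_dupes"),
}

rows = [{"id": 1, "key": "A"}, {"id": 2, "key": "B"}, {"id": 3, "key": "A"}]
out = route(rows, lambda r: r["key"], FILTERS)
print({name: [r["id"] for r in recs] for name, recs in out.items()})
# {'uniques': [2], 'dupes': [3], 'deduped': [1, 2]}
```

A custom filter is then simply another predicate over the same statuses, so one record can appear in more than one output stream.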
Connect Output
Add data destinations for downstream output, then connect each output filter pin to its destination.
Save Settings
Click File and select Save As to save the project.
Run Project
The project is now ready to run. You can watch in real time as records flow from your input source through the MatchUp Component and on to the output destinations, according to the filtering options you selected.