Pentaho:Profiler:Analysis Options
The Analysis Options tab lets you enable or disable individual profiling calculations. Each enabled analysis adds processing time, so disabling the analyses you do not need shortens profiling runs.
Analysis Options
- Sort Analysis
- This is an analysis of any prevailing sort order in each profiled column. This option enables or disables sort analysis, which can increase profiling time. The time penalty grows geometrically as more records are added. If you do not need this statistic, disable it to reduce profiling time.
- MatchUp Analysis
- This is an analysis for duplicate record detection. This option enables or disables duplicate record detection. Duplicate analysis increases the profiling time by under 5% and the ProfileData profiling time by about 30%.
- RightFielder Analysis
- This is an analysis of each profiled column's inferred data type (e.g., Full Name, Address). This option enables or disables inferred data type analysis, which is responsible for the Inconsistent Data and Inferred Data Type statistics. It increases profiling time by under 10%.
- Data Aggregation
- This is an analysis that determines aggregate values (e.g., averages, median, quartiles). This option enables or disables all forms of aggregation and value gathering. Any statistic that cannot be determined incrementally (for example, median or population standard deviation) is determined via aggregation. This analysis is also responsible for all value tables (Frequency, Pattern, SoundEx, etc.), and all iterators and data aggregation statistics depend on it. It increases profiling time by over 90%.
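The following sketch roughly illustrates what a sort analysis reports. It classifies one column as ascending, descending, or unsorted by comparing adjacent values. It is not the Profiler's algorithm. Several details are assumptions: the function name, the hypothetical `threshold` parameter that decides what counts as a "prevailing" order, and the choice to skip empty values. This simple single pass also does not reproduce the growing time penalty described for Sort Analysis.

```python
from enum import Enum
from typing import Optional, Sequence


class SortOrder(Enum):
    ASCENDING = "ascending"
    DESCENDING = "descending"
    UNSORTED = "unsorted"


def prevailing_sort_order(
    values: Sequence[Optional[str]], threshold: float = 0.95
) -> SortOrder:
    """Classify a column by the share of adjacent value pairs that are in order.

    Empty (None) values are skipped. Equal neighbors count toward both
    directions, so a constant column reports ASCENDING.
    """
    present = [v for v in values if v is not None]
    pairs = list(zip(present, present[1:]))
    if not pairs:
        return SortOrder.UNSORTED  # fewer than two values: nothing to compare
    ascending = sum(a <= b for a, b in pairs) / len(pairs)
    descending = sum(a >= b for a, b in pairs) / len(pairs)
    if ascending >= threshold and ascending >= descending:
        return SortOrder.ASCENDING
    if descending >= threshold:
        return SortOrder.DESCENDING
    return SortOrder.UNSORTED
```

The sketch compares values as strings, so a numeric column would need converting first. As text, "10" sorts before "9".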
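You can picture duplicate detection as grouping records by a key and counting repeats. The sketch below uses exact matching on caller-chosen key fields after trivial normalization (uppercasing and collapsing whitespace). It is a simplified stand-in for illustration, not MatchUp's matching logic. The names `normalize` and `count_duplicates` and the record layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Sequence


def normalize(value: str) -> str:
    """Uppercase and collapse runs of whitespace so trivial variants compare equal."""
    return " ".join(value.upper().split())


def count_duplicates(
    records: Iterable[Mapping[str, str]], key_fields: Sequence[str]
) -> int:
    """Count records whose normalized key already appeared earlier in the input."""
    keys = Counter(
        tuple(normalize(record.get(field, "")) for field in key_fields)
        for record in records
    )
    # A key seen n times contributes n - 1 duplicates (the first is the original).
    return sum(n - 1 for n in keys.values())
```

Real duplicate detection usually needs fuzzy comparison, because exact keys miss variants such as nicknames or abbreviations.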
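The sketch below relates an inferred data type to a count of inconsistent values. It labels each value with the first hypothetical pattern it fully matches and takes the most common label as the column's inferred type. It then counts the values that do not carry that label. The three patterns (email, US ZIP, phone), the function names, and this reading of "Inconsistent Data" are assumptions for illustration. They are not RightFielder's rules or its definition of the statistic.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical patterns for illustration only; not RightFielder's rules.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "Email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "US ZIP": re.compile(r"\d{5}(?:-\d{4})?"),
    "Phone": re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}"),
}


def classify(value: str) -> Optional[str]:
    """Return the first pattern label that matches the whole value, if any."""
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(value.strip()):
            return label
    return None


def infer_column_type(values: List[str]) -> Tuple[Optional[str], int]:
    """Return (most common label, number of values that do not carry that label)."""
    labels = [classify(v) for v in values]
    counts = Counter(label for label in labels if label is not None)
    if not counts:
        return None, 0
    inferred = counts.most_common(1)[0][0]
    inconsistent = sum(1 for label in labels if label != inferred)
    return inferred, inconsistent
```

Here a column of four values with one stray entry, such as `["92688", "92688-1234", "10001", "hello"]`, is inferred as "US ZIP" with one inconsistent value.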
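The sketch below makes the incremental-versus-aggregated distinction concrete. A running tracker of count, minimum, and maximum can be updated record by record without storing values. By contrast, the median, quartiles, and a frequency table need every value gathered first. The class and function names are hypothetical, and the split is illustrative only. It does not describe how the Profiler itself partitions its statistics.

```python
import statistics
from collections import Counter
from typing import Dict, Iterable, List


class RunningStats:
    """Statistics updated one value at a time; individual values are not kept."""

    def __init__(self) -> None:
        self.count = 0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def add(self, x: float) -> None:
        self.count += 1
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)


def aggregate(stream: Iterable[float]) -> Dict[str, object]:
    """Median, quartiles and a frequency table need the gathered values."""
    values: List[float] = list(stream)  # memory grows with the record count
    return {
        "median": statistics.median(values),
        "quartiles": statistics.quantiles(values, n=4),
        "frequency": Counter(values),
    }
```

Gathering values costs memory and a sort in proportion to the record count. This is one reason an aggregation step can add substantially to profiling time.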
Setup Options
The Setup Options are not required. They are used purely for documentation purposes and will have no impact on profiling results.
- Table Name
- This function sets the table name for a particular run.
- User Name
- This function sets the user name for a particular run.
- Job Name
- This function sets the job name for a particular run.
- Job Description
- This function sets the job description for a particular run.