Profiler Object:Column-Based Statistics

From Melissa Data Wiki
Revision as of 17:37, 22 September 2023 by Admin (talk | contribs)


The column-based statistics should only be retrieved after ProfileData is called. These functions return column-specific details.

GetColumnInferredDataType

This function returns a column’s inferred data type in ProfilerDataType form. See ProfilerDataType Enumerations for details. The inferred data type is used to determine if a prevalent data type is seen for the majority of values in this column. For a deviant value to be returned (i.e., a value that differs from the user-specified data type), the count of that detected data type must exceed all other detected data type counts by at least 20%.

SetRightFielderAnalysis must be set to true to get inferred datatype analysis.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the column’s inferred data type.


Syntax profiler->GetColumnInferredDataType(columnNameStr);
C ProfilerDataType = mdProfilerGetColumnInferredDataType(profiler, columnNameStr);
.Net ProfilerDataType = profiler.GetColumnInferredDataType(columnNameStr);
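
The 20% rule described above can be sketched as a small standalone C++17 function. This is only an illustration, not the Profiler's implementation: `dominantType` is a hypothetical helper, the type names are plain strings rather than ProfilerDataType values, and reading "by at least 20%" as "at least 1.2 times every other count" is an assumption.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper, not part of the Profiler API.
// Returns the detected type whose count is at least (1 + margin) times every
// other detected type's count, or std::nullopt when no type dominates.
// ASSUMPTION: the 20% margin is measured relative to the other type's count.
std::optional<std::string> dominantType(const std::map<std::string, long>& counts,
                                        double margin = 0.20) {
    for (const auto& [name, count] : counts) {
        bool dominates = count > 0;
        for (const auto& [other, otherCount] : counts) {
            if (other != name && count < otherCount * (1.0 + margin)) {
                dominates = false;
                break;
            }
        }
        if (dominates) return name;
    }
    return std::nullopt;
}
```

With counts of 130 integers and 100 strings the sketch reports the integer type; with 110 and 100 it reports nothing, because neither count clears the margin.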


GetColumnInferredColumnType

Along with the data type analysis, the Profiler also analyzes the column type and returns the ProfilerColumnType enumeration that most closely matches the column's contents. This may differ from the column type specified by the user. Possible return values include ColumnTypeInt1, ColumnTypeReal8, ColumnTypeBoolean, ColumnTypeDate, etc.

SetRightFielderAnalysis must be set to true to get inferred datatype analysis.

This function takes one parameter.

Parameters

Name Data Type Description
FieldName String A string value representing the Field Name.


Syntax ProfilerColumnType = profiler->GetColumnInferredColumnType(FieldName);
C ProfilerColumnType = mdProfilerGetColumnInferredColumnType(profiler, FieldName);
.Net ProfilerColumnType = profiler.GetColumnInferredColumnType(FieldName);


GetColumnSortation

This function returns a column's natural sortation, that is, the sort order seen in the values as they were input. For a column to be considered near-sorted, no more than 10% of the input values may be out of order.

This function accepts one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the column’s sortation information.


Return Value

This function returns one of the following enumerations.

Enum Value Sortation Type Description
0 SortUnknown No sortation detected.
1 SortStringAscending Values are sorted ascending, using a string comparison.
2 SortStringDescending Values are sorted descending, using a string comparison.
3 SortNumericAscending Values are sorted ascending, using a numeric comparison.
4 SortNumericDescending Values are sorted descending, using a numeric comparison.
5 SortDateAscending Values are sorted ascending, using date/time comparison.
6 SortDateDescending Values are sorted descending, using date/time comparison.


Syntax profiler->GetColumnSortation(columnNameStr);
C Sortation = mdProfilerGetColumnSortation(profiler, columnNameStr);
.Net Sortation = profiler.GetColumnSortation(columnNameStr);
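
As a rough illustration of the 10% tolerance, the sketch below checks a list of strings for near-sorted ascending order. The helper names are hypothetical, and treating a value as "out of order" when it compares lower than the value immediately before it is an assumption; the page does not say how the Profiler counts out-of-order values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: fraction of values that compare lower than the value
// immediately before them, using a plain string comparison.
// ASSUMPTION: this is what "out of order" means for ascending string order.
double outOfOrderFraction(const std::vector<std::string>& values) {
    if (values.size() < 2) return 0.0;
    std::size_t breaks = 0;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] < values[i - 1]) ++breaks;
    }
    return static_cast<double>(breaks) / static_cast<double>(values.size());
}

// Applies the 10% tolerance stated in the description above.
bool isNearSortedAscending(const std::vector<std::string>& values) {
    return outOfOrderFraction(values) <= 0.10;
}
```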


GetColumnSortationPercent

This function returns a percentage indicating how well a column is sorted. It is only reported for columns where GetColumnSortation returned a value other than SortUnknown. The sortation percentage is determined by counting the number of re-orderings that would be required to put the list of values into a sorted state, and then dividing this count by the worst-case count (i.e., the re-orderings required for a reverse-sorted list).

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the sortation percentage


Syntax profiler->GetColumnSortationPercent(columnNameStr);
C double = mdProfilerGetColumnSortationPercent(profiler, columnNameStr);
.Net double = profiler.GetColumnSortationPercent(columnNameStr);
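
One way to picture the ratio described above is to count inversions (pairs of values that appear in the wrong relative order) and divide by the n(n-1)/2 inversions of a fully reversed list. The sketch below does this for integers. `disorderRatio` is a hypothetical name, and equating "re-orderings" with inversions is an assumption. The page does not state whether the reported percentage is this ratio or its complement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of inversions divided by the worst case.
// Returns 0.0 for an ascending list and 1.0 for a strictly descending one.
// ASSUMPTION: one "re-ordering" corresponds to one inverted pair.
double disorderRatio(const std::vector<int>& values) {
    const std::size_t n = values.size();
    if (n < 2) return 0.0;
    std::size_t inversions = 0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (values[i] > values[j]) ++inversions;
        }
    }
    const double worstCase = static_cast<double>(n) * static_cast<double>(n - 1) / 2.0;
    return static_cast<double>(inversions) / worstCase;
}
```

The quadratic loop keeps the sketch short; a merge-sort based count would be the usual choice for large columns.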


GetColumnMostPopularCount

This function returns the number of records that contain the most popular value.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the most popular value count.


Syntax profiler->GetColumnMostPopularCount(columnNameStr);
C integer = mdProfilerGetColumnMostPopularCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnMostPopularCount(columnNameStr);


GetColumnDistinctCount

This returns the number of distinct values in a column. Distinct values may have duplicates. A group of duplicate values is counted as 1 distinct value.

For example, the table below has a distinct count of 11 and a unique count of 7. Each state is a distinct value, even if it has duplicates in the data frequency.

State Data Frequency
AK 1
OK 1
OR 1
FL 1
NY 1
AL 1
WO 1
MO 2
IL 3
WI 3
MI 4

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the Distinct count.


Syntax profiler->GetColumnDistinctCount(columnNameStr);
C integer = mdProfilerGetColumnDistinctCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnDistinctCount(columnNameStr);


GetColumnUniqueCount

This function returns the number of unique values in the specified column. Unique values do not have duplicates.

For example, the table below has a distinct count of 11 and a unique count of 7. Each state with a single entry in the data frequency is a unique value. States with duplicate values are not counted.

State Data Frequency
AK 1
OK 1
OR 1
FL 1
NY 1
AL 1
WO 1
MO 2
IL 3
WI 3
MI 4

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the unique count.


Syntax profiler->GetColumnUniqueCount(columnNameStr);
C integer = mdProfilerGetColumnUniqueCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnUniqueCount(columnNameStr);
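
The difference between the distinct count, the unique count, and the most popular count can be reproduced from a simple frequency table. The sketch below is a standalone C++ illustration with hypothetical names (`ValueCounts`, `summarize`); it does not call the Profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical result type, not part of the Profiler API.
struct ValueCounts {
    long distinct = 0;     // number of different values
    long unique = 0;       // values that occur exactly once
    long mostPopular = 0;  // occurrences of the most frequent value
};

ValueCounts summarize(const std::vector<std::string>& values) {
    std::map<std::string, long> frequency;
    for (const auto& value : values) ++frequency[value];

    ValueCounts result;
    for (const auto& entry : frequency) {
        ++result.distinct;
        if (entry.second == 1) ++result.unique;
        result.mostPopular = std::max(result.mostPopular, entry.second);
    }
    return result;
}
```

Fed the 19 state values from the table above (MO twice, IL and WI three times each, MI four times, the rest once), the sketch yields a distinct count of 11, a unique count of 7, and a most popular count of 4.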


GetColumnDefaultValueCount

This function returns the number of records that contained the default value set with the SetColumnDefaultValue function.

This function accepts one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the default value count


Syntax profiler->GetColumnDefaultValueCount(columnNameStr);
C integer = mdProfilerGetColumnDefaultValueCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnDefaultValueCount(columnNameStr);


GetColumnBelowRangeCount

This function returns the number of records with values that were below the lower bound set with the SetColumnValueRange function.

This function accepts one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the below range count


Syntax profiler->GetColumnBelowRangeCount(columnNameStr);
C integer = mdProfilerGetColumnBelowRangeCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnBelowRangeCount(columnNameStr);


GetColumnAboveRangeCount

This function returns the number of records with values that were above the upper bound set with the SetColumnValueRange function.

This function accepts one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the above range count


Syntax profiler->GetColumnAboveRangeCount(columnNameStr);
C integer = mdProfilerGetColumnAboveRangeCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnAboveRangeCount(columnNameStr);
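
The below-range and above-range counts amount to two tallies against the configured bounds. The following sketch shows the idea for numeric values. `RangeCounts` and `countOutOfRange` are hypothetical names, and treating values equal to a bound as in range is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type, not part of the Profiler API.
struct RangeCounts {
    long below = 0;  // values lower than the lower bound
    long above = 0;  // values higher than the upper bound
};

// ASSUMPTION: both bounds are inclusive, so a value equal to a bound is in range.
RangeCounts countOutOfRange(const std::vector<double>& values, double lower, double upper) {
    assert(lower <= upper);
    RangeCounts counts;
    for (double value : values) {
        if (value < lower) {
            ++counts.below;
        } else if (value > upper) {
            ++counts.above;
        }
    }
    return counts;
}
```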


GetColumnAboveSizeCount

This function returns the number of records with values that were longer than the length set with the SetColumnSize function.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the above size count.


Syntax profiler->GetColumnAboveSizeCount(columnNameStr);
C integer = mdProfilerGetColumnAboveSizeCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnAboveSizeCount(columnNameStr);


GetColumnAbovePrecisionCount

This function returns the number of records with numeric values that have a precision greater than the precision set with the SetColumnPrecision function.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the above precision count.


Syntax profiler->GetColumnAbovePrecisionCount(columnNameStr);
C integer = mdProfilerGetColumnAbovePrecisionCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnAbovePrecisionCount(columnNameStr);


GetColumnAboveScaleCount

This function returns the number of records with numeric values that have a scale larger than the scale set with the SetColumnScale function.

Parameters

Name Data Type Description
ColumnName String Column Name to get the above scale count


Syntax profiler->GetColumnAboveScaleCount(columnNameStr);
C integer = mdProfilerGetColumnAboveScaleCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnAboveScaleCount(columnNameStr);
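
This page does not define precision and scale. The sketch below assumes the common SQL DECIMAL convention, where precision is the total number of digits and scale is the number of digits after the decimal point. `DecimalShape` and `decimalShape` are hypothetical names, and the Profiler may count digits differently (for example, it may ignore leading zeros).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical result type, not part of the Profiler API.
struct DecimalShape {
    int precision = 0;  // total number of digits
    int scale = 0;      // digits after the decimal point
};

// ASSUMPTION: SQL-style precision and scale; signs and separators are ignored.
DecimalShape decimalShape(const std::string& text) {
    DecimalShape shape;
    bool afterPoint = false;
    for (unsigned char ch : text) {
        if (ch == '.') {
            afterPoint = true;
        } else if (std::isdigit(ch)) {
            ++shape.precision;
            if (afterPoint) ++shape.scale;
        }
    }
    return shape;
}
```

Under this convention "123.45" has a precision of 5 and a scale of 2, so it would exceed a column configured with precision 4 or with scale 1.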


GetColumnInvalidRegExCount

This function returns the number of records with values that did not match any of the regular expressions set with the SetColumnCustomPattern function.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the count of records that have not matched with the regular expression set with the SetColumnCustomPattern function.


Syntax profiler->GetColumnInvalidRegExCount(columnNameStr);
C integer = mdProfilerGetColumnInvalidRegExCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnInvalidRegExCount(columnNameStr);
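
The "did not match any pattern" logic can be sketched with the standard `<regex>` library. `countUnmatched` is a hypothetical helper, and requiring a match of the whole value (`std::regex_match`) rather than a partial match is an assumption. The regular expression dialect used by the Profiler may also differ from the ECMAScript default used here.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: counts values that match none of the given patterns.
// ASSUMPTION: a pattern must match the entire value.
long countUnmatched(const std::vector<std::string>& values,
                    const std::vector<std::regex>& patterns) {
    long unmatched = 0;
    for (const auto& value : values) {
        bool matchedAny = false;
        for (const auto& pattern : patterns) {
            if (std::regex_match(value, pattern)) {
                matchedAny = true;
                break;
            }
        }
        if (!matchedAny) ++unmatched;
    }
    return unmatched;
}
```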


GetColumnEmptyCount

This function returns the number of records with empty values. An empty value is not NULL; it has no content other than, possibly, spaces.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the empty count.


Syntax profiler->GetColumnEmptyCount(columnNameStr);
C integer = mdProfilerGetColumnEmptyCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnEmptyCount(columnNameStr);


GetColumnNullCount

This function returns the number of records with NULL values.

Parameters

Name Data Type Description
ColumnName String Column Name to get the column null count.


Syntax profiler->GetColumnNullCount(columnNameStr);
C integer = mdProfilerGetColumnNullCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnNullCount(columnNameStr);
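
The distinction between NULL and empty values can be modelled in C++17 with `std::optional`: a missing value is NULL, while a present value made up of nothing or only spaces is empty. This is a sketch under that reading of the descriptions above; `BlankCounts` and `countBlanks` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result type, not part of the Profiler API.
struct BlankCounts {
    long nulls = 0;    // values that are absent
    long empties = 0;  // present values with no characters other than spaces
};

BlankCounts countBlanks(const std::vector<std::optional<std::string>>& values) {
    BlankCounts counts;
    for (const auto& value : values) {
        if (!value.has_value()) {
            ++counts.nulls;
        } else if (value->find_first_not_of(' ') == std::string::npos) {
            ++counts.empties;
        }
    }
    return counts;
}
```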


GetColumnInvalidDataCount

This function returns the number of records where the value is inconsistent with the column type set with the AddColumn function. For example, if you set a column's type to ColumnTypeInt1 and the input value is "John Smith", the value is considered invalid data and the count is incremented.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the column invalid data count.


Syntax profiler->GetColumnInvalidDataCount(columnNameStr);
C integer = mdProfilerGetColumnInvalidDataCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnInvalidDataCount(columnNameStr);
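
To make the "John Smith" example concrete, the sketch below tests whether a string is a valid value for a one-byte integer column. It assumes that ColumnTypeInt1 means a signed 8-bit integer (-128 to 127), which this page does not state. `fitsInt1` is a hypothetical helper built on C++17 `std::from_chars`.

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: true when the whole string parses as an integer that
// fits in one signed byte.
// ASSUMPTION: ColumnTypeInt1 is a signed 8-bit integer.
bool fitsInt1(const std::string& text) {
    int parsed = 0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, parsed);
    return ec == std::errc() && ptr == last && parsed >= -128 && parsed <= 127;
}
```

A value such as "John Smith" or "300" fails this check and would be tallied as invalid data under the stated assumption.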


GetColumnInvalidUTF8Count

This function returns the number of records containing an invalid UTF-8 sequence.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the invalid UTF8 count.


Syntax profiler->GetColumnInvalidUTF8Count(columnNameStr);
C integer = mdProfilerGetColumnInvalidUTF8Count(profiler, columnNameStr);
.Net integer = profiler.GetColumnInvalidUTF8Count(columnNameStr);
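
For reference, a byte sequence can be checked for UTF-8 validity as sketched below. `isValidUtf8` is a hypothetical helper that rejects bad lead bytes, missing or malformed continuation bytes, overlong encodings, surrogates, and code points above U+10FFFF. Whether the Profiler applies exactly these rules is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true when every byte sequence in `s` is
// well-formed UTF-8.
bool isValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead < 0x80) { ++i; continue; }  // single-byte (ASCII) character

        std::size_t extra = 0;      // number of continuation bytes expected
        unsigned long minimum = 0;  // smallest code point allowed for this length
        unsigned long codePoint = 0;
        if ((lead & 0xE0) == 0xC0)      { extra = 1; minimum = 0x80;    codePoint = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; minimum = 0x800;   codePoint = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; minimum = 0x10000; codePoint = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + extra >= n) return false;  // sequence is cut off
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char next = static_cast<unsigned char>(s[i + k]);
            if ((next & 0xC0) != 0x80) return false;  // not a continuation byte
            codePoint = (codePoint << 6) | (next & 0x3F);
        }
        if (codePoint < minimum) return false;                         // overlong form
        if (codePoint > 0x10FFFF) return false;                        // beyond Unicode
        if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return false;  // surrogate
        i += extra + 1;
    }
    return true;
}
```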


GetColumnNonPrintingCharCount

This function returns the number of records containing non-printable characters. Printable characters are letters, numbers, punctuation, etc.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get non-printing character count.


Syntax profiler->GetColumnNonPrintingCharCount(columnNameStr);
C integer = mdProfilerGetColumnNonPrintingCharCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnNonPrintingCharCount(columnNameStr);


GetColumnDiacriticCharCount

This function returns the number of records containing diacritic characters. Diacritic characters are symbols added to letters of the alphabet to indicate different pronunciation than the letters are usually given.

Parameters

Name Data Type Description
ColumnName String Column Name to get diacritic character count.


Syntax profiler->GetColumnDiacriticCharCount(ColumnNameStr);
C integer = mdProfilerGetColumnDiacriticCharCount(profiler, ColumnNameStr);
.Net integer = profiler.GetColumnDiacriticCharCount(ColumnNameStr);


GetColumnForeignCharCount

This function returns the number of records containing foreign characters. All diacritic characters are foreign characters, but not all foreign characters are diacritics.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get foreign character count


Syntax profiler->GetColumnForeignCharCount(ColumnNameStr);
C integer = mdProfilerGetColumnForeignCharCount(profiler, ColumnNameStr);
.Net integer = profiler.GetColumnForeignCharCount(ColumnNameStr);


GetColumnAlphaOnlyCount

This function returns the number of records that contain only alphabetic characters. Spaces and punctuation are allowed, but numbers and symbols are not.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get Alphabetic character count.


Syntax profiler->GetColumnAlphaOnlyCount(columnNameStr);
C integer = mdProfilerGetColumnAlphaOnlyCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnAlphaOnlyCount(columnNameStr);


GetColumnNumericOnlyCount

This function returns the number of records that contain only numeric characters. This includes spaces and punctuation but not alphabetic characters or symbols.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get Numeric character count.


Syntax profiler->GetColumnNumericOnlyCount(columnNameStr);
C integer = mdProfilerGetColumnNumericOnlyCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnNumericOnlyCount(columnNameStr);


GetColumnAlphaNumericCount

This function returns the number of records that contain both alphabetic and numeric characters. This includes spaces and punctuation.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get alphanumeric character count.


Syntax profiler->GetColumnAlphaNumericCount(columnNameStr);
C integer = mdProfilerGetColumnAlphaNumericCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnAlphaNumericCount(columnNameStr);
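
The three character-class counts above partition values by whether they contain letters, digits, or both. The sketch below classifies a single ASCII value along those lines. `CharClass` and `classify` are hypothetical names. Because this page does not define where punctuation ends and symbols begin, the sketch simply ignores every character that is neither a letter nor a digit, which is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical classification, not a Profiler enumeration.
enum class CharClass { None, AlphaOnly, NumericOnly, AlphaNumeric };

// ASSUMPTION: spaces, punctuation, and symbols do not affect the class.
CharClass classify(const std::string& value) {
    bool hasAlpha = false;
    bool hasDigit = false;
    for (unsigned char ch : value) {
        if (std::isalpha(ch)) {
            hasAlpha = true;
        } else if (std::isdigit(ch)) {
            hasDigit = true;
        }
    }
    if (hasAlpha && hasDigit) return CharClass::AlphaNumeric;
    if (hasAlpha) return CharClass::AlphaOnly;
    if (hasDigit) return CharClass::NumericOnly;
    return CharClass::None;
}
```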


GetColumnUpperCaseOnlyCount

This function returns the number of records whose alphabetic characters are all upper-case. Values may also contain spaces, punctuation, and numbers.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get upper-case only count.


Syntax profiler->GetColumnUpperCaseOnlyCount(columnNameStr);
C integer = mdProfilerGetColumnUpperCaseOnlyCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnUpperCaseOnlyCount(columnNameStr);


GetColumnLowerCaseOnlyCount

This function returns the number of records whose alphabetic characters are all lower-case. Values may also contain spaces, punctuation, and numbers.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get lower-case only count.


Syntax profiler->GetColumnLowerCaseOnlyCount(columnNameStr);
C integer = mdProfilerGetColumnLowerCaseOnlyCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnLowerCaseOnlyCount(columnNameStr);


GetColumnMixedCaseCount

This function returns the number of records that contain both upper-case and lower-case alphabetic characters. Values may also contain spaces, punctuation, and numbers.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get mixed-case characters count.


Syntax profiler->GetColumnMixedCaseCount(columnNameStr);
C integer = mdProfilerGetColumnMixedCaseCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnMixedCaseCount(columnNameStr);
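
The upper-case, lower-case, and mixed-case counts can be pictured as a classification of each value by the case of its letters, with all other characters ignored. The sketch below handles ASCII letters only; `LetterCase` and `letterCase` are hypothetical names, and how the Profiler treats values with no letters at all is not stated here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical classification, not a Profiler enumeration.
enum class LetterCase { NoLetters, UpperOnly, LowerOnly, Mixed };

// Spaces, punctuation, and digits are skipped; only letters decide the result.
LetterCase letterCase(const std::string& value) {
    bool hasUpper = false;
    bool hasLower = false;
    for (unsigned char ch : value) {
        if (std::isupper(ch)) {
            hasUpper = true;
        } else if (std::islower(ch)) {
            hasLower = true;
        }
    }
    if (hasUpper && hasLower) return LetterCase::Mixed;
    if (hasUpper) return LetterCase::UpperOnly;
    if (hasLower) return LetterCase::LowerOnly;
    return LetterCase::NoLetters;
}
```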


GetColumnSingleSpaceCount

This function returns the number of records that contain multiple words separated only by a single space.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get single space count.


Syntax profiler->GetColumnSingleSpaceCount(columnNameStr);
C integer = mdProfilerGetColumnSingleSpaceCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnSingleSpaceCount(columnNameStr);


GetColumnMultiSpaceCount

This function returns the number of records that contain multiple words separated by more than one space at least once.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get multi-space count.


Syntax profiler->GetColumnMultiSpaceCount(columnNameStr);
C integer = mdProfilerGetColumnMultiSpaceCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnMultiSpaceCount(columnNameStr);


GetColumnLeadingSpaceCount

This function returns the number of records that contain one or more leading spaces.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get leading space count.


Syntax profiler->GetColumnLeadingSpaceCount(columnNameStr);
C integer = mdProfilerGetColumnLeadingSpaceCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnLeadingSpaceCount(columnNameStr);


GetColumnTrailingSpaceCount

This function returns the number of records that contain one or more trailing spaces.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get trailing space count.


Syntax profiler->GetColumnTrailingSpaceCount(columnNameStr);
C integer = mdProfilerGetColumnTrailingSpaceCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnTrailingSpaceCount(columnNameStr);


GetColumnMaxSpaces

This function returns the maximum number of spaces that occurred between words in the column values.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get maximum space count.


Syntax profiler->GetColumnMaxSpaces(columnNameStr);
C integer = mdProfilerGetColumnMaxSpaces(profiler, columnNameStr);
.Net integer = profiler.GetColumnMaxSpaces(columnNameStr);


GetColumnMinSpaces

This function returns the minimum number of spaces that occurred between words in the column values.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get minimum space count.


Syntax profiler->GetColumnMinSpaces(columnNameStr);
C integer = mdProfilerGetColumnMinSpaces(profiler, columnNameStr);
.Net integer = profiler.GetColumnMinSpaces(columnNameStr);


GetColumnTotalSpaces

This function returns the total number of spaces that occurred between words in the column values. This doesn’t include leading and trailing spaces.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get total spaces.


Syntax profiler->GetColumnTotalSpaces(columnNameStr);
C integer = mdProfilerGetColumnTotalSpaces(profiler, columnNameStr);
.Net integer = profiler.GetColumnTotalSpaces(columnNameStr);


GetColumnTotalWordBreaks

This function returns the total number of word breaks found in the column values.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get total word breaks.


Syntax profiler->GetColumnTotalWordBreaks(columnNameStr);
C integer = mdProfilerGetColumnTotalWordBreaks(profiler, columnNameStr);
.Net integer = profiler.GetColumnTotalWordBreaks(columnNameStr);


GetColumnAvgSpaces

This function returns the average number of spaces found between words in the column values.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get average spaces.


Syntax profiler->GetColumnAvgSpaces(columnNameStr);
C double = mdProfilerGetColumnAvgSpaces(profiler, columnNameStr);
.Net double = profiler.GetColumnAvgSpaces(columnNameStr);
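
The spacing statistics in this group (leading, trailing, minimum, maximum, total, and average spaces, plus word breaks) can all be derived from the runs of spaces in a value. The sketch below computes them for a single string, whereas the Profiler reports them per column. `SpaceStats`, `spaceStats`, and `averageSpaces` are hypothetical names. Treating each run of spaces between two words as one word break, and the average as total spaces divided by word breaks, are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type, not part of the Profiler API.
struct SpaceStats {
    std::size_t leading = 0;      // spaces before the first word
    std::size_t trailing = 0;     // spaces after the last word
    std::size_t wordBreaks = 0;   // runs of spaces between words
    std::size_t totalSpaces = 0;  // spaces between words only
    std::size_t minSpaces = 0;    // shortest run between words
    std::size_t maxSpaces = 0;    // longest run between words
};

SpaceStats spaceStats(const std::string& value) {
    SpaceStats stats;
    const std::size_t first = value.find_first_not_of(' ');
    if (first == std::string::npos) return stats;  // no words at all
    const std::size_t last = value.find_last_not_of(' ');
    stats.leading = first;
    stats.trailing = value.size() - 1 - last;

    std::size_t run = 0;  // length of the current run of spaces
    for (std::size_t i = first; i <= last; ++i) {
        if (value[i] == ' ') {
            ++run;
        } else if (run > 0) {  // a run between two words just ended
            ++stats.wordBreaks;
            stats.totalSpaces += run;
            stats.minSpaces = (stats.wordBreaks == 1) ? run : std::min(stats.minSpaces, run);
            stats.maxSpaces = std::max(stats.maxSpaces, run);
            run = 0;
        }
    }
    return stats;
}

// ASSUMPTION: the average is taken over word breaks.
double averageSpaces(const SpaceStats& stats) {
    if (stats.wordBreaks == 0) return 0.0;
    return static_cast<double>(stats.totalSpaces) / static_cast<double>(stats.wordBreaks);
}
```

For the value "  John  Q Smith " the sketch finds two leading spaces, one trailing space, two word breaks, three spaces between words, a minimum run of one, a maximum run of two, and an average of 1.5.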


GetColumnDecorationCharCount

This function returns the number of records with values containing decorative characters. Decorative characters are tab, comma, pipe, and double-quote characters that appear at the beginning or end of a value. This count is useful because it often indicates that field delimiters have found their way into the data stream.

Decorative character analysis is meant to detect bad data imports by flagging (returning result code QS07) and counting any delimiters that made their way to the values of your table.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get the decorative character count.


Syntax profiler->GetColumnDecorationCharCount(columnNameStr);
C integer = mdProfilerGetColumnDecorationCharCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnDecorationCharCount(columnNameStr);
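
The decorative character test described above comes down to inspecting the first and last character of a value. A minimal sketch, with hypothetical helper names:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the four decorative characters named above.
bool isDecoration(char ch) {
    return ch == '\t' || ch == ',' || ch == '|' || ch == '"';
}

// True when the value starts or ends with a decorative character.
bool hasDecoration(const std::string& value) {
    return !value.empty() &&
           (isDecoration(value.front()) || isDecoration(value.back()));
}
```

A value such as `"Smith"` (with the quotes included in the data) is flagged, while `Smith, John` is not, because its comma is in the interior of the value.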


GetColumnProfanityCount

This function returns the number of records with values containing profanity.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get profanity count.


Syntax profiler->GetColumnProfanityCount(columnNameStr);
C integer = mdProfilerGetColumnProfanityCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnProfanityCount(columnNameStr);


GetColumnInconsistentDataCount

This function returns the number of records with values that are inconsistent with the Data Type you set with the AddColumn function. Record inconsistency is evaluated by analyzing a record's column values and determining what type of data each value represents. This determination is not an absolute, as a value can often be mistaken for another type.

This function takes one parameter.

Parameters

Name Data Type Description
ColumnName String Column Name to get Inconsistent data count.


Syntax profiler->GetColumnInconsistentDataCount(columnNameStr);
C integer = mdProfilerGetColumnInconsistentDataCount(profiler, columnNameStr);
.Net integer = profiler.GetColumnInconsistentDataCount(columnNameStr);