Pentaho/Contact Zone:Cleanser:Expression Builder

From Melissa Data Wiki
Jump to navigation Jump to search

← Contact Zone

← Data Quality Components for Pentaho

Cleanser Navigation
Overview
Tutorial
Advanced Configuration
Cleanser Tabs
Input Fields
Output Filter
Expression Builder
Result Codes



The expression builder lets you create your own custom expressions.


PENT Cleanser ExpressionBuilder.png


Column Names

The Columns names section lists the columns available from the input.


Variables

The Variables section lists the following system variables:


Variable Name Data Type Description
Internal.Kettle.Build.Version
Internal.Kettle.Build.Date
Internal.Kettle.Version


String Functions/Operators

The String Functions/Operators section lists the following functions/operators:

Operators

Operator Description
Not Equal (!=) Performs a comparison to determine if two expressions are not equal.
Less (<) Performs a comparison to determine if the first expression is less than the second one.
Greater (>) Performs a comparison to determine if the first expression is greater than the second one.
Equal (==) Performs a comparison to determine if two expressions are equal.
Concatenate (+) Concatenates two expressions.
Greater or Equal (>=) Performs a comparison to determine if the first expression is greater than or equal to the second one.
Less or Equal (<=) Performs a comparison to determine if the first expression is less than or equal to the second one.


Functions

Function Description
Lower() Returns a character expression after converting uppercase characters to lowercase characters.
Left() Returns part of a character string starting at a specified number of characters from the left.
LTrim() Returns a character expression after removing leading spaces.
Len() Returns the number of characters in a character expression.
PadLeft() Retruns a character expression after padding a specified string to the left of a specified character.
FindString() Returns the one-based index of the specified occurrence of a character string within an expression.
Upper() Returns a character expression after converting lowercase characters to uppercase characters.
Trim() Returns a character expression after removing leading and trailing spaces.
SubString() Returns a part of a character expression.
Reverse() Returns a character expression in reverse order.
PadRight() Retruns a character expression after padding a specified string to the right of a specified character.
Right() Returns part of a character string starting at a specified number of characters from the right.
RTrim() Returns a character expression after removing trailing spaces.
Replace() Returns a character expression after replacing a string within the expression with either a different string or an empty string.
Replicate() Returns a character expression, replicated a specified number of times.


Numeric Functions/Operators

The Numeric Functions/Operators section lists the following functions/operators:

Operators

Operator Description
Multiply (*) Multiplies two numeric expressions.
Modulus (%) Provides the integer remainder after dividing the first numeric expression by the second one.
Not Equal (!=) Performs a comparison to determine if two expressions are not equal.
Less (<) Performs a comparison to determine if the first expression is less than the second one.
Divide (/) Divides the first numeric expression by the second one.
Greater (>) Performs a comparison to determine if the first expression is greater than the second one.
Subtract (-) Subtracts the second numeric expression from the first one.
Equal (==) Performs a comparison to determine if two expressions are equal.
Add (+) Adds two numeric expressions.
Greater or Equal (>=) Performs a comparison to determine if the first expression is greater than or equal to the second one.
Less or Equal (<=) Performs a comparison to determine if the first expression is less than or equal to the second one.


Functions

Function Description
Abs() Returns the absolute, positive value of a numeric expression.
Sqrt() Returns the square root of a numeric expression.
Floor() Returns the largest integer that is less than or equal to a numeric expression.
Round() Returns a numeric expression that is rounded to the specified length or precision.
Sign() Returns the positive (+), negative (-), or zero (0) sign of a numeric expression.
Ln() Returns the natural logarithm of a numeric expression.
Log() Returns the base-10 logarithm of a numeric expression.
Exp() Returns the exponent to base e of the specified expression.
Ceiling() Returns the smallest integer that is greater than or equal to a numeric expression.
Square() Returns the square of a numeric expression.
Power() Returns the result of raising a numeric expression to a power.


Boolean Functions/Operators

The Boolean Functions/Operators section lists the following functions/operators:

Operators

Operator Description
Not Equal (!=) Performs a comparison to determine if two expressions are not equal.
Conditional (?:) Returns one of two expressions based on the evaluation of a Boolean expression.
Logical NOT (!) Negates a Boolean operand.
Logical AND (&&) Performs a logical AND operation.
Equal (==) Performs a comparison to determine if two expressions are equal.
Logical OR (||) Performs a logical OR operation.


Functions

Function Description
IsNull() Returns a Boolean result based on whether an expression is null.


Date/Time Functions/Operators

The Date/Time Functions/Operators section lists the following functions/operators:

Operators

Operator Description
Not Equal (!=) Performs a comparison to determine if two expressions are not equal.
Less (<) Performs a comparison to determine if the first expression is less than the second one.
Greater (>) Performs a comparison to determine if the first expression is greater than the second one.
Equal (==) Performs a comparison to determine if two expressions are equal.
Greater or Equal (>=) Performs a comparison to determine if the first expression is greater than or equal to the second one.
Less or Equal (<=) Performs a comparison to determine if the first expression is less than or equal to the second one.


Functions

Function Description
Day() Returns an integer that represents the day of the specified date.
GetUTCDate() Returns the current date of the system in UTC time (Universal Time Coordinate or Greenwich Mean Time).
Year() Returns an integer that represents the year of the specified date.
DateAdd() Returns a new DT_DBTIMESTAMP value by adding a date or time interval to a specified date.
DateDiff() Returns the number of date and time boundaries crossed between two specified dates.
GetDate() Returns the current date of the system.
DatePart() Returns an integer representing a datepart of a date.
Month() Returns an integer that represents the month of the specified date.


Misc. Functions/Operators

The Misc. Functions/Operators section lists the following functions/operators:

Operators

Operator Description
Conditional(?:) Returns one of two expressions based on the evaluation of a Boolean expression.


Functions

Function Description
GeoCodeScore() Returns a quality score for the geocode results codes.
Null() Returns a null value of a requested data type.
DataQualityScore() Returns a quality score for the results codes.
NameScore() Returns a quality score for the name results codes.
EmailScore() Returns a quality score for the email results codes.
PhoneScore() Returns a quality score for the phone results codes.
IsNull() Returns a Boolean result based on whether an expression is null.
AddressScore() Returns a quality score for the address results codes.


DataType Casts

The DataType Casts section lists the following system variables:


Data Type Description
DT_STR A null-terminated ANSI/MBCS character string with a maximum length of 8000 characters. (If a column value contains additional null terminators, the string will be truncated at the occurrence of the first null.)
DT_DATE A date structure that consists of year, month, day, hour, minute, seconds, and fractional seconds. The fractional seconds have a fixed scale of 7 digits.

The DT_DATE data type is implemented using an 8-byte floating-point number. Days are represented by whole number increments, starting with 30 December 1899, and midnight as time zero. Hour values are expressed as the absolute value of the fractional part of the number. However, a floating point value cannot represent all real values; therefore, there are limits on the range of dates that can be presented in DT_DATE. On the other hand, DT_DBTIMESTAMP is represented by a structure that internally has individual fields for year, month, day, hours, minutes, seconds, and milliseconds. This data type has larger limits on ranges of the dates it can present.

DT_DECIMAL An exact numeric value with a fixed precision and a fixed scale. This data type is a 12-byte unsigned integer with a separate sign, a scale of 0 to 28, and a maximum precision of 29.
DT_NUMERIC An exact numeric value with a fixed precision and scale. This data type is a 16-byte unsigned integer with a separate sign, a scale of 0 - 38, and a maximum precision of 38.
DT_BOOL A Boolean value.
DT_DBTIMESTAMP A timestamp structure that consists of year, month, day, hour, minute, second, and fractional seconds. The fractional seconds have a fixed scale of 3 digits.