Pentaho:Contact Verify:Phone/Email
Input Phone
This is where you map the input columns containing the original phone numbers.
- Phone Number
- The Phone input column requires a 10-digit phone number in a standard format.
Output Phone Components
Use these fields to map output columns for the parsed phone number and its geographic data. Because of number portability, the geographic information for wireless or VoIP numbers may not reflect the actual location of the phone number’s owner.
- Phone Number
- The name of the output phone column.
- Format
- Select the format to be used for phone numbers in your data.
- Area Code
- This column returns the Area Code portion of the parsed phone number.
- Prefix
- This column returns the three-digit prefix portion of the parsed phone number.
- Suffix
- This column returns the four-digit suffix portion of the parsed phone number.
- Extension
- If the input phone number contains an extension, this column returns it.
- Additional Output Columns
- Click the Additional Output Columns... button to map columns for information beyond basic phone number parsing.
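The parsing described above can be sketched in Python. This is a minimal illustration, not the Component's own logic: the names `parse_phone`, `format_phone`, and `ParsedPhone`, the accepted separators, the optional leading “1”, and the “x”/“ext.” extension markers are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class ParsedPhone(NamedTuple):
    area_code: str
    prefix: str
    suffix: str
    extension: str  # empty string when the input has no extension


# Assumed input shape: optional leading "1" or "+1", ten digits with common
# separators, and an optional extension introduced by "x", "ext", or "ext.".
_PHONE_RE = re.compile(
    r"^\s*(?:\+?1[\s.-]*)?"               # optional country code
    r"\(?(\d{3})\)?[\s.-]*"               # area code
    r"(\d{3})[\s.-]*"                     # three-digit prefix
    r"(\d{4})"                            # four-digit suffix
    r"(?:\s*(?:x|ext\.?)\s*(\d+))?\s*$",  # optional extension
    re.IGNORECASE,
)


def parse_phone(raw: str) -> Optional[ParsedPhone]:
    """Split a 10-digit North American number; return None if it doesn't fit."""
    match = _PHONE_RE.match(raw)
    if match is None:
        return None
    area, prefix, suffix, ext = match.groups()
    return ParsedPhone(area, prefix, suffix, ext or "")


def format_phone(p: ParsedPhone) -> str:
    """Render one common style, (AAA) PPP-SSSS, with the extension if present."""
    base = f"({p.area_code}) {p.prefix}-{p.suffix}"
    return f"{base} x{p.extension}" if p.extension else base


print(parse_phone("212.555.0147 ext. 12"))
# ParsedPhone(area_code='212', prefix='555', suffix='0147', extension='12')
```

The regular expression rejects anything that does not reduce to exactly ten digits, which mirrors the 10-digit requirement on the Phone input column.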
Input Email
- Email Address
- This string value must, at minimum, contain the basic components of an email address: two strings of text separated by an “@” character.
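As a rough pre-check before mapping a column, the minimum requirement above (two non-empty strings separated by “@”) can be tested as follows. The helper name `has_basic_email_shape` is hypothetical, and passing this check says nothing about whether the address is valid or deliverable.

```python
def has_basic_email_shape(value: str) -> bool:
    """Return True if value has non-empty text on both sides of an "@".

    Mirrors only the stated minimum; extra "@" characters are tolerated here
    because the Correct Email Syntax option is meant to remove them later.
    """
    local, at, rest = value.strip().partition("@")
    return bool(at) and bool(local) and bool(rest)


for sample in ["jsmith@example.com", "jsmith", "@example.com", "jsmith@"]:
    print(sample, has_basic_email_shape(sample))
```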
Output Email Components
- Email Address
- This column returns the complete email address, standardized and corrected according to the options selected in Email Standardize Options.
- Standardization Options & Additional Output Columns
- Click the Standardization Options & Additional Output Columns... button to control how the CVC corrects and standardizes the email address and to map the parsing and information columns.
Output Phone Columns
- City
- This column returns the city associated with the phone number's area code and prefix.
- State/Province
- This column returns the two-character state or province abbreviation associated with the phone number's area code and prefix.
- County Name
- This column returns the county name for the location associated with the phone number's area code and prefix.
- County FIPS
- This column returns the five-digit county FIPS code associated with the phone number's area code and prefix.
- Country Code
- This column returns the country code associated with the input phone number: the two-character abbreviation for the United States or Canada, not the numeric international dialing code.
- Time Zone
- This column returns the name of the time zone where the input area code and prefix are located.
- Time Zone Code
- This column returns a one- or two-digit code for the time zone where the area code and prefix are located. The code also indicates the number of hours that the time zone is behind UTC/GMT. For example, Eastern Standard Time has a time zone code of 5 because the Eastern time zone is five hours behind UTC/GMT.
- This number does not reflect daylight saving time.
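Because the time zone code is the number of standard-time hours behind UTC/GMT, it converts directly to a fixed UTC offset. The following is a minimal sketch using Python's standard `datetime` module; `standard_offset` is a hypothetical helper, and since the code ignores daylight saving time, the result is always the standard-time offset.

```python
from datetime import datetime, timedelta, timezone


def standard_offset(tz_code: int) -> timezone:
    """Turn a time zone code (hours behind UTC) into a fixed UTC offset.

    The code ignores daylight saving time, so this is always the
    standard-time offset; any DST adjustment must be applied separately.
    """
    return timezone(timedelta(hours=-tz_code))


utc_time = datetime(2024, 1, 15, 17, 0, tzinfo=timezone.utc)
eastern = utc_time.astimezone(standard_offset(5))
print(eastern.strftime("%H:%M %z"))  # 12:00 -0500
```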
Email Standardize Options
Standardization Options
- Standardize Casing
- If this box is checked, the Component converts the input email address to all lowercase letters. For example, “JSmith@MelissaData.com” becomes “jsmith@melissadata.com”.
- Correct Email Syntax
- If this box is checked, the Component will do the following:
- Remove any illegal characters from the address, such as extra “@” characters.
- Correct misspelled domain names. For example, “yaho.com” would be replaced with “yahoo.com”.
- Correct misspelled top-level domain names. For example, “.con” would be replaced with “.com”.
- Perform Web Service Lookup
- If this box is checked, the Component will attempt to validate the input email address by looking up the domain in a compiled, continuously updated list of valid domains. This is slower than a database lookup but potentially more accurate if the domain name is obscure, new, or no longer valid.
- Perform Database Lookup
- If this box is checked, the domain name is checked against the Email Object’s local database of known valid and invalid domain names. This is faster but may not include recently registered domains.
- Perform Fuzzy Lookup
- If this box is checked, the Component will attempt to validate the input email address by applying fuzzy matching algorithms to the input domain. This is slower than a database lookup but potentially more accurate if the domain name contains a common or transposed typo.
- Update Domains
- If this box is checked, the Component will attempt to update the domain name of the email address. One domain name can replace another in cases such as a change in corporate ownership. For example, the domain of subscribers to the @Home cable Internet service was switched from “home.com” to “cox.net”.
- Perform DNS Lookup
- If this box is checked, the Component will attempt to validate the input email address by locating an MX (Mail Exchange) record or an A (Address) record for the domain on a DNS server. This is slower than a database lookup but potentially more accurate if the domain name is obscure or new.
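The Standardize Casing and Correct Email Syntax options can be approximated as below. This is a sketch under stated assumptions, not the Component's implementation: `standardize_email`, `DOMAIN_FIXES`, and `TLD_FIXES` are hypothetical, the correction tables hold only the examples given above, and the only illegal character handled is an extra “@”. The lookup options (web service, database, fuzzy, DNS) are not modeled.

```python
# Hypothetical correction tables holding only the examples from this page;
# the Component's real lists are not documented here.
DOMAIN_FIXES = {"yaho.com": "yahoo.com"}
TLD_FIXES = {"con": "com"}


def standardize_email(raw: str, lowercase: bool = True, fix_syntax: bool = True) -> str:
    addr = raw.strip()
    if lowercase:  # Standardize Casing
        addr = addr.lower()
    if fix_syntax and "@" in addr:  # Correct Email Syntax
        local, _, domain = addr.partition("@")
        domain = domain.replace("@", "")  # drop extra "@" characters
        name, dot, tld = domain.rpartition(".")
        if dot and tld.lower() in TLD_FIXES:  # misspelled top-level domain
            domain = f"{name}.{TLD_FIXES[tld.lower()]}"
        domain = DOMAIN_FIXES.get(domain.lower(), domain)  # misspelled domain
        addr = f"{local}@{domain}"
    return addr


print(standardize_email("JSmith@Yaho.con"))  # jsmith@yahoo.com
```

The top-level domain is corrected before the domain table is consulted, so “yaho.con” first becomes “yaho.com” and then matches the “yahoo.com” entry.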
Output Columns
- Mailbox Name
- This column returns the portion of the email address that precedes the “@” character. For “ray@melissadata.com”, this column would return “ray”.
- Domain Name
- This column returns the domain name from the parsed email address, minus the top-level domain. For “ray@melissadata.com”, this column would return “melissadata” (without the “.com”).
- Top Level Domain
- This column returns the top-level domain (TLD) from the input email address. For “ray@melissadata.com”, this column would return the “dot com” portion.
- Top Level Domain Description
- This column returns the official text description associated with the top-level domain. Not all TLDs have a description.
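The Mailbox Name, Domain Name, and Top Level Domain columns amount to splitting the address at the “@” and at the last “.”. The following is a minimal sketch; `parse_email` and `EmailParts` are hypothetical names, and returning the TLD without its leading dot is an assumption about formatting, not documented Component behavior.

```python
from typing import NamedTuple


class EmailParts(NamedTuple):
    mailbox: str  # text before the "@"
    domain: str   # host name minus the top-level domain
    tld: str      # top-level domain, without the leading dot


def parse_email(address: str) -> EmailParts:
    mailbox, at, host = address.partition("@")
    if not at:
        raise ValueError(f"no '@' in {address!r}")
    domain, dot, tld = host.rpartition(".")
    if not dot:
        # No dot at all: treat the whole host as the domain, with no TLD.
        return EmailParts(mailbox, host, "")
    return EmailParts(mailbox, domain, tld)


print(parse_email("ray@melissadata.com"))
# EmailParts(mailbox='ray', domain='melissadata', tld='com')
```

Splitting at the last dot is naive for multi-label suffixes such as “co.uk”, where it would return “uk” as the TLD; a more careful parser would consult a public suffix list.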