Global Address Verification:FAQ: Difference between revisions

From Melissa Data Wiki
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
[[Global Address Verification|← Global Address Verification]]

==What type of global address information can I send?==
==Result Codes==
:Global Address is designed to take full standard addresses. This means that the address must, in most cases, have a house number, a street (or thoroughfare) and identifiable area data (like postal code and locality). The service will not behave particularly well for partial addresses, point of interest information (like a park or plot of land without a formal address), directions (e.g. "across the street from ABC Bank") or non-address information (like phone numbers). The reason for this is that address verification relies on having a good address data source behind it in order to confirm validity, and non-formal address data is not good enough to power our verification solution.
===How do I use Results Codes for the Global Address Web Service?===
:The Global Address Web Service will return a number of codes to inform you of the status of a global address. These codes can be divided into AV codes for status, AE codes for errors, and AC codes for changes.

===AV Codes===
==What are the minimum input fields required?==
:These codes tell you how good the returned address is. They are slightly different than the AS codes you are used to in our US only products.
:The only field that is strictly required is the country input. We cannot verify an address without a country. Beyond that, almost every address effectively needs an AddressLine input. While it is theoretically possible to verify an address from a single highly distinctive piece of information, such as a postal code in certain countries and areas, that is very rare.
:*AV2X denotes an address that has been fully verified.
:*AV1X denotes a partially verified address, but something was invalid.
:*The second digit (X) indicates the level up to which the output address is valid. Possible values range from 1 to 5.

:[[File:TUT WS GlobalAV AVChart.jpg|450px]]
==How should I send in my address input data?==
:Your input address can be sent in a number of ways. Our main advice is to try and send the data you have with as little manipulation as possible if you have a full address. The most common ways to send it are:

:;For example:
:#AV25 indicates the output address was fully verified to the SubPremises(level 5.)
<li>Full address in the AddressLines:</li>
:#AV24 indicates the output address was fully verified to the Premises(level 4.)
<pre style="margin-left:20px;">AddressLine1: 22382 Avenida Empresa
:#AV22 indicates the output address was fully verified to the Locality(level 2.) but we did not have data above the locality.
AddressLine2: Rancho Santa Margarita CA, 92688
:#AV14 indicates the output address was partially verified but is good up to the Premises (level 4). You can infer that something was wrong with the sub-premise.
Ctry: US
:#AV13 indicates the output address was partially verified but is good up to the Thoroughfare (level 3). You can infer that something was wrong with the premise.

====AV24 vs AV25====
<li>Full address in the AddressLines + Parsed Area:</li>
:Both of these codes are similar in that they tell you the address is fully good down to the delivery address. You will get an AV24 if the destination address is a house or building without any sub-premises (therefore level 5 is impossible). AV25 means the address was fully good and that the destination address does contain sub-premises. If you have an apartment complex and you enter a wrong suite, you will get an AV14 (partially good address, good to the premise).
<pre style="margin-left:20px;">AddressLine1: 22382 Avenida Empresa
Locality: Rancho Santa Margarita
AdministrativeArea: CA
PostalCode: 92688
Ctry: US</pre>

====Using the AV Codes====
:'''Note:''' What you want to avoid as much as possible is sending in duplicated information, like this '''<span style="color:red;">BAD example below'''</span>:
:One use of the AV codes is to simply look for AV2. This will indicate that all addresses have been fully verified with no errors up to the data available. However, if you want to ensure all addresses are correct down to the delivery address, you should look for AV24 or AV25.
<pre style="color:red; margin-left:40px;">AddressLine1: 22382 Avenida Empresa
AddressLine2: Rancho Santa Margarita CA, 92688
Locality: Rancho Santa Margarita
AdministrativeArea: CA
PostalCode: 92688
Ctry: US</pre>

===AE Codes===
==What encoding does Global Address use?==
:AE codes will tell you what type of error occurred when verifying the address. It is possible to get AE codes along with AV codes (like Sub-premise invalid, multiple match) or AE codes by themselves (postal code error).
:Global Address uses UTF-8 Unicode encoding. It is very important to make sure your data is in UTF-8 when sending to our service, especially for countries that use non-Latin scripts. Be on the lookout for question marks (?), squares (▖) or other unexpected characters like �. They may indicate encoding issues and may result in data loss.

===AC Codes===
==Why are there different levels of Address Verification Result Codes?==
:AC codes will tell you what we changed in the output address when compared to the input address. Standardizations (Street to St) do not count as changes.  
:The address source data available to our address validation product can differ from country to country. For most developed countries with a major postal agency, like the United States, Great Britain, France, etc., we will have delivery point data available. However, there may be countries where that detailed data is not available, or simply does not exist. In those cases, we will use less detailed information. Here is an overview and inside peek at the types of data available:
:*'''Delivery Point Data'''
::Source: 1 Main St Apartment 12, Anytown 12345, USA
::Max Verification Level: AV25

:*'''Range Data'''
::Source: 100-200 Main St, Anytown 12345, USA
::Max Verification Level: AV24

:*'''Thoroughfare Data'''
The Global Address Web Service has a number of different input fields. They include AddressLines 1-8 as well as parsed input fields like locality, administrative area, postal code, etc. You can pass an address into the global web service in two ways:
::Source: Main St, Anytown 12345, USA
;1. Pass the entire address using just the Address Lines
::Max Verification Level: AV23
:Address1: 2 Holt Street
:Address2: Surry Hills
:Address3: NSW 2010
:Country: AU

;2. Pass the delivery address in the Address Lines and the area information using the parsed inputs.
:*'''Locality Data'''
:Address1: 2 Holt Street
::Source: Anytown 12345, USA
:Locality: Surry Hills
::Max Verification Level: AV22
:AdministrativeArea: NSW
:PostalCode: 2010
:Country: AU

===Additional Tips===
==What can affect speeds from the Global Address Web Service?==
:#If you have the area information parsed out already and you trust it, pass it into the input parsed fields (method #2 above). We do not want to lose that piece of information and making the service re-parse introduces an unnecessary area for errors. If you are not sure that your parsed area information is correct, pass it into the address lines.
The response time of Global Address Web Service is highly dependent on a number of factors:
:#Be wary of duplicate information. Try not to pass in duplicate information if possible. A common example is the address lines containing the locality and also passing the same value into the input locality.
:#Country input is required. It can contain a country name or an ISO 3166 code, but it must be there.

*'''The country of the input.''' This is one of the most important factors as each country has its own engine in the background and its own verification paths and algorithms. A well-structured country like the US will be much faster than a less structured one like India. Also, a Latin-based address system will usually be faster than a non-Latin system.
*'''The quality of the address.''' A good address will be faster than a bad or partially bad address. Bad addresses will require extra steps, additional fuzzy matching logic, and extra parses before we either find a match or give up.
*'''Real time vs batch.''' Real time means sending one record at a time, while batch means sending up to 100 records per request and looping through your data. Processing 100 at a time can improve your overall per-address speed by 25% to 50%.
*'''The location of the client server.''' The distance between our server to yours will make a difference in overall speed. See here for our current server location list:
*'''The current load of our public cloud servers.''' We maintain and make available a large amount of processing capacity. However, all things being equal, heavier usage at any given time will decrease overall speed during that time.
*'''The protocol used.''' We have seen SOAP be up to 10% slower than XML or JSON.

===Expected Throughput Speeds===
The Global Address Web Service has a decent number of output fields. These fields can return duplicate information so it is important to understand exactly what each field returns. Please see the [[Global Address Verification:Response|Response Fields]] for more detailed information. For this FAQ, we will focus on which pieces will return you back a full address:
With all of these factors, we cannot give a single throughput number that will cover all the different use cases and types of input. Our expectation is that the majority of users will see varying speeds dependent on the contents of their records and their specific circumstances.

===Normal Mode===
:{| class="alternate01"
!colspan="2"|Single Record Address Validation
::Returns the mailing address in one line.
|Single verification for on demand address validation
!scope="row"|Throughput Range
|50k - 120k records per hour
!scope="row"|Use Cases
|Form entry<br>Point of entry verification<br>Onboarding clients
!colspan="2"|Single Threaded Batch
|Batch processing for small to medium volume processing
!scope="row"|Throughput Range
|100k - 400k records per hour
!scope="row"|Use Cases
|Overnight batch processing<br>Cleansing small-medium sized lists
!colspan="2"|Multi Threaded Batch
|Batch processing for high-volume processing when throughput is important
!scope="row"|Throughput Range
|300k - 800k records per hour
!scope="row"|Use Cases
|Cleansing large lists<br>Data integration and ETL processes

:;Organization + AddressLine1-8
As seen in the figures above, multiple threads will improve performance. We recommend starting with 10-15 threads for processing large batch lists.
::Returns the full address back in multiple lines

===Response Times===
::Returns the parsed area fields. These fields do not constitute a full address, and their data is duplicated in FormattedAddress and AddressLines1-8. This data can be used for profiling or de-duping.
Response times can vary based upon the postal quality of the country, script, and input quality. For example, validating addresses in Middle Eastern and Asia Pacific countries can cause response times to increase due to the complexity of parsing and validating non-Latin character sets. In contrast, response times in the United States and Canada are expected to be lower due to high postal quality and Latin-based addresses.
::(These are the additional fields which include: locality, dependent locality, administrative area, sub administrative area, sub national area, & postal code.)

===DeliveryLines Options===
Response times are not guaranteed, but we can provide a general range:
:This option must be turned on. Please see [[Media:DQT_WS_Global_RG.pdf|the manual]] for information on how to do this.

:{| class="alternate01"
::Returns the mailing address in one line.
!Description!!Response Times
|Latin Based Countries||30ms - 150ms
|Non-Latin Based Countries||50ms - 500ms

:;Organization + AddressLine1-8
::Returns just the delivery address (including the dependent locality). Any area information locality or larger is not included.
Melissa utilizes dynamic scaling to improve throughput speeds during peak usage periods. This behavior ensures that our web services can adjust to changes in traffic and consistently provide optimal performance regardless of demand.

If greater-than-usual volume is anticipated, please consult with your sales representative to ensure that Melissa can work with you to successfully manage your campaign.
::Returns the area information in their individual fields not included in AddressLine1-8.
::(These are the additional fields which include: locality, dependent locality, administrative area,  sub administrative area, sub national area, & postal code.)

==How does the Global Address engine handle different Scripts?==
:Global Address has an option called <code>OutputScript</code> that has 3 possible values:
:*<code>NoChange</code>: We will detect the script of the input and leave output in that same script.
:*<code>Latn</code>: We will convert the output to Latin script.
:*<code>Native</code>: We will change the output to the native script of the country (Like Cyrillic for Russia)

:Here are some things to make note of when trying to understand this functionality.
The Global Address Web Service uses UTF-8. Make sure you are storing your address data in Unicode (nvarchar) and passing it to the web service in UTF-8. Be on the lookout for question marks (?), squares (▖) or other unexpected characters like �. They may indicate encoding issues and may result in data loss.
:#Global Address essentially supports up to two scripts per country: Latin and, if the country uses a script that is not Latin based, that native script. Output is converted to the native script when <code>Native</code> is set.
:#Note that Script and Language are not the same. English, Spanish, French are all Latin Script even if their alphabet and diacritics used are slightly different.
:#We can only change scripts for a record if we are able to verify and validate the address.
:#For Latin based languages, if you specify <code>OutputScript=Latn</code>, we will also remove the diacritics (<code>Gjøvik</code> vs <code>Gjovik</code>)

[[Category:Global Address Verification]]

Latest revision as of 23:28, 28 March 2024


What are the Pros and Cons of using Global Address Public Cloud vs On-Premise?

In terms of global address verification functionality, the Public Cloud and On-Premise options will both produce the same verification results. Our Public Cloud offering is, after all, simply a web application layer built on top of the On-Premise API. You will receive the same global coverage regardless of which one you pick. The main reasons for picking one or the other are architectural and organizational policy considerations.

Privacy and Security
  • Public Cloud: Data is hosted on Melissa servers in a secure and private environment. We undergo security audits like SOC2 and no PII data is stored.
  • On-Premise: The most private and secure option. The Global Address library and data files are located on your own servers and nothing leaves your network.
Maintenance
  • Public Cloud: Start verifying addresses with no setup or installation. No maintenance of servers or data updates to worry about.
  • On-Premise: Updates provided on a quarterly basis that will need to be updated on all machines hosting the on-premise product. Install is included or simply use a copy/replace operation.
Pricing
  • Public Cloud: Transactional Pricing. Best and cheapest way to get started and only pay for exactly what you use.
  • On-Premise: Bulk pricing. We are required to charge bulk pricing for putting data on premise. Most cost efficient for high volumes.
Ease of Use
  • Public Cloud: Ready to use web application that supports REST/XML/JSON/SOAP on a globally hosted redundant infrastructure.
  • On-Premise: API for use in a programming language like Java/Python/.NET, etc. A low-level API that can be integrated to exactly your needs, but is not a ready to go web service like Public Cloud.

What type of global address information can I send?

Global Address is designed to take full standard addresses. This means that the address must, in most cases, have a house number, a street (or thoroughfare) and identifiable area data (like postal code and locality). The service will not behave particularly well for partial addresses, point of interest information (like a park or plot of land without a formal address), directions (e.g. "across the street from ABC Bank") or non-address information (like phone numbers). The reason for this is that address verification relies on having a good address data source behind it in order to confirm validity, and non-formal address data is not good enough to power our verification solution.

What are the minimum input fields required?

The only field that is strictly required is the country input. We cannot verify an address without a country. Beyond that, almost every address effectively needs an AddressLine input. While it is theoretically possible to verify an address from a single highly distinctive piece of information, such as a postal code in certain countries and areas, that is very rare.
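
As a rough illustration, a client could check these minimum inputs before sending a record. This is only a sketch: the function name is hypothetical, and the record is assumed to be a plain dictionary keyed by the input field names used in this FAQ (Ctry, AddressLine1, and so on).

```python
def missing_minimum_inputs(record):
    """Return a list of reasons why a record is unlikely to verify.

    Hypothetical helper: `record` is assumed to be a dict keyed by the
    input field names shown in this FAQ.
    """
    problems = []
    # The country is the only input that is strictly required.
    if not str(record.get("Ctry") or "").strip():
        problems.append("Ctry is required")
    # Almost every address also needs at least one AddressLine input.
    has_address_line = any(
        str(value or "").strip()
        for key, value in record.items()
        if key.startswith("AddressLine")
    )
    if not has_address_line:
        problems.append("no AddressLine input")
    return problems
```

An empty list means the record carries at least a country and one address line; it says nothing about whether the address itself is valid.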

How should I send in my address input data?

Your input address can be sent in a number of ways. Our main advice is to try and send the data you have with as little manipulation as possible if you have a full address. The most common ways to send it are:
  1. Full address in the AddressLines:
    AddressLine1: 22382 Avenida Empresa
    AddressLine2: Rancho Santa Margarita CA, 92688
    Ctry: US
  2. Full address in the AddressLines + Parsed Area:
    AddressLine1: 22382 Avenida Empresa
    Locality: Rancho Santa Margarita
    AdministrativeArea: CA
    PostalCode: 92688
    Ctry: US
Note: What you want to avoid as much as possible is sending in duplicated information, like this BAD example below:
AddressLine1: 22382 Avenida Empresa
AddressLine2: Rancho Santa Margarita CA, 92688
Locality: Rancho Santa Margarita
AdministrativeArea: CA
PostalCode: 92688
Ctry: US
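
One way to catch the duplicated-input pattern before a record is sent is a simple token comparison between the parsed area fields and the address lines. The sketch below is a heuristic, not part of the service: the function name is hypothetical, the record is assumed to be a dictionary keyed by the field names above, and a token match can produce false positives (for example, a street name that happens to contain the locality name).

```python
import re

# Parsed area inputs shown in the examples above.
AREA_FIELDS = ("Locality", "AdministrativeArea", "PostalCode")


def duplicated_area_fields(record):
    """Return the parsed area fields whose value also appears in an AddressLine."""
    line_text = " ".join(
        str(value)
        for key, value in record.items()
        if key.startswith("AddressLine")
    ).lower()
    line_tokens = set(re.findall(r"\w+", line_text))

    duplicated = []
    for field in AREA_FIELDS:
        value_tokens = re.findall(r"\w+", str(record.get(field, "")).lower())
        # Flag the field when every word of its value occurs in the address lines.
        if value_tokens and all(token in line_tokens for token in value_tokens):
            duplicated.append(field)
    return duplicated
```

For the bad example above, all three area fields are flagged; for the second recommended layout (delivery address in AddressLine1, area in the parsed fields), nothing is flagged.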

What encoding does Global Address use?

Global Address uses UTF-8 Unicode encoding. It is very important to make sure your data is in UTF-8 when sending to our service, especially for countries that use non-Latin scripts. Be on the lookout for question marks (?), squares (▖) or other unexpected characters like �. They may indicate encoding issues and may result in data loss.
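
A quick pre-flight scan for these warning signs might look like the sketch below. The set of suspect characters is an assumption based on the examples in this answer (a question mark, two box-like glyphs, and the Unicode replacement character), and the function name is hypothetical. A question mark can of course be legitimate text, so treat a hit as a prompt to inspect the record rather than as proof of corruption.

```python
# Characters that often indicate an earlier encoding problem (assumed list).
SUSPECT_CHARS = {
    "?",        # common substitute when a character could not be encoded
    "\ufffd",   # Unicode replacement character
    "\u25a1",   # white square
    "\u2596",   # quadrant lower left, renders as a small box
}


def encoding_suspects(text):
    """Return the sorted list of suspect characters present in `text`."""
    return sorted(ch for ch in set(text) if ch in SUSPECT_CHARS)


# Example of how the damage arises: Latin-1 bytes decoded as UTF-8.
latin1_bytes = "Gj\u00f8vik".encode("latin-1")
damaged = latin1_bytes.decode("utf-8", errors="replace")
print(encoding_suspects(damaged))

# Clean text encodes to UTF-8 without loss and yields no suspects.
clean_bytes = "Gj\u00f8vik".encode("utf-8")
print(encoding_suspects(clean_bytes.decode("utf-8")))
```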

Why are there different levels of Address Verification Result Codes?

The address source data available to our address validation product can differ from country to country. For most developed countries with a major postal agency, like the United States, Great Britain, France, etc., we will have delivery point data available. However, there may be countries where that detailed data is not available, or simply does not exist. In those cases, we will use less detailed information. Here is an overview and inside peek at the types of data available:
  • Delivery Point Data
Source: 1 Main St Apartment 12, Anytown 12345, USA
Max Verification Level: AV25
  • Range Data
Source: 100-200 Main St, Anytown 12345, USA
Max Verification Level: AV24
  • Thoroughfare Data
Source: Main St, Anytown 12345, USA
Max Verification Level: AV23
  • Locality Data
Source: Anytown 12345, USA
Max Verification Level: AV22
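
The relationship between data type and maximum verification level can be sketched in code. The example assumes the AV code convention in which the first digit is 2 for a fully verified address and 1 for a partially verified one, and the second digit is the verification level from 1 to 5. The function names are hypothetical, and the code handles a single AV code string, not a full service response.

```python
import re

# Maximum verification level per data type, as listed above.
MAX_LEVEL_BY_DATA = {
    "Delivery Point Data": 5,
    "Range Data": 4,
    "Thoroughfare Data": 3,
    "Locality Data": 2,
}


def parse_av_code(code):
    """Split a code such as 'AV24' into (status, level)."""
    match = re.fullmatch(r"AV([12])([1-5])", code)
    if match is None:
        raise ValueError(f"not an AV code: {code!r}")
    status = "full" if match.group(1) == "2" else "partial"
    return status, int(match.group(2))


def is_best_possible(code, data_type):
    """True when the code is fully verified at the data type's maximum level."""
    status, level = parse_av_code(code)
    return status == "full" and level == MAX_LEVEL_BY_DATA[data_type]
```

Under these assumptions, an AV23 result in a country where only thoroughfare data exists is the best achievable outcome, while the same code in a country with delivery point data leaves room for improvement.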

What can affect speeds from the Global Address Web Service?

The response time of Global Address Web Service is highly dependent on a number of factors:

  • The country of the input. This is one of the most important factors as each country has its own engine in the background and its own verification paths and algorithms. A well-structured country like the US will be much faster than a less structured one like India. Also, a Latin-based address system will usually be faster than a non-Latin system.
  • The quality of the address. A good address will be faster than a bad or partially bad address. Bad addresses will require extra steps, additional fuzzy matching logic, and extra parses before we either find a match or give up.
  • Real time vs batch. Real time means sending one record at a time, while batch means sending up to 100 records per request and looping through your data. Processing 100 at a time can improve your overall per-address speed by 25% to 50%.
  • The location of the client server. The distance between our server to yours will make a difference in overall speed. See here for our current server location list:
  • The current load of our public cloud servers. We maintain and make available a large amount of processing capacity. However, all things being equal, heavier usage at any given time will decrease overall speed during that time.
  • The protocol used. We have seen SOAP be up to 10% slower than XML or JSON.
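
For the batch case, the records need to be split into groups of at most 100 before each request. A minimal sketch of that loop, with a hypothetical helper name and records held in an ordinary list:

```python
def batches(records, size=100):
    """Yield consecutive slices of `records`, each holding at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# 250 records become three requests: 100, 100 and 50 records.
sizes = [len(batch) for batch in batches(list(range(250)))]
print(sizes)
```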

Expected Throughput Speeds

With all of these factors, we cannot give a single throughput number that will cover all the different use cases and types of input. Our expectation is that the majority of users will see varying speeds dependent on the contents of their records and their specific circumstances.

Single Record Address Validation
  • Description: Single verification for on demand address validation
  • Throughput Range: 50k - 120k records per hour
  • Use Cases: Form entry; Point of entry verification; Onboarding clients
Single Threaded Batch
  • Description: Batch processing for small to medium volume processing
  • Throughput Range: 100k - 400k records per hour
  • Use Cases: Overnight batch processing; Cleansing small-medium sized lists
Multi Threaded Batch
  • Description: Batch processing for high-volume processing when throughput is important
  • Throughput Range: 300k - 800k records per hour
  • Use Cases: Cleansing large lists; Data integration and ETL processes
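
To turn one of these ranges into a rough planning estimate, divide the record count by each end of the range. The helper below is a hypothetical illustration of that arithmetic only; actual speeds depend on the factors described earlier.

```python
def estimate_hours(record_count, low_per_hour, high_per_hour):
    """Return (best_case_hours, worst_case_hours) for a throughput range."""
    return record_count / high_per_hour, record_count / low_per_hour


# One million records at the multi-threaded batch range of 300k - 800k per hour.
best, worst = estimate_hours(1_000_000, 300_000, 800_000)
print(f"between {best:.2f} and {worst:.2f} hours")
```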

As seen in the figures above, multiple threads will improve performance. We recommend starting with 10-15 threads for processing large batch lists.
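
A multi-threaded batch run could be organized as in the sketch below, which uses a thread pool from the Python standard library. The `verify_batch` argument is a placeholder for whatever function in your own code sends one batch to the service and returns its results; it is not part of any Melissa library. The default of 10 workers follows the starting point recommended above.

```python
from concurrent.futures import ThreadPoolExecutor


def process_in_parallel(batch_list, verify_batch, workers=10):
    """Run `verify_batch` over every batch using a pool of worker threads.

    `verify_batch` is a caller-supplied (hypothetical) function that takes
    one batch and returns a list of results for it.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input batches.
        for batch_results in pool.map(verify_batch, batch_list):
            results.extend(batch_results)
    return results
```

Because the work is dominated by waiting on network responses, threads are a reasonable fit here; tune the worker count against your own measurements.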

Response Times

Response times can vary based upon the postal quality of the country, script, and input quality. For example, validating addresses in Middle Eastern and Asia Pacific countries can cause response times to increase due to the complexity of parsing and validating non-Latin character sets. In contrast, response times in the United States and Canada are expected to be lower due to high postal quality and the Latin-based addresses.

Response times are not guaranteed, but we can provide a general range:

  • Latin Based Countries: 30ms - 150ms
  • Non-Latin Based Countries: 50ms - 500ms


Melissa utilizes dynamic scaling to improve throughput speeds during peak usage periods. This behavior ensures that our web services can adjust to changes in traffic and consistently provide optimal performance regardless of demand.

If greater-than-usual volume is anticipated, please consult with your sales representative to ensure that Melissa can work with you to successfully manage your campaign.

How does the Global Address engine handle different Scripts?

Global Address has an option called OutputScript that has 3 possible values:
  • NoChange: We will detect the script of the input and leave output in that same script.
  • Latn: We will convert the output to Latin script.
  • Native: We will change the output to the native script of the country (Like Cyrillic for Russia)
Here are some things to make note of when trying to understand this functionality.
  1. Global Address essentially supports up to two scripts per country: Latin and, if the country uses a script that is not Latin based, that native script. Output is converted to the native script when Native is set.
  2. Note that Script and Language are not the same. English, Spanish, French are all Latin Script even if their alphabet and diacritics used are slightly different.
  3. We can only change scripts for a record if we are able to verify and validate the address.
  4. For Latin based languages, if you specify OutputScript=Latn, we will also remove the diacritics (Gjøvik vs Gjovik)
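
To preview what diacritic removal looks like on your own data, for example when comparing stored values against Latn output, the effect can be approximated locally. This sketch is an approximation and not the service's actual transliteration logic: it strips combining marks after Unicode decomposition and uses a small hand-written table (an assumption, and not exhaustive) for letters such as ø that have no Unicode decomposition.

```python
import unicodedata

# Letters with no canonical decomposition need an explicit mapping (partial list).
_NO_DECOMPOSITION = str.maketrans({
    "\u00f8": "o",  # ø
    "\u00d8": "O",  # Ø
    "\u0142": "l",  # ł
    "\u0141": "L",  # Ł
    "\u0111": "d",  # đ
    "\u0110": "D",  # Đ
})


def strip_diacritics(text):
    """Approximate diacritic removal for Latin-script text."""
    mapped = text.translate(_NO_DECOMPOSITION)
    decomposed = unicodedata.normalize("NFD", mapped)
    # Drop the combining marks left behind by decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Gj\u00f8vik"))
```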