MatchUp Object:Incremental Deduping


Incremental deduping is usually used for real-time data entry validation, for example in a call center data-entry system where an operator wants to determine whether the caller is an existing customer. At any time, a calling program can pass the contents of a record to the incremental deduping engine; the engine then reports whether the record is a duplicate and, if so, which record or records it matches.

Incremental deduping consists of the following steps:

  1. The program processes a record and sends the specific information (ZIP/PC, Name, Address, etc.) to MatchUp Object.
  2. MatchUp Object compares the record from the first step against the records previously sent to the API and reports whether it matches any of them.
  3. Optionally, the application can tell MatchUp Object to add this record to its database for consideration in future comparisons.
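
The three steps above can be sketched with a toy in-memory model. This is only an illustration of the match-then-optionally-add cycle: the class `ToyIncrementalDeduper`, its methods, and the crude key normalization are hypothetical and are not part of the MatchUp Object API.

```python
class ToyIncrementalDeduper:
    """Hypothetical stand-in for an incremental deduper (not the MatchUp Object API)."""

    def __init__(self):
        # Plays the role of the historical database: match key -> record ids.
        self._history = {}

    @staticmethod
    def _key(zip_code, name, address):
        # Assumed normalization: uppercase and collapse whitespace.
        parts = (zip_code, name, address)
        return "|".join(" ".join(p.upper().split()) for p in parts)

    def match(self, zip_code, name, address):
        """Step 2: report which previously added records share this key."""
        return list(self._history.get(self._key(zip_code, name, address), []))

    def add(self, record_id, zip_code, name, address):
        """Step 3 (optional): remember this record for future comparisons."""
        key = self._key(zip_code, name, address)
        self._history.setdefault(key, []).append(record_id)


deduper = ToyIncrementalDeduper()

# Step 1: the program sends the relevant fields of a record.
first = deduper.match("02109", "Pat Smith", "100 Main St")
deduper.add(1, "02109", "Pat Smith", "100 Main St")

# A later record that differs only in case and spacing is reported as a dupe.
second = deduper.match("02109", "pat  smith", "100 main st")
```

Here `first` is an empty list because nothing has been added yet, while `second` lists record 1 as a match.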

The Historical Database

The incremental deduping engine relies heavily on a historical database that it maintains. The lifetime of this database is as long as necessary (seconds, days, even years). Because MatchUp Object constructs and maintains this database itself, it can quickly determine whether an incoming record matches earlier records.

Multi-User/Multi-Thread Considerations

Incremental deduping is unique in that multiple users or multiple processes can access the same historical database simultaneously. The API maintains a locking system to ensure that competing processes don't collide. In order for two processes to work in this fashion, the initialization function for each process must specify the same historical database (a.k.a. “key file”).
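
To illustrate why such locking matters, the following sketch shares one in-memory key store between several threads. It is not how MatchUp Object implements its locking (the text above only says that the API handles it); `SharedKeyStore` and `check_and_add` are hypothetical names, and a `threading.Lock` stands in for whatever mechanism the API uses.

```python
import threading


class SharedKeyStore:
    """Hypothetical shared key store; the lock keeps check-and-add atomic."""

    def __init__(self):
        self._keys = set()
        self._lock = threading.Lock()

    def check_and_add(self, key):
        """Return True if the key was already present; otherwise add it and return False."""
        with self._lock:
            if key in self._keys:
                return True
            self._keys.add(key)
            return False


store = SharedKeyStore()
results = []
results_lock = threading.Lock()


def worker():
    # Every thread submits the same key, as if eight operators entered one caller.
    is_dupe = store.check_and_add("02109|PAT SMITH|100 MAIN ST")
    with results_lock:
        results.append(is_dupe)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one thread should have seen the key as new.
unique_count = results.count(False)
```

Without the lock around the check and the add, two threads could both find the key missing and both treat the record as unique.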

Transaction-Based Processing

The Incremental deduper interface of MatchUp Object supports optional transaction-based operations on the historical database. This enables an application to group multiple calls to the AddRecord function into a single transaction, which speeds up the processing of large lists.
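
The general idea of batching many writes into one transaction can be sketched with Python's built-in `sqlite3` module. SQLite is used here only as a stand-in for the historical database; the table layout and the helper `add_records_in_one_transaction` are assumptions for illustration and do not reflect MatchUp Object's key file format or its transaction functions.

```python
import sqlite3

# An in-memory SQLite database stands in for the historical database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keys (match_key TEXT, record_id INTEGER)")


def add_records_in_one_transaction(connection, rows):
    """Write all rows as a single transaction instead of one commit per row."""
    # Using the connection as a context manager commits on success
    # and rolls back if an exception is raised inside the block.
    with connection:
        connection.executemany(
            "INSERT INTO keys (match_key, record_id) VALUES (?, ?)", rows
        )


# Hypothetical match keys for a large list.
rows = [(f"KEY{i:05d}", i) for i in range(1000)]
add_records_in_one_transaction(conn, rows)

stored = conn.execute("SELECT COUNT(*) FROM keys").fetchone()[0]
```

Committing once for the whole batch avoids paying the cost of a durable write for every individual record, which is the usual reason transactions speed up bulk loads.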

Incremental Order of Operations

Using the Incremental deduper is pretty straightforward. This section will outline the basic steps and then show an example of the programming logic for a typical implementation of the Incremental deduper.

  1. Initialize the Incremental deduper.
     After creating an instance of the Incremental deduper, point the object toward its supporting data file, select a matchcode and key file to use, and initialize these files.
  2. Create field mappings.
     In order to build a key to compare to the key file, the Incremental deduper needs to know which types of data the program will be passing to the deduper and in what order.
  3. Read the record from the data source.
     This can be a new address passed from a website or a single record from a newly acquired list or data source that is to be compared against the master list.
  4. Build a match key for the incoming record.
     This consists of passing the actual data to the deduper in the same order used when creating the field mappings. After the program passes the necessary fields (usually a small subset of the fields from each record) via the AddField function, the Incremental deduper uses this information to generate a match key.
  5. Compare the match key to the key file.
     The MatchRecord function searches the key file for any keys that match the new record. If it finds a match, it provides information on the duplicate records in the key file.
  6. Write new records to the key file.
     The new key, whether or not it is unique, can then be written to the key file so that it can be used for future deduping operations. The program code must also write the new address record to the database separately.
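
The order of operations above can be sketched as follows. The class `SketchIncremental` is a hypothetical in-memory mock: its method names loosely echo the AddField, MatchRecord, and AddRecord functions mentioned in the text, but the signatures, the mapping names, and the key format are assumptions and not the real MatchUp Object interface.

```python
class SketchIncremental:
    """Hypothetical mock of an incremental deduper (not the MatchUp Object API)."""

    def __init__(self):
        self._mappings = []   # field types, in the order fields will arrive
        self._fields = []     # values for the record currently being keyed
        self._key = None
        self._key_file = {}   # stands in for the key file: key -> record ids

    def add_mapping(self, field_type):
        self._mappings.append(field_type)

    def clear_fields(self):
        self._fields = []
        self._key = None

    def add_field(self, value):
        if len(self._fields) >= len(self._mappings):
            raise ValueError("more fields passed than mappings defined")
        self._fields.append(value)

    def build_key(self):
        if len(self._fields) != len(self._mappings):
            raise ValueError("fields must be passed for every mapping")
        # Assumed key format: uppercase, whitespace collapsed, joined by '|'.
        self._key = "|".join(" ".join(v.upper().split()) for v in self._fields)
        return self._key

    def match_record(self):
        return list(self._key_file.get(self._key, []))

    def add_record(self, record_id):
        self._key_file.setdefault(self._key, []).append(record_id)


# Steps 1 and 2: initialize and declare the field order.
deduper = SketchIncremental()
for field_type in ("zip", "last_name", "address"):
    deduper.add_mapping(field_type)

incoming = [
    (1, ("02109", "Smith", "100 Main St")),
    (2, ("90210", "Jones", "5 Oak Ave")),
    (3, ("02109", "smith", "100  main st")),
]
master = {}   # the application's own database, written separately
report = {}

for record_id, values in incoming:          # step 3: read a record
    deduper.clear_fields()
    for value in values:                    # step 4: pass fields in mapped order
        deduper.add_field(value)
    deduper.build_key()
    report[record_id] = deduper.match_record()   # step 5: compare to the key file
    deduper.add_record(record_id)                # step 6: write the key
    master[record_id] = values                   # ...and the record itself
```

In this run, records 1 and 2 report no matches, and record 3 reports record 1 as a duplicate because the two differ only in case and spacing.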