MatchUp Object:Hybrid Interface

From Melissa Data Wiki

Overview

The Hybrid interface differs from the Incremental and Read-Write interfaces in that it does not maintain a key file of its own. Instead, the developer must maintain the list of match keys used for deduping operations. This makes the Hybrid interface more flexible, at the cost of greater programming complexity.

The main advantage of Hybrid deduping is that it lets the developer build smaller lists of match keys on the fly and quickly compare each record against a small subset of the database.
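As a rough illustration of the bookkeeping that falls to the developer, the sketch below keeps match keys in an in-memory dictionary and compares each incoming key only against a subset of stored keys chosen by the caller. The names `KeyStore`, `dedupe`, and `keys_match` are hypothetical, and `keys_match` is a plain equality placeholder standing in for MatchUp's own key comparison; it is not the library's API.

```python
from typing import Callable, Dict, Iterable, List


def keys_match(a: str, b: str) -> bool:
    """Hypothetical placeholder for MatchUp's key comparison: exact equality only."""
    return a == b


class KeyStore:
    """Developer-maintained list of match keys (the Hybrid interface keeps none itself)."""

    def __init__(self) -> None:
        self.keys: Dict[int, str] = {}

    def add(self, record_id: int, key: str) -> None:
        self.keys[record_id] = key

    def dedupe(self, record_id: int, key: str,
               candidates: Callable[[str], Iterable[int]]) -> List[int]:
        """Compare a new key against a caller-chosen subset of stored keys.

        Returns the ids of stored records whose keys match, then stores the new key.
        """
        matches = [rid for rid in candidates(key) if keys_match(key, self.keys[rid])]
        self.add(record_id, key)
        return matches


store = KeyStore()
store.add(1, "90210SMITHJ")
store.add(2, "10001JONESM")
# The subset is built on the fly; here it is simply every stored id.
print(store.dedupe(3, "90210SMITHJ", lambda k: list(store.keys)))
```

Passing the candidate selector as a function keeps the choice of subset (all keys, one cluster, a database query) in the developer's hands, which mirrors the flexibility the Hybrid interface is meant to provide.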

Clustering

The concept of Clustering, outlined in the key concepts section, is essential to the Hybrid interface. In the other interfaces clustering takes place behind the scenes; the Hybrid interface instead lets the developer use clustering directly to compare a record against only a small portion of a list.

The Hybrid interface uses the concept of a cluster size: the maximum number of characters at the beginning of a match key that are used to divide keys into smaller groups whose members are compared against each other. For example, a cluster size of 5 means that the first five characters of a match key are used to form the clusters.

In other words, a Hybrid deduping operation compares a record only against records whose match keys begin with the same five characters as its own.
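A minimal sketch of clustering by key prefix, assuming match keys are plain strings and a cluster size of 5 as in the example above. The helpers `build_clusters` and `comparison_pairs` and the sample keys are hypothetical; they only show which pairs of keys would be compared, not how MatchUp compares them.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

CLUSTER_SIZE = 5  # number of leading characters that define a cluster


def build_clusters(keys: List[str], size: int = CLUSTER_SIZE) -> Dict[str, List[str]]:
    """Group match keys by their first `size` characters."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        clusters[key[:size]].append(key)
    return dict(clusters)


def comparison_pairs(keys: List[str], size: int = CLUSTER_SIZE) -> List[Tuple[str, str]]:
    """Pair up keys for comparison, but only within the same cluster."""
    pairs: List[Tuple[str, str]] = []
    for members in build_clusters(keys, size).values():
        pairs.extend(combinations(members, 2))
    return pairs


keys = ["90210SMITHJ", "90210SMYTHJ", "10001JONESM", "10001JONESK", "60601BROWNA"]
print(build_clusters(keys))
print(len(comparison_pairs(keys)))  # 2 pairs, versus 10 if every key were compared with every other
```

The saving grows with list size: comparing every key with every other is quadratic in the whole list, while clustering makes it quadratic only within each (usually small) cluster.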

Key Maintenance

Unlike the other interfaces, the Hybrid interface does not automatically handle reading from and writing to a key file. This forces the developer to do more work, but it allows a great deal of flexibility in how match keys are stored and handled.

Continuing the previous example with a cluster size of 5: if the match keys are stored in a field of a SQL database, a cluster could be built quickly with a SELECT query that returns the rows whose match key field begins with the same five characters as the new record's match key.
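Such a query could look like the following, sketched with Python's built-in `sqlite3` module. The table `records`, its columns `id` and `matchkey`, the helper `fetch_cluster`, and the sample keys are all hypothetical. `substr(matchkey, 1, ?)` is SQLite syntax; other databases spell the prefix function differently (for example `LEFT()` in SQL Server and MySQL).

```python
import sqlite3
from typing import List, Tuple

CLUSTER_SIZE = 5  # leading characters that define a cluster


def fetch_cluster(conn: sqlite3.Connection, new_key: str,
                  size: int = CLUSTER_SIZE) -> List[Tuple[int, str]]:
    """Return stored (id, matchkey) rows whose key starts with the new key's prefix."""
    prefix = new_key[:size]
    cur = conn.execute(
        "SELECT id, matchkey FROM records "
        "WHERE substr(matchkey, 1, ?) = ? ORDER BY id",
        (size, prefix),
    )
    return cur.fetchall()


# Hypothetical in-memory table standing in for the developer's own key storage.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, matchkey TEXT)")
conn.executemany(
    "INSERT INTO records (id, matchkey) VALUES (?, ?)",
    [(1, "90210SMITHJ"), (2, "10001JONESM"), (3, "90210SMYTHJ")],
)
print(fetch_cluster(conn, "90210SMITHK"))  # [(1, '90210SMITHJ'), (3, '90210SMYTHJ')]
```

Only the rows returned here would then be compared against the new record's match key. On a large table, storing the prefix in its own indexed column is one common way to keep this lookup fast.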

The trade-off for this flexibility is that the Hybrid interface requires considerably more coding and a deeper understanding of certain MatchUp concepts.