MatchUp Object:Hybrid Interface
Latest revision as of 17:24, 29 July 2015
Overview
The Hybrid interface differs from the Incremental and Read-Write interfaces in that it does not maintain a key file of its own. It is up to the developer to maintain a list of match keys to use for deduping operations. This increases the flexibility of the Hybrid interface but at the expense of programming complexity.
The main advantage of Hybrid deduping is that it allows the developer to build smaller lists of match keys on the fly and quickly compare records to a small subset of the database.
Clustering
The concept of Clustering, outlined in the key concepts section, is essential to the Hybrid interface. Unlike the other interfaces, where clustering takes place behind the scenes, the Hybrid interface lets the developer use clustering directly to compare a record against only a small portion of a list.
The Hybrid interface uses a cluster size: the maximum number of characters at the beginning of a match key that can be used to divide the keys into smaller groups whose members are compared against each other. For example, a cluster size of 5 means that the first five characters of a match key are used to create the clusters.
In other words, with a cluster size of 5, a Hybrid deduping operation only compares two records when the first five characters of their match keys are identical.
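The grouping described above can be illustrated with a minimal sketch. It is not part of the MatchUp Object API: the function names and the sample match keys are hypothetical, and real match keys are generated by MatchUp rather than written by hand. The sketch only shows how a cluster size of 5 partitions a list of keys by their leading characters.

```python
from collections import defaultdict

CLUSTER_SIZE = 5  # value taken from the example in the text


def cluster_prefix(match_key, cluster_size=CLUSTER_SIZE):
    """Return the leading characters that decide which cluster a key falls in."""
    return match_key[:cluster_size]


def build_clusters(match_keys, cluster_size=CLUSTER_SIZE):
    """Group match keys by their leading cluster_size characters."""
    clusters = defaultdict(list)
    for key in match_keys:
        clusters[cluster_prefix(key, cluster_size)].append(key)
    return dict(clusters)


def candidates_for(new_key, clusters, cluster_size=CLUSTER_SIZE):
    """Return only the stored keys that share a cluster with new_key."""
    return clusters.get(cluster_prefix(new_key, cluster_size), [])
```

Only the keys returned by `candidates_for` would need to be handed to the actual comparison step; keys in every other cluster are skipped.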
Key Maintenance
Unlike the other interfaces, the Hybrid interface does not automatically handle the read/write operations to a key file. While this forces the developer to do more work, it allows a great deal of flexibility in how match keys are stored and handled.
In the previous example, with a cluster size of 5, if the match keys are stored in a field within a SQL database, a cluster could be built quickly by performing a SELECT query that returns the rows where the first five characters of the match key field match the first five characters of the match key for the new record.
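As a rough sketch of such a query, the following uses Python's built-in `sqlite3` module. The table name `records`, its columns `id` and `match_key`, and the function name are assumptions made for illustration; they are not defined by MatchUp, and another database engine may spell the substring function differently.

```python
import sqlite3


def fetch_cluster(conn, new_key, cluster_size=5):
    """Return stored match keys whose leading characters equal those of new_key.

    Assumes a hypothetical table records(id, match_key).
    """
    prefix = new_key[:cluster_size]
    rows = conn.execute(
        "SELECT match_key FROM records "
        "WHERE substr(match_key, 1, ?) = ? "
        "ORDER BY id",
        (cluster_size, prefix),
    ).fetchall()
    return [row[0] for row in rows]
```

On a large table, a prefix comparison of this kind generally benefits from an index that covers the leading characters of the key, for example a separate indexed column holding just the cluster prefix.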
While this gives the developer far more flexibility, it also requires a great deal more coding and a greater understanding of certain MatchUp concepts.