MatchUp Object:Getting Started
Key Concepts
The following concepts are essential to understanding how MatchUp Object works and to integrating the product into applications successfully.
Match Keys
Match Keys are string tokens that represent a database record. They contain only the information necessary to determine whether a record is unique or a duplicate.
Because they only contain a reduced portion of the data in the actual record, MatchUp Object is able to use these keys more efficiently than if it had to compare the complete record against every other record in the database.
Clustering
Once a matchcode key is generated for a given record, it can be compared to the keys of other records. Ideally, every record’s key would be compared to every other record’s key. This, however, is practical only in very trivial applications, because the number of comparisons grows quadratically with the number of records processed: n records require n(n - 1)/2 comparisons. For example, a record set of 100 records requires 4,950 comparisons (99 + 98 + ... + 1). A larger set of 10,000 records requires 49,995,000 comparisons (9,999 + 9,998 + ... + 1). Large record sets would take prohibitive amounts of time to process.
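The arithmetic above can be checked with a short sketch. The function name pairwise_comparisons is chosen here for illustration and is not part of MatchUp Object.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# The two record-set sizes used in the text.
for n in (100, 10_000):
    print(f"{n:>6} records -> {pairwise_comparisons(n):,} comparisons")
```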
MatchUp Object therefore assumes that for two matchcode keys to be considered a match, some part of the keys must match exactly. In many cases, this will be all or part of the ZIP/Postal Code. MatchUp Object then compares only records that (in this example) share the same ZIP or Postal Code. In the US, using 5-digit ZIP codes, this cuts the average number of comparisons per record by a factor of thousands.
This concept is known as “break grouping,” “clustering,” “partitioning,” or “neighborhood sorting.” Most, if not all, other deduping programs very likely use some form of clustering.
Here is an example set of matchcode keys using ZIP/Postal Code (5 characters), Last Name(4), First Name(2), Street Number(3), Street Name(5):
02346BERNMA49 GARD
02346BERNMA49 GARD
02357STARBR18 DAME
02357MILLLI123MAIN
03212STARMA18 DAME
When the deduping engine encounters this set of matchcode keys, it compares all the keys in “02346” (2 keys), then “02357” (2 keys), and finally “03212” (1 key). For this small set, 10 comparisons are turned into 2.
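The comparison counts for this example can be reproduced with a minimal sketch. It assumes the cluster is simply the first five characters of each key (the ZIP/Postal Code); the helper clustered_pairs is hypothetical and is not how MatchUp Object's own engine is implemented.

```python
from collections import defaultdict
from itertools import combinations

# The five example keys, padded to the full 19-character layout.
keys = [
    "02346BERNMA49 GARD ",
    "02346BERNMA49 GARD ",
    "02357STARBR18 DAME ",
    "02357MILLLI123MAIN ",
    "03212STARMA18 DAME ",
]


def clustered_pairs(keys, cluster_width=5):
    """Return only the key pairs that share the same leading cluster_width characters."""
    clusters = defaultdict(list)
    for key in keys:
        clusters[key[:cluster_width]].append(key)
    pairs = []
    for members in clusters.values():
        pairs.extend(combinations(members, 2))
    return pairs


all_pairs = list(combinations(keys, 2))   # every key against every other key
cluster_pairs = clustered_pairs(keys)     # only keys within the same ZIP cluster
print(len(all_pairs), len(cluster_pairs))
```

Grouping first and pairing second keeps the work proportional to the cluster sizes rather than to the size of the whole record set.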
In reality, MatchUp Object’s clustering engine is a bit more complicated than this, but this description will aid in understanding its mechanics.
The second deduping engine removes the first-component restrictions, allowing you to create matching strategies whose rule sets are completely independent of each other. This eliminates the need to run multiple passes.
Matchcodes
Matchcodes are sets of rules that MatchUp Object uses to determine how match keys are constructed and how much of the key is used for clustering.
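As a rough illustration of the idea, the sketch below models a matchcode as an ordered list of (field, width) rules plus a cluster width. The names MATCHCODE, CLUSTER_WIDTH, and build_key, the record fields, and the sample record are all assumptions made for this example; they are not MatchUp Object's API or its actual matchcode format.

```python
# Hypothetical rule set mirroring the layout from the clustering example:
# ZIP/Postal Code (5), Last Name (4), First Name (2), Street Number (3), Street Name (5).
MATCHCODE = [
    ("zip", 5),
    ("last", 4),
    ("first", 2),
    ("street_number", 3),
    ("street_name", 5),
]
CLUSTER_WIDTH = 5  # assumed: only the ZIP portion of the key is used for clustering


def build_key(record, matchcode=MATCHCODE):
    """Build a fixed-width key: upper-case, truncate, and pad each field in rule order."""
    parts = []
    for field, width in matchcode:
        value = str(record.get(field, "")).upper()
        parts.append(value[:width].ljust(width))
    return "".join(parts)


# Hypothetical record, chosen so that its key matches one of the example keys.
record = {
    "zip": "02357",
    "last": "Miller",
    "first": "Linda",
    "street_number": "123",
    "street_name": "Main",
}
key = build_key(record)
cluster = key[:CLUSTER_WIDTH]
print(repr(key), repr(cluster))
```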
For more detail on the subject, see: