FAQ:SSIS:Data Quality Components

From Melissa Data Wiki
Jump to navigation Jump to search

← SSIS:Data Quality Components

What is affected during an install or upgrade

By default the data is stored at this path: C:\Program Files\Melissa DATA\DQT\Data
  • Sample Projects is located
    • C:\Program Files\Melissa DATA\DQT\DQS\Samples
  • Component DLLS:
    • C:\Program Files\Melissa DATA\DQT\DQS\SSIS_2005
    • C:\Program Files\Melissa DATA\DQT\DQS\SSIS_2008
    • C:\Program Files\Melissa DATA\DQT\DQS\SSIS_2012
    • C:\Program Files\Melissa DATA\DQT\DQS\SSIS_2014
  • Object DLLS:
    • C:\Program Files\Melissa DATA\DQT\DQS\32_bit
    • C:\Program Files\Melissa DATA\DQT\DQS\64_bit

How do I enable On-Premise Global Verify?

Please make sure that you have a valid license key that has the local processing enabled. Otherwise the on-premise settings in advanced configuration settings screen will not be visible. You need to set the countries that you want to be processed through locally, the other countries that is not selected will be processed through the Global Verify web service. Please make sure your datafile structures is correct and all datafiles is present, otherwise the dll may crash or reject your settings. When everything is set you can now enter the path for the root folder of your data files into the data path field. You can test to see if you have everything set correctly by pressing test configuration. The tester will report any issues that it finds with your setup. Please contact technical support if you are still having issues with your setup and we will gladly help you out.

How come I can't process less than 100 records for SmartMover NCOA?

USPS regulations state we cannot process NCOA for files less than 100 records. In adherence to the new regulations, SmartMover now checks the first batch to see if it contains less than 100 records. If the batch is less than 100 records we will reject the batch and return an error message. SmartMover will interrupt the processing after processing begins. Please contact your sales representative if you have any questions or concerns regarding this change.

Can we save current settings for DQC for future?

Yes, Melissa Data SSIS includes the feature that you can save currents settings for future use. Under File Tab you can save current setting in a .cfg file in local your drive. Under same File tab click "Open" and upload the .cfg file.

Speed Facts for CVC?

EVC is an enterprise component so performance of EVC is highly dependent on machine configuration. Operating System, RAM, Processor or using network drive for data input or output, make significant impact on performance of EVC. In addition performance also depends upon 2 more main factors:

How you are using EVC?

It is highly recommended for EVC users, select the only those features i.e. LACSLink(Address Add-on) DNS Lookup(Email, Add-on), which are required. Unnecessary use of add-on and feature will effect the performance.

Type of Data?

If the input data is really corrupted or having lot of errors, will effect the performance.

We have additional 2 different scenarios for using EVC, which will help you to analyse the throughput of the component in different situations. For more information you can click on following link: SSIS Speed Scenarios

How do I make Excel Source Drivers work?

The version of the OLE Drivers must match your Office edition (32bit vs 64bit) in your project properties. If you have 32bit Office, click Project then Properties. Under configuration properties, click "debugging" and set "Run64BitRunTime" to False. If you have 64bit Office then set "Run64BitRunTime" to True.

In which format Data Quality Components(DQC) accept input?

You can use any format as input for Melissa Data SSIS components. It can be .csv, .xls, .dbf, xlsx, SQL database etc. In some cases you might not see any column in drop down options of MD SSIS component, in this case you can use "Data Conversion"(in-built component in BIDS) to change the data type of particular missing column to "DT_WSTR".

How we can update individual Melissa Data Object rather than updating whole CVC?

EVC component installs all Melissa Data Objects in "Binn" folder under DTS folder of SQL Server. If you are using EVC in SQL Server 2008 and need to update only Address Object grab latest mdAddr.dll and replace older mdAddr.dll in following location:
C:\\Program Files\\Microsoft SQL Server\\100\\DTS\\PipelineComponents
Same way you can update other Melissa Data Objects i.e for Email Object replace mdEmail.dll, for GeoCoder mdGeo.dll, etc.
If you are installed EVC on SQL Server 2005, the location of Melissa Data Object will be in following location:
C:\\Program Files\\Microsoft SQL Server\\90\\DTS\\PipelineComponents

Microsoft Fuzzy Lookup vs. The Data Matching and Cleansing Component?

The Microsoft transforms are only available with SQL Server enterprise edition which is a significant cost in itself if your organization doesn’t have it already.
All of the following transforms are SQL Server Enterprise edition only:
  • Data Mining Training Destination
  • Data Mining Query Component
  • Fuzzy Grouping
  • Fuzzy Lookup
  • Term Extraction
  • Term Lookup
  • Dimension Processing Destination
  • Partition Processing Destination

The MD Fuzzy Matching Component allows more granular control over matching whereas the SSIS native fuzzy matching does not allow any control of its algorithms or access to multiple algorithms. There are only two configurable parameters with the stock Fuzzy lookup and grouping components ie: similarity threshold and maximum number of results per input record.

The MD Fuzzy Matching component has 16 configurable matching algorithms some of them developed specifically by us such as the unique Keyboard distance for misspellings. All industry standard matching algorithms are supported, and probabilistic match settings can be configured for each column of data and the rules can be daisy chained for maximum catch all situations. Also the component has the ability to ignore common data quality issues when matching. ie; whitespaces, irregular characters, etc.

The Melissa Data Fuzzy Matching Component can also inherently cleanse your data through regular expressions. Regular expressions can be built and tested through an expression builder which is also built in to the component. Cleansing your data prior to matching is a crucial step in order to get the most accurate results for matching.

The component also is able to automatically split up the results into 3 destinations: Matches, Possible Matches and Non-Matches. Based off the percentage score between 2 records, the component will re-direct the results to the appropriate destination according to the percent thresholds you’ve set in the component. Also the Fuzzy SSIS component has full metadata (source and compare scores for multiple algorithms) for tracking record lineages.

One more thing to bear in mind is that, the stock Fuzzy Lookup component in SSIS requires that your reference table be a SQL Table. The Fuzzy Matching Component does not limit you to a SQL Table Reference Database.

The DMCC suite also comes with the MatchUP component, which is specially designed for the unique set of matching problems with Address, Name and Company data fields. MatchUp recognizes combinations, inverse names, Acronyms, Nick names, numerical streets such as 12th vs. twelfth, and so on.

What Versions of SQL Server/Visual Studio Is Your Components Supported in?

Melissa Data currently supports 2008, 2012, and 2014 versions of SQL Server. We also provide components for SQL Server 2005 as part of the SSIS Installation, however, it is no longer supported. Certain Microsoft SQL Server version are supported by specific Microsoft Visual Studio versions for our components.
  • Microsoft SQL Server 2014 - Microsoft Visual Studio 2013
  • Microsoft SQL Server 2012 - Microsoft Visual Studio 2010/2012
  • Microsoft SQL Server 2010 - Microsoft Visual Studio 2008
  • Microsoft SQL Server 2008 - Microsoft Visual Studio 2005