MatchUp Software:Excel/Access Troubleshooting Guide: Difference between revisions
Created page with "Troubleshooting Tips When Processing an Access or Excel Table Access and Excel are difficult formats to process because they are in a proprietary Microsoft format. Because of..." |
No edit summary |
||
Line 7: | Line 7: | ||
==Add an Autonumber field to your table.== | ==Add an Autonumber field to your table.== | ||
One of the biggest problems with DAO/ADO is that these technologies were written with the assumption that people always access their data in a linear fashion, from start to end. While MatchUp does do this while building keys, it must access the data in a random sequence when it is performing the merge/purge operation. Another way to explain this is to say that DAO/ADO do not use record numbers. So when MatchUp tells DAO/ADO to move to record 58, DAO/ADO uses an awkward (and apparently buggy) algorithm to accomplish this. Adding an autonumber field essentially provides a record number field for your table, saving DAO/ADO from sweating out the computations. | One of the biggest problems with DAO/ADO is that these technologies were written with the assumption that people always access their data in a linear fashion, from start to end. While MatchUp does do this while building keys, it must access the data in a random sequence when it is performing the merge/purge operation. Another way to explain this is to say that DAO/ADO do not use record numbers. So when MatchUp tells DAO/ADO to move to record 58, DAO/ADO uses an awkward (and apparently buggy) algorithm to accomplish this. Adding an autonumber field essentially provides a record number field for your table, saving DAO/ADO from sweating out the computations. | ||
==Try the other technology.== | |||
There is no way to easily get the data access version, so we can’t even say “your data access engine is outdated, get version xxxx”. Also, different versions of Excel perform better with different versions of the data access engine, while the same version of Access might be the complete opposite. | |||
In Tools | User Settings | General: Interface, you can select either DAO or ADO. Often, changing this will do the trick. Note that ADO is usually slower than DAO. | |||
==EDIT OUTPUT FIELD MAPPING== | |||
By the time you configure the Output Field Mapping Tab, MatchUp is creating the structure for the Output Table based on what you have in this screen. However, at this point it does not know the Output table type (you haven’t created it yet). Since MatchUp can create output tables of many different types, each with restrictions of what constitutes a valid structure, you must create an Output Field Mapping structure which conforms with a ‘common valid structure ruleset’ which any Output type can use. | |||
FIELD SIZES | |||
If you're working with a table with rather large fields (ie, over 200 characters), you may want to try reducing the sizes to a more manageable size and see what happens. | |||
Excel most always reports field sizes as 255 characters, attempting to modify the column width in excel will not work. | |||
Be sure that you won't truncate any of your data before doing this. | |||
FIELD COLUMN NAMES | |||
Since some database systems have restrictions on column names, MatchUp follows suit and will not create these Output Table types if the column names are seen by MatchUp as multi-word or have a length of over 10 characters. | |||
Excel allows duplicate field names. MatchUp tries to do what it can with these, but sometimes comes up short. It's best just to fix the problem in the native DBMS before working with the file in MatchUp. And better database practice! | |||
A good rule of thumb – if you look at only the first 10 characters of the column names, and there are more than one instance, or embedded spaces, the program may crash. | |||
FIELD TYPES | |||
If the table has a few 'oddball' fields, you may want to make a copy of the table and try removing the fields. Such fields would be non-character and non-numeric fields, and not critical to the deduping process or the desired output table. | |||
If all fields are required in Output, you can try changing the output column type. For example, different versions of Excel and Access contain options for numeric data, logic fields and dates. If you find one of these to be problematic, try changing the output type to ‘fixed character’. | |||
For a hands off, advanced approach, you can edit your .ini file… | |||
Warning: This is an advanced topic. Proceed only if you know what you are doing! | |||
When MatchUp needs to create a table, it normally uses data from the chart below to determine what DBMS (ie, 'native') data type should be used for a given MatchUp field type: | |||
dBASE FoxPro ASCII Excel Access SQL Server Oracle DB/2 | |||
Fixed Char Character Character Character General Text Char Char Character | |||
Variable Char Numeric Character Character General Text Text or Memo Char or Text VarChar Character | |||
Integer Numeric Integer Numeric General Number: Long Integer Int Long Integer | |||
Float Numeric Double Numeric Scientific Number: Double Float Float Double | |||
Decimal Numeric Numeric Numeric Currency Currency Decimal Number Decimal | |||
Logical Logical Logical Logical Numeric Yes/No Bit Char Character | |||
DateTime Date Date Date Date Date/Time DateTime Date TimeStamp | |||
However, like many things, one size does not fit all. For Excel, Access and SQL Server users, there exists a mechanism to override these native field types. | |||
Regardless of the database engine, you will need to open your dt.ini file (it will be in the folder where MatchUp is installed) in Notepad. At the end of the file (after a blank line or two), add the section listed. | |||
Access or Excel through DAO: | |||
[cDao] | |||
FixedChar=Text <Text, Char or Memo> | |||
VarChar=Text <Text, Char or Memo> | |||
Integer=Long <Byte, Integer or Long> | |||
Float=Double <Single, Double or Float> | |||
Decimal=Currency <Currency or Numeric> | |||
Logical=YesNo <YesNo is the only choice> | |||
DateTime=TimeStamp <Date, Time or TimeStamp> | |||
Above, the black entry indicates the MatchUp field type you want to change, blue indicates the Access/Excel type you want to use, and green indicates your choices for that particular field type (in the dt.ini file, do not type in the green text!). | |||
The available native data types are the very same ones that you would use in a CREATE TABLE statement; consult your Excel or Access documentation for exactly what each data type represents. | |||
Access or Excel through ADO: | |||
Even though we are accessing the same table types as DAO, the native data types are sometimes different (direct your complaints to Microsoft for this). | |||
[cAdo] | |||
FixedChar=Char <Char, VarChar or Memo> | |||
VarChar=VarChar <Char, VarChar or Memo> | |||
Integer=Integer <TinyInt, SmallInt or Integer> | |||
Float=Double <Single or Double> | |||
Decimal=Numeric <Currency, Decimal or Numeric> | |||
Logical=YesNo <YesNo is the only choice> | |||
DateTime=TimeStamp <Date, Time or TimeStamp> | |||
Above, the black entry indicates the MatchUp field type you want to change, blue indicates the Access/Excel type you want to use, and green indicates your choices for that particular field type (in the dt.ini file, do not type in the green text!). | |||
The available native data types are the very same ones that you would use in a CREATE TABLE statement; consult your Excel or Access documentation for exactly what each data type represents. | |||
SQL Server: | |||
[cSQLServer] | |||
FixedChar=Char <Char, VarChar or Text> | |||
VarChar=Char <Char, VarChar or Text> | |||
Integer=Int <TinyInt, SmallInt or Integer> | |||
Float=Float <Float is the only choice> | |||
Decimal=Numeric <Decimal or Numeric> | |||
Logical=Bit <Bit is the only choice> | |||
DateTime=DateTime <DateTime or SmallDateTime> | |||
Above, the black entry indicates the MatchUp field type you want to change, blue indicates the SQL Server type you want to use, and green indicates your choices for that particular field type (in the dt.ini file, do not type in the green text!). | |||
The available native data types are the very same ones that you would use in a CREATE TABLE statement; consult your SQL Server documentation for exactly what each data type represents. | |||
Notes: | |||
• These settings are global, and are not setup file specific. | |||
• If you omit any entry or specify an invalid one, the default will be used. | |||
==CASS CONSIDERATIONS== | |||
Unlike some other database types, MatchUp CASS process does not allow you to append columns to the Excel or Access table you are processing. You must select an existing column. So before processing, you must create the columns in the respective program, using the recommended troubleshooting guidelines above. | |||
Especially with CASS (but also Merge Purge), make sure that you are allowing zero-length (aka, 'Allow NULLs') fields. | |||
==Choose an alternate format== | |||
If all else fails, converting the files to dBASE (or .csv) will work | |||
Our dBASE driver was written in-house and is the most stable and fastest database type available for output tables. | |||
If your Output table must be returned in its native format, it is usually quicker to process in dbase and import the output table to the native program. | |||
Although .csv is not as fast as dBASE, and does not allow you to use some of the advanced options (to write results back to the source table), it is a much more stable option than Excel or Access. | |||
==Fixing Excel Zip Codes== | |||
What's up with Excel's handling of Zip Codes? | |||
Wouldn't we like to know... Excel is very quirky when working with Zip Codes. Sometimes it will get convert them all to numbers. Although that's fine in the midwest, up in the northeast one or more digit of the zip code can be lost. | |||
MatchUp does a good job of converting ‘excel-ed’ zip codes to their five digit format but problems can arise if there is a mix of zip5 and zip9 codes. Since excel will report that the field is numeric, the zip9 codes containing a dash will get wiped out, and MatchUp will see these records as not containing a zip code. | |||
One way to handle this (sometimes this won't work!): | |||
1. Insert a blank column next to the Zip field | |||
2. On the first row of data, type =TEXT(C1,0), where C1 is the cell containing the zip code. | |||
3. Copy the function for the rest of the column (the C1 should correctly adjust to C2, C3, etc as you go down). Your northeast Zip Codes should now read correctly. | |||
4. Click at the very top of the column to highlight the entire row. | |||
5. Right-click and select Copy. | |||
6. Right-click and select Paste Special. Select Values. | |||
7. You can delete the original column of badly formatted Zip Codes. |
Revision as of 20:35, 24 January 2014
Troubleshooting Tips When Processing an Access or Excel Table
Access and Excel are difficult formats to process because they are in a proprietary Microsoft format. Because of this, we are forced to use Microsoft's prescribed methods to access data. MatchUp uses one of two technologies for doing this: ADO and DAO. This document is intended to provide steps in working around the default behavior and providing MatchUp with formats it can handle.
Add an Autonumber field to your table.
One of the biggest problems with DAO/ADO is that these technologies were written with the assumption that people always access their data in a linear fashion, from start to end. While MatchUp does do this while building keys, it must access the data in a random sequence when it is performing the merge/purge operation. Another way to explain this is to say that DAO/ADO do not use record numbers. So when MatchUp tells DAO/ADO to move to record 58, DAO/ADO uses an awkward (and apparently buggy) algorithm to accomplish this. Adding an autonumber field essentially provides a record number field for your table, saving DAO/ADO from sweating out the computations.
Try the other technology.
There is no way to easily get the data access version, so we can’t even say “your data access engine is outdated, get version xxxx”. Also, different versions of Excel perform better with different versions of the data access engine, while the same version of Access might be the complete opposite.
In Tools | User Settings | General: Interface, you can select either DAO or ADO. Often, changing this will do the trick. Note that ADO is usually slower than DAO.
EDIT OUTPUT FIELD MAPPING
By the time you configure the Output Field Mapping Tab, MatchUp is creating the structure for the Output Table based on what you have in this screen. However, at this point it does not know the Output table type (you haven’t created it yet). Since MatchUp can create output tables of many different types, each with restrictions of what constitutes a valid structure, you must create an Output Field Mapping structure which conforms with a ‘common valid structure ruleset’ which any Output type can use.
FIELD SIZES If you're working with a table with rather large fields (ie, over 200 characters), you may want to try reducing the sizes to a more manageable size and see what happens.
Excel most always reports field sizes as 255 characters, attempting to modify the column width in excel will not work. Be sure that you won't truncate any of your data before doing this.
FIELD COLUMN NAMES Since some database systems have restrictions on column names, MatchUp follows suit and will not create these Output Table types if the column names are seen by MatchUp as multi-word or have a length of over 10 characters. Excel allows duplicate field names. MatchUp tries to do what it can with these, but sometimes comes up short. It's best just to fix the problem in the native DBMS before working with the file in MatchUp. And better database practice! A good rule of thumb – if you look at only the first 10 characters of the column names, and there are more than one instance, or embedded spaces, the program may crash.
FIELD TYPES
If the table has a few 'oddball' fields, you may want to make a copy of the table and try removing the fields. Such fields would be non-character and non-numeric fields, and not critical to the deduping process or the desired output table.
If all fields are required in Output, you can try changing the output column type. For example, different versions of Excel and Access contain options for numeric data, logic fields and dates. If you find one of these to be problematic, try changing the output type to ‘fixed character’. For a hands off, advanced approach, you can edit your .ini file…
Warning: This is an advanced topic. Proceed only if you know what you are doing!
When MatchUp needs to create a table, it normally uses data from the chart below to determine what DBMS (ie, 'native') data type should be used for a given MatchUp field type: dBASE FoxPro ASCII Excel Access SQL Server Oracle DB/2 Fixed Char Character Character Character General Text Char Char Character Variable Char Numeric Character Character General Text Text or Memo Char or Text VarChar Character Integer Numeric Integer Numeric General Number: Long Integer Int Long Integer Float Numeric Double Numeric Scientific Number: Double Float Float Double Decimal Numeric Numeric Numeric Currency Currency Decimal Number Decimal Logical Logical Logical Logical Numeric Yes/No Bit Char Character DateTime Date Date Date Date Date/Time DateTime Date TimeStamp
However, like many things, one size does not fit all. For Excel, Access and SQL Server users, there exists a mechanism to override these native field types.
Regardless of the database engine, you will need to open your dt.ini file (it will be in the folder where MatchUp is installed) in Notepad. At the end of the file (after a blank line or two), add the section listed.
Access or Excel through DAO: [cDao] FixedChar=Text <Text, Char or Memo> VarChar=Text <Text, Char or Memo> Integer=Long <Byte, Integer or Long> Float=Double <Single, Double or Float> Decimal=Currency <Currency or Numeric> Logical=YesNo <YesNo is the only choice> DateTime=TimeStamp <Date, Time or TimeStamp>
Above, the black entry indicates the MatchUp field type you want to change, blue indicates the Access/Excel type you want to use, and green indicates your choices for that particular field type (in the dt.ini file, do not type in the green text!).
The available native data types are the very same ones that you would use in a CREATE TABLE statement; consult your Excel or Access documentation for exactly what each data type represents.
Access or Excel through ADO: Even though we are accessing the same table types as DAO, the native data types are sometimes different (direct your complaints to Microsoft for this). [cAdo] FixedChar=Char <Char, VarChar or Memo> VarChar=VarChar <Char, VarChar or Memo> Integer=Integer <TinyInt, SmallInt or Integer> Float=Double <Single or Double> Decimal=Numeric <Currency, Decimal or Numeric> Logical=YesNo <YesNo is the only choice> DateTime=TimeStamp <Date, Time or TimeStamp>
Above, the black entry indicates the MatchUp field type you want to change, blue indicates the Access/Excel type you want to use, and green indicates your choices for that particular field type (in the dt.ini file, do not type in the green text!).
The available native data types are the very same ones that you would use in a CREATE TABLE statement; consult your Excel or Access documentation for exactly what each data type represents.
SQL Server: [cSQLServer] FixedChar=Char <Char, VarChar or Text> VarChar=Char <Char, VarChar or Text> Integer=Int <TinyInt, SmallInt or Integer> Float=Float <Float is the only choice> Decimal=Numeric <Decimal or Numeric> Logical=Bit <Bit is the only choice> DateTime=DateTime <DateTime or SmallDateTime>
Above, the black entry indicates the MatchUp field type you want to change, blue indicates the SQL Server type you want to use, and green indicates your choices for that particular field type (in the dt.ini file, do not type in the green text!).
The available native data types are the very same ones that you would use in a CREATE TABLE statement; consult your SQL Server documentation for exactly what each data type represents.
Notes: • These settings are global, and are not setup file specific. • If you omit any entry or specify an invalid one, the default will be used.
CASS CONSIDERATIONS
Unlike some other database types, MatchUp CASS process does not allow you to append columns to the Excel or Access table you are processing. You must select an existing column. So before processing, you must create the columns in the respective program, using the recommended troubleshooting guidelines above.
Especially with CASS (but also Merge Purge), make sure that you are allowing zero-length (aka, 'Allow NULLs') fields.
Choose an alternate format
If all else fails, converting the files to dBASE (or .csv) will work Our dBASE driver was written in-house and is the most stable and fastest database type available for output tables. If your Output table must be returned in its native format, it is usually quicker to process in dbase and import the output table to the native program. Although .csv is not as fast as dBASE, and does not allow you to use some of the advanced options (to write results back to the source table), it is a much more stable option than Excel or Access.
Fixing Excel Zip Codes
What's up with Excel's handling of Zip Codes?
Wouldn't we like to know... Excel is very quirky when working with Zip Codes. Sometimes it will get convert them all to numbers. Although that's fine in the midwest, up in the northeast one or more digit of the zip code can be lost. MatchUp does a good job of converting ‘excel-ed’ zip codes to their five digit format but problems can arise if there is a mix of zip5 and zip9 codes. Since excel will report that the field is numeric, the zip9 codes containing a dash will get wiped out, and MatchUp will see these records as not containing a zip code. One way to handle this (sometimes this won't work!): 1. Insert a blank column next to the Zip field 2. On the first row of data, type =TEXT(C1,0), where C1 is the cell containing the zip code. 3. Copy the function for the rest of the column (the C1 should correctly adjust to C2, C3, etc as you go down). Your northeast Zip Codes should now read correctly. 4. Click at the very top of the column to highlight the entire row. 5. Right-click and select Copy. 6. Right-click and select Paste Special. Select Values. 7. You can delete the original column of badly formatted Zip Codes.