Difference between revisions of "Name Object:Config Example"

Revision as of 16:44, 19 February 2014

; mdName.cfg – the NameObject configuration file

;  If you’ve ever wanted to change how the NameObject parses, genderizes,
;  or creates salutations for certain names, you’ll need to understand how to edit the
;  mdName.cfg file. This file is used to add, change, or remove entries from the API’s
;  stock name tables compiled into the distributed mdName.dat file.

;  For detailed definitions and usage, see the actual config file or the documentation.


;  The content of this file can be used as the actual mdName.cfg file. Just save the
;  unformatted text and rename it to mdName.cfg. Alternatively, you can cut and paste
;  the examples below into the actual file. Any line beginning with a semicolon is a
;  comment and has no effect on processing. The uncommented lines are actual
;  examples of the respective name type.


; [Prefix] - List of name prefixes.
;  Format is <Prefix>, <Sex>, <Dual Expansion>, <Case>
;  Proprietary prefixes can cause names to be split incorrectly and name patterns to be
;  misidentified.

; example – change ‘zm phil jackson’ to ‘Zen Master Phil Jackson’ (even though he isn’t)
[Prefix]
zm,M,,Zen Master
Mr and Mrs,,Mr and Mrs,Mr and Mrs


; [FirstName] - List of first names (used for name splitting, genderizing).
;  Format is <First Name>, <Sex>, <Misspelling>, <Rank>, <Case>
;  Adding entries in this section helps split and/or case uncommon or international names
;  that are new to existing census or database lists.

[FirstName]
Timotee,7,x,,Timotee
Deshawn,7,x,,DeShawn
-HARDY

; [FirstNameFix] - List of misspelled first names and their corrections.
;  Format is <Misspelling>, <Correction>
;  Why not just make a spelling correction above, in the [FirstName] <Case> parameter?
;  Because sometimes we want the FirstName additions to help in splitting, but aren’t sure of a
;  name correction (maybe ‘Mr. Timotee Smith’ is his correctly spelled name).
;  Setting the FirstNameSpellingCorrection property tells the NameObject to also use these
;  entries to correct misspelled names.

[FirstNameFix]
Timotee,Timothy


; [LNPrefix] - List of last name prefixes.
;  Format is <Last Name Prefix>, <Case>
;  This example will help identify the ‘Ze’ in ‘Frank Ze Bond’ as part of the last name,
;  not a middle name.

[LNPrefix]
ze,Ze


; [LastName] - List of last names.
;  Format is <Last Name>, <Rank>, <O-Name>, <Case>
;  Adding entries here is useful for special casing last names. It can also be used to identify
;  a solitary “O” as an indicator of an Irish last name. For an input like “joe o jeep”, it is
;  assumed you want the name parsed as “Joe O’Jeep”, while “Joe Ojeep” should not be parsed
;  as an Irish name. If you want to add an Irish last name so that both the solitary “O” form
;  and the concatenated form (“Joe O Spence” and “Joe Ospence”) become “Joe O’Spence”, add it as
;  below…

[LastName]
Legrandless,,,LeGrandless
ojeep,,X,Ojeep
ospence,,X,O’Spence


; [Suffix] - List of name suffixes.
;  Format is <Suffix>, <Prefix>, <Salutation Remove>, <Dual Name Remove>, <Case>
;  With mostly full name formats, unrecognized suffixes can get split
;  into the Last Name component. By adding an entry here, we will now correctly split
;  a record like ‘John Smith, Grand Poohbah’.

[Suffix]
grand poohbah,GrP,,,Grand PoohbaH


; [DualIndicator] - List of dual name connectors.
;  Format is <Dual Name Connector>, <Delete>
;  the practical example ‘Trustee for’ is already in the distributed data file, so a less probable
;  example: ‘john smith married susan jones’

[DualIndicator]
married


; [Suspect] - List of suspicious words & phrases.
;  Format is <Word/Phrase>, <Indicator>
;  these words still get parsed, but the error code will identify them as vulgar, a company identifier
;  or suspect. There may even be a pre-existing entry which you may later determine to be
;  a real name. Example: my new boss is ‘Fred Scat’. Ouch.
;  NOTE: no <case> parameter for this table

[Suspect]

frakkin,V
shoes,C
joe the plumber,S
-scat
ABC,C
ZZX,C
DUZ,C




;  When an input name is flagged with a [Suspect] company indicator, you may choose
;  to pass that input into the StandardizeCompany method. The following two table
;  overrides allow you to apply special casing to the returned company.


;  [Acronym] - These entries (4 letters or less) are NOT Acronyms and will be proper cased
;              when passed through the StandardizeCompany method
;      Format is <Lookup>
;        <Lookup> = A short word that you do not want uppercased like an Acronym

; example. The following may actually represent a company name, not an acronym
; like ‘Duz Brothers Inc’, so we don’t want it all capitalized

[Acronym]
DUZ

; [Company] - Words and phrases from company names that do not follow common casing rules.
;  Format is <Company>, <Case>
;      <Company> = The lookup word which requires special casing
;      <Case>    = The way this lookup word should be cased

; example: These entries should be identified as Companies in the [Suspect] section (see above)
; When the StandardizeCompany method is called, the following substitutions should be made
; when the identified company is actually ‘ABC ZZx’, not ‘Abc Zzx’
[Company]
ABC,ABC
ZZX,ZZx


; [DualPattern] - List of dual name patterns.
;  This one is much more advanced than the others, and should not be edited without
;  contacting support. While editing the other above entries would affect that particular word,
;  editing here could negatively affect your entire process.

;  Format is <Pattern>, <Counts>, <Name Types>, <Split Type>

;  P?&P?,> >,1,1        already exists and helps split ‘Mr. Smithhh and Mrs. Smithhh’
;                       or ‘Mr. Johnnn Smithhh & Dr. Maryy Lynne Smithhhh’
;  ?F&?PF,,6,2          already exists and helps split ‘Smithhhh, John and Dr. Mary’



;  Although you may find the examples here impractical, test them out on sample data
;  to see how this alternate config file changes NameObject results. And if you ever come up with
;  common edits we have overlooked, please let us know. We are always trying to make the
;  API even more accurate.
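
Before loading an edited mdName.cfg, it can help to check that each entry has the number of fields its section expects. The Python sketch below is not part of the NameObject API. It is a hypothetical stand-alone reader that follows only what this file states: lines beginning with a semicolon are comments, section names sit in square brackets, and entries are comma-separated in the order given by each "Format is" comment. It also assumes that a leading '-' (as in -HARDY and -scat) marks a stock entry to remove, which the examples suggest but this page does not spell out.

```python
# Field names per section, copied from the "Format is ..." comments in this file.
SECTION_FIELDS = {
    "Prefix": ["Prefix", "Sex", "Dual Expansion", "Case"],
    "FirstName": ["First Name", "Sex", "Misspelling", "Rank", "Case"],
    "FirstNameFix": ["Misspelling", "Correction"],
    "LNPrefix": ["Last Name Prefix", "Case"],
    "LastName": ["Last Name", "Rank", "O-Name", "Case"],
    "Suffix": ["Suffix", "Prefix", "Salutation Remove", "Dual Name Remove", "Case"],
    "DualIndicator": ["Dual Name Connector", "Delete"],
    "Suspect": ["Word/Phrase", "Indicator"],
    "Acronym": ["Lookup"],
    "Company": ["Company", "Case"],
    "DualPattern": ["Pattern", "Counts", "Name Types", "Split Type"],
}


def parse_name_cfg(text):
    """Return {section: [entry, ...]} for mdName.cfg-style text.

    Each entry is a dict of field name -> value. Removal lines become
    {"remove": name}. Assumption: a leading '-' means "remove this entry".
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
            continue
        if current is None:
            continue  # entry outside any section; ignored in this sketch
        if line.startswith("-"):
            sections[current].append({"remove": line[1:]})
            continue
        values = [v.strip() for v in line.split(",")]
        names = SECTION_FIELDS.get(current)
        if names is None:
            sections[current].append({"values": values})  # unknown section
            continue
        if len(values) > len(names):
            raise ValueError(f"too many fields in [{current}]: {line!r}")
        # Trailing fields may be omitted (e.g. 'married'); pad them as empty.
        values += [""] * (len(names) - len(values))
        sections[current].append(dict(zip(names, values)))
    return sections


SAMPLE = """\
; sample taken from the entries above
[Prefix]
zm,M,,Zen Master
[FirstName]
Timotee,7,x,,Timotee
-HARDY
[DualIndicator]
married
"""

cfg = parse_name_cfg(SAMPLE)
for section, entries in cfg.items():
    print(section, entries)
```

A line with more fields than its section's format allows raises ValueError, which catches the most likely editing mistake: a stray comma inside a <Case> value.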