Name Object:Config Example
Revision as of 16:44, 19 February 2014
; mdName.cfg – the NameObject configuration file
; If you’ve ever wanted to change how the NameObject parses, genderizes,
; or creates salutations for certain names, you’ll need to understand how to edit the
; mdName.cfg file. This file is used to add, change, or remove entries from the API’s
; stock name tables compiled into the distributed mdName.dat file.
; For detailed definitions and usage, see the actual config file or the documentation.
; The content of this file can be used as the actual mdName.cfg file. Just save the
; unformatted text and rename it to mdName.cfg. Alternatively, you can cut and paste
; the examples below into the actual file. Any line beginning with a semicolon is a
; comment and has no effect on processing. The uncommented lines are actual
; examples of the respective name type.
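The layout just described (bracketed section headers, ";" comment lines, comma-separated entries) can be read with a few lines of Python. This is a minimal sketch for inspecting a config, not the API's own loader, and it is not part of mdName.cfg, so leave it out if you save this page as the config file. Treating a leading "-" as "remove this entry" is an assumption drawn from entries such as -HARDY and -scat further down; the function name parse_cfg is hypothetical.

```python
# Minimal reader for the mdName.cfg layout (illustrative sketch only):
# "[Section]" headers, ";" comment lines, comma-separated entries.
# ASSUMPTION: a leading "-" marks an entry to remove (e.g. "-HARDY").

def parse_cfg(text):
    """Return {section: [(action, fields), ...]}, action being "add" or "remove"."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank lines and comments have no effect on processing
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is None:
            continue  # this sketch ignores entries that precede any section header
        elif line.startswith("-"):
            sections[current].append(("remove", [line[1:]]))
        else:
            # Empty fields are kept because position matters, e.g. "zm,M,,Zen Master".
            sections[current].append(("add", [f.strip() for f in line.split(",")]))
    return sections


sample = """\
; a comment line
[Prefix]
zm,M,,Zen Master
[FirstName]
-HARDY
"""
print(parse_cfg(sample))
# {'Prefix': [('add', ['zm', 'M', '', 'Zen Master'])], 'FirstName': [('remove', ['HARDY'])]}
```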
; [Prefix] - List of name prefixes.
; Format is <Prefix>, <Sex>, <Dual Expansion>, <Case>
; Proprietary prefixes missing from the table can cause names to be split incorrectly and
; name patterns to be misidentified.
; example – change ‘zm phil jackson’ to ‘Zen Master Phil Jackson’ (even though he isn’t)
[Prefix]
zm,M,,Zen Master
Mr and Mrs,,Mr ans Mrs,Mr ans Mrs
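To make the field order concrete, here is a hypothetical Python helper (not part of the API, and not configuration text to paste into mdName.cfg) that names the four [Prefix] fields. It shows why the empty position in zm,M,,Zen Master matters: the blank is the <Dual Expansion> field. Rejecting entries with a different field count is this sketch's own choice, not a documented rule.

```python
# Hypothetical helper: names the fields of a [Prefix] entry in the documented
# order <Prefix>, <Sex>, <Dual Expansion>, <Case>. Not part of the API.
PREFIX_FIELDS = ("Prefix", "Sex", "Dual Expansion", "Case")


def prefix_entry(line):
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(PREFIX_FIELDS):
        # This strictness is the sketch's own choice, not a documented rule.
        raise ValueError(f"expected {len(PREFIX_FIELDS)} fields, got {len(values)}: {line!r}")
    return dict(zip(PREFIX_FIELDS, values))


print(prefix_entry("zm,M,,Zen Master"))
# {'Prefix': 'zm', 'Sex': 'M', 'Dual Expansion': '', 'Case': 'Zen Master'}
```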
; [FirstName] - List of first names (used for name splitting, genderizing).
; Format is <First Name>, <Sex>, <Misspelling>, <Rank>, <Case>
; Adding entries in this section helps split and/or case uncommon or international names
; that are missing from existing census or database lists.
[FirstName]
Timotee,7,x,,Timotee
Deshawn,7,x,,DeShawn
-HARDY
; [FirstNameFix] - List of misspelled first names and their corrections.
; Format is <Misspelling>, <Correction>
; Why not just make a spelling correction above, in the [FirstName] <Case> parameter?
; Because sometimes we want the FirstName additions to help in splitting, but aren’t sure of a
; name correction (maybe ‘Mr. Timotee Smith’ is his correctly spelled name).
; Setting the FirstNameSpellingCorrection property tells the NameObject to also use these
; entries to correct misspelled names.
[FirstNameFix]
Timotee,Timothy
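A toy Python model of the behaviour just described: the correction is applied only when spelling correction is switched on. Only the property name FirstNameSpellingCorrection comes from this page; the function first_name_output and the case-insensitive lookup are assumptions for illustration, and none of this belongs in mdName.cfg.

```python
# Toy model of [FirstNameFix]: corrections apply only when spelling correction
# is enabled (mirroring the FirstNameSpellingCorrection property).
# The function is HYPOTHETICAL and not part of the NameObject API.
FIRST_NAME_FIX = {"timotee": "Timothy"}  # <Misspelling> -> <Correction>


def first_name_output(first_name, spelling_correction=False):
    if spelling_correction:
        # ASSUMPTION: lookup is case-insensitive in this sketch.
        return FIRST_NAME_FIX.get(first_name.lower(), first_name)
    return first_name  # without the property, 'Timotee' is kept as spelled


print(first_name_output("Timotee"))        # Timotee
print(first_name_output("Timotee", True))  # Timothy
```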
; [LNPrefix] - List of last name prefixes.
; Format is <Last Name Prefix>, <Case>
; This example will help identify the ‘Ze’ in ‘Frank Ze Bond’ as part of the last name,
; not a middle name.
[LNPrefix]
ze,Ze
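The effect of the [LNPrefix] entry on ‘Frank Ze Bond’ can be pictured with a deliberately simplified Python splitter. It handles only three-word inputs, split_three is a hypothetical name, and the real NameObject parser is far more involved; this is an illustration, not text for mdName.cfg.

```python
# Deliberately simplified splitter (HYPOTHETICAL, three-word inputs only) showing
# why a last name prefix entry changes how 'Frank Ze Bond' is split.
def split_three(name, ln_prefixes):
    first, middle, last = name.split()
    if middle.lower() in ln_prefixes:
        # A known last name prefix joins the last name instead of being a middle name.
        return {"first": first, "middle": "", "last": f"{middle} {last}"}
    return {"first": first, "middle": middle, "last": last}


print(split_three("Frank Ze Bond", set()))   # 'Ze' treated as a middle name
print(split_three("Frank Ze Bond", {"ze"}))  # 'Ze Bond' treated as the last name
```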
; [LastName] - List of last names.
; Format is <Last Name>, <Rank>, <O-Name>, <Case>
; Adding entries here is useful for special casing last names. It can also be used to identify
; a solitary “O” as an indicator of an Irish last name. For example, with an input like
; “joe o jeep” it is assumed you want the name parsed as “Joe O’Jeep”, but “Joe Ojeep” should
; not be parsed as an Irish name. If you want to add an Irish last name by flagging both the
; solitary “O” (“Joe O Spence”) and the concatenated string (“Joe Ospence”) as “Joe O’Spence”,
; add it as shown below.
[LastName]
Legrandless,,,LeGrandless
ojeep,,X,Ojeep
ospence,,X,O’Spence
; [Suffix] - List of name suffixes.
; Format is <Suffix>, <Prefix>, <Salutation Remove>, <Dual Name Remove>, <Case>
; With mostly full name formats, unrecognized suffixes are likely to be split
; into the Last Name component. By adding an entry here, we will now correctly split
; a record like ‘John Smith, Grand Poohbah’.
[Suffix]
grand poohbah,GrP,,,Grand PoohbaH
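A rough Python sketch of what the [Suffix] entry buys you: a recognised trailing suffix is split off rather than being left for the last name. The comma-based, case-insensitive matching is an assumption of this illustration, split_suffix is hypothetical, and the sketch is not part of the config file. The casing returned is simply the <Case> value as entered above.

```python
# Rough illustration (HYPOTHETICAL helper, not the API): split off a recognised
# trailing suffix. ASSUMPTION: the suffix follows a comma and matches
# case-insensitively.
SUFFIXES = {"grand poohbah": "Grand PoohbaH"}  # <Suffix> -> <Case>, as entered above


def split_suffix(name, suffixes):
    if "," in name:
        base, tail = [part.strip() for part in name.rsplit(",", 1)]
        if tail.lower() in suffixes:
            return base, suffixes[tail.lower()]
    return name, ""  # unrecognised: nothing is split off


print(split_suffix("John Smith, Grand Poohbah", SUFFIXES))  # ('John Smith', 'Grand PoohbaH')
print(split_suffix("John Smith, Grand Poohbah", {}))        # ('John Smith, Grand Poohbah', '')
```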
; [DualIndicator] - List of dual name connectors.
; Format is <Dual Name Connector>, <Delete>
; The practical example ‘Trustee for’ is already in the distributed data file, so here is a less
; probable example: ‘john smith married susan jones’.
[DualIndicator]
married
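For illustration only (Python, not config text), a dual-name split on connector words such as married might look like this. Whole-word, case-insensitive matching is assumed, the <Delete> field is not modelled, and split_dual is a hypothetical name.

```python
import re


# Toy dual-name split on connector words (illustration only; the <Delete>
# field is not modelled). ASSUMPTION: whole-word, case-insensitive matching.
def split_dual(name, connectors):
    if not connectors:
        return [name]
    alternatives = "|".join(re.escape(c) for c in connectors)
    pattern = r"\s+(?:" + alternatives + r")\s+"
    return [part for part in re.split(pattern, name, flags=re.IGNORECASE) if part]


print(split_dual("john smith married susan jones", ["married"]))
# ['john smith', 'susan jones']
```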
; [Suspect] - List of suspicious words & phrases.
; Format is <Word/Phrase>, <Indicator>
; These words still get parsed, but the error code will identify them as vulgar, as a company
; identifier, or as suspect. There may even be a pre-existing entry which you later determine to be
; a real name. Example: my new boss is ‘Fred Scat’. Ouch.
; NOTE: there is no <Case> parameter for this table.
[Suspect]
frakkin,V
shoes,C
joe the plumber,S
-scat
ABC,C
ZZX,C
DUZ,C
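The [Suspect] overrides and the ‘Fred Scat’ story can be modelled in Python as below. The stock entry scat and its V indicator are hypothetical (the comment only implies such an entry exists), reading -scat as a removal is an assumption, and the indicator letters V, C and S are inferred from the example entries; none of this belongs in mdName.cfg.

```python
# Toy model of [Suspect] overrides (illustration only, not the API).
# ASSUMPTIONS: "-word" removes an entry; indicators follow the example
# entries: V = vulgar, C = company, S = suspect.
def apply_overrides(stock, override_lines):
    table = dict(stock)  # copy so the stock table is left untouched
    for line in override_lines:
        if line.startswith("-"):
            table.pop(line[1:].lower(), None)
        else:
            phrase, indicator = line.rsplit(",", 1)
            table[phrase.strip().lower()] = indicator.strip()
    return table


def flag(name, table):
    """Return the sorted indicators of every whole word/phrase found in name."""
    padded = f" {name.lower()} "
    return sorted({ind for phrase, ind in table.items() if f" {phrase} " in padded})


stock = {"scat": "V"}  # HYPOTHETICAL stock entry implied by the 'Fred Scat' story
overrides = ["frakkin,V", "shoes,C", "joe the plumber,S", "-scat", "ABC,C", "ZZX,C", "DUZ,C"]
table = apply_overrides(stock, overrides)

print(flag("Fred Scat", stock))  # ['V']
print(flag("Fred Scat", table))  # []
print(flag("ABC Shoes", table))  # ['C']
```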
; When an input name is flagged with a [Suspect] company indicator, you may choose
; to pass that input into the StandardizeCompany method. The following two table
; overrides allow you to apply special casing to the returned company.
; [Acronym] - These entries (4 letters or less) are NOT Acronyms and will be proper cased
; when passed through the StandardizeCompany method.
; Format is <Lookup>
; <Lookup> = A short word that you do not want uppercased as an acronym
; example: The following may actually be part of a company name, not an acronym,
; like ‘Duz Brothers Inc’, so we don’t want it all capitalized.
[Acronym]
DUZ
; [Company] - Words and phrases from company names that do not follow common casing rules.
; Format is <Company>, <Case>
; <Company> = The lookup word which requires special casing
; <Case> = The way this lookup word should be cased
; example: These entries should be identified as Companies in the [Suspect] section (see above).
; When the StandardizeCompany method is called, the following substitutions should be made
; when the identified company is actually ‘ABC ZZx’, not ‘Abc Zzx’.
[Company]
ABC,ABC
ZZX,ZZx
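Taken together, the [Acronym] and [Company] overrides can be pictured with the toy casing pass below. It is not the StandardizeCompany algorithm: the rule that words of 4 letters or less are uppercased unless listed as non-acronyms is inferred from the [Acronym] description, and INC as a stock non-acronym is hypothetical. Leave it out of the saved config file.

```python
# Toy casing pass in the spirit of the [Acronym] and [Company] overrides;
# NOT the StandardizeCompany algorithm. Assumed rules:
#   1. a [Company] entry supplies the exact casing;
#   2. otherwise a word of 4 letters or less is uppercased as an acronym
#      unless it is listed as a non-acronym;
#   3. anything else is proper cased.
def case_company(name, company_case, not_acronyms):
    words = []
    for word in name.split():
        key = word.upper()
        if key in company_case:
            words.append(company_case[key])
        elif len(word) <= 4 and key not in not_acronyms:
            words.append(key)
        else:
            words.append(word.capitalize())
    return " ".join(words)


COMPANY = {"ABC": "ABC", "ZZX": "ZZx"}  # from the [Company] section
NOT_ACRONYMS = {"DUZ", "INC"}           # DUZ from [Acronym]; INC is a HYPOTHETICAL stock entry

print(case_company("abc zzx", COMPANY, NOT_ACRONYMS))           # ABC ZZx
print(case_company("duz brothers inc", COMPANY, NOT_ACRONYMS))  # Duz Brothers Inc
print(case_company("duz brothers inc", COMPANY, {"INC"}))       # DUZ Brothers Inc
```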
; [DualPattern] - List of dual name patterns.
; This one is much more advanced than the others, and should not be edited without
; contacting support. While editing the entries above affects only that particular word,
; editing here could negatively affect your entire process.
; Format is <Pattern>, <Counts>, <Name Types>, <Split Type>
; P?&P?,> >,1,1 already exists and helps split ‘Mr. Smithhh and Mrs. Smithhh’
; or ‘Mr. Johnnn Smithhh & Dr. Maryy Lynne Smithhhh’
; ?F&?PF,,6,2 already exists and helps split ‘Smithhhh, John and Dr. Mary’
; Although you may find the examples here impractical, test them out on sample data
; to see how this alternate config file changes NameObject results. And if you ever come up with
; common edits we have overlooked, please let us know; we are always trying to make the
; API even more accurate.