Name Object:Config Example

From Melissa Data Wiki
Revision as of 16:41, 19 February 2014 by Tim (talk | contribs)

; mdName.cfg – the NameObject configuration file

If you’ve ever wanted to change how the NameObject parses, genderizes, or creates
salutations for certain names, you’ll need to understand how to edit the mdName.cfg
file. This file is used to add, change, or remove entries from the API’s stock name
tables, which are compiled into the distributed mdName.dat file.
For detailed definitions and usage, see the actual config file or the documentation.


The content of this page can be used as the actual mdName.cfg file. Just save the
unformatted text and rename it to mdName.cfg. Alternatively, you can copy and paste
the examples below into the actual file. Any line beginning with a semicolon is a
comment and has no effect on processing. The uncommented lines are actual
examples of the respective name type.
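
To make the layout described above concrete, here is a minimal Python sketch that reads config text of this shape. It is not part of the NameObject API: `parse_cfg` and `SAMPLE` are hypothetical names, and the sketch assumes that each `[Section]` header and each entry sits on its own line, with fields separated by commas.

```python
def parse_cfg(text):
    """Group entries under the most recent [Section] header.

    Blank lines and lines beginning with ';' are skipped.
    Assumption: one header or one comma-separated entry per line.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment: no effect on processing
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
            continue
        if current is None:
            continue  # entry before any header: ignored in this sketch
        sections[current].append([field.strip() for field in line.split(",")])
    return sections


SAMPLE = """\
; mdName.cfg style sample
[Prefix]
zm,M,,Zen Master

[LNPrefix]
ze,Ze
"""

parsed = parse_cfg(SAMPLE)
```

Here `parsed["Prefix"]` holds one entry with four fields; the empty third field is kept as an empty string, so field positions stay aligned with the Format line of each section.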


[Prefix] - List of name prefixes.
Format is <Prefix>, <Sex>, <Dual Expansion>, <Case>
Proprietary prefixes can cause names to be split incorrectly and name patterns to be
misidentified.
example – change ‘zm phil jackson’ to ‘Zen Master Phil Jackson’ (even though he isn’t)

[Prefix]
zm,M,,Zen Master
Mr and Mrs,,Mr ans Mrs,Mr ans Mrs


[FirstName] - List of first names (used for name splitting, genderizing).
Format is <First Name>, <Sex>, <Misspelling>, <Rank>, <Case>
Adding entries in this section helps split and/or case uncommon or international names
that are new to existing census or database lists.

[FirstName]
Timotee,7,x,,Timotee
Deshawn,7,x,,DeShawn
-HARDY

[FirstNameFix] - List of misspelled first names and their corrections.
Format is <Misspelling>, <Correction>
Why not just make a spelling correction above, in the [FirstName] <Case> parameter?
Because sometimes we want the FirstName additions to help in splitting, but aren’t sure of a
name correction (maybe ‘Mr. Timotee Smith’ is his correctly spelled name).
Setting the FirstNameSpellingCorrection property tells the NameObject to also use these
entries to correct misspelled names.

[FirstNameFix]
Timotee,Timothy


[LNPrefix] - List of last name prefixes
Format is <Last Name Prefix>, <Case>
This example will help identify the ‘Ze’ in ‘Frank Ze Bond’ as part of the last name,
not a middle name

[LNPrefix]
ze,Ze


[LastName] - List of last names.
Format is <Last Name>, <Rank>, <O-Name>, <Case>
Adding entries here is useful for special casing last names. It can also be used to identify
a solitary “O” as an indicator of an Irish last name. With an input like “joe o jeep”, it is
assumed you want the name parsed as “Joe O’Jeep”, while “Joe Ojeep” should not be parsed
as an Irish name. If you want to add an Irish last name so that both the solitary “O” form
(“Joe O Spence”) and the concatenated form (“Joe Ospence”) are returned as “Joe O’Spence”,
add it as below…

[LastName]
Legrandless,,,LeGrandless
ojeep,,X,Ojeep
ospence,,X,O’Spence


[Suffix] - List of name suffixes.
Format is <Suffix>, <Prefix>, <Salutation Remove>, <Dual Name Remove>, <Case>
Chances are that, with mostly full name formats, unrecognized suffixes get split
into the Last Name component. By adding an entry here, we will now correctly split
a record like ‘John Smith, Grand Poohbah’.

[Suffix]
grand poohbah,GrP,,,Grand PoohbaH
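
Because empty fields are written as consecutive commas, it is easy to miscount positions in an entry like the one above. The following Python sketch labels each value with the field names quoted in the [Suffix] Format line. It is only an illustration: `label_fields` and `SUFFIX_FIELDS` are hypothetical names, not part of the NameObject API. Since other examples on this page (such as the single-field [DualIndicator] entry) omit trailing fields, the sketch pads missing trailing fields with empty strings instead of rejecting the entry.

```python
# Field order copied from the [Suffix] "Format is" line.
SUFFIX_FIELDS = ["Suffix", "Prefix", "Salutation Remove", "Dual Name Remove", "Case"]


def label_fields(entry, field_names):
    """Pair each comma-separated value in an entry with its field name."""
    values = [value.strip() for value in entry.split(",")]
    if len(values) > len(field_names):
        raise ValueError(f"too many fields ({len(values)}) in entry: {entry!r}")
    # Assumption: omitted trailing fields are treated as empty.
    values += [""] * (len(field_names) - len(values))
    return dict(zip(field_names, values))


labeled = label_fields("grand poohbah,GrP,,,Grand PoohbaH", SUFFIX_FIELDS)
```

For this entry, `labeled["Case"]` is `Grand PoohbaH`, while `Salutation Remove` and `Dual Name Remove` are both empty.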


[DualIndicator] - List of dual name connectors.
Format is <Dual Name Connector>, <Delete>
The practical example ‘Trustee for’ is already in the distributed data file, so here is a
less probable example: ‘john smith married susan jones’

[DualIndicator]
married


[Suspect] - List of suspicious words & phrases.
Format is <Word/Phrase>, <Indicator>
These words still get parsed, but the error code will identify them as vulgar, a company
identifier, or suspect. There may even be a pre-existing entry which you later determine to be
a real name. Example: my new boss is ‘Fred Scat’. Ouch.
NOTE: there is no <Case> parameter for this table.

[Suspect]
frakkin,V
shoes,C
joe the plumber,S
-scat
ABC,C
ZZX,C
DUZ,C
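
The entries above both add words and, judging by the ‘Fred Scat’ example, take one away. The Python sketch below shows one way to picture how such overrides could combine with a stock table. Everything in it is an assumption made for illustration: the leading-hyphen removal rule is inferred from the examples on this page, the `stock` contents are invented, lookups are lowercased for convenience, and `apply_overrides` is a hypothetical name rather than anything in the NameObject API.

```python
def apply_overrides(stock, overrides):
    """Return a copy of the stock table with config overrides applied.

    Assumption: '-word' removes an entry; 'word,indicator' adds or changes one.
    """
    merged = dict(stock)
    for entry in overrides:
        entry = entry.strip()
        if entry.startswith("-"):
            merged.pop(entry[1:].strip().lower(), None)
            continue
        word, _, indicator = entry.partition(",")
        merged[word.strip().lower()] = indicator.strip()
    return merged


# Hypothetical stock table; the real one is compiled into mdName.dat.
stock = {"scat": "V", "inc": "C"}
overrides = ["frakkin,V", "shoes,C", "joe the plumber,S", "-scat"]

result = apply_overrides(stock, overrides)
```

After the call, `result` no longer contains `scat`, keeps the stock `inc` entry, and gains the three added phrases with their indicators; `stock` itself is left unchanged.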




When an input name is flagged with a [Suspect] company indicator, you may choose
to pass that input into the StandardizeCompany method. The following two table
overrides allow you to apply special casing to the returned company.


[Acronym] - These entries (4 letters or less) are NOT acronyms and will be proper cased
when passed through the StandardizeCompany method.
Format is <Lookup>
<Lookup> = A short word that you do not want uppercased like an acronym
Example: the following may actually be part of a company name, not an acronym,
as in ‘Duz Brothers Inc’, so we don’t want it all capitalized.

[Acronym]
DUZ

[Company] - Words and phrases from company names that do not follow common casing rules.
Format is <Company>, <Case>
<Company> = The lookup word which requires special casing
<Case> = The way this lookup word should be cased
Example: these entries should be identified as companies in the [Suspect] section (see above).
When the StandardizeCompany method is called, the following substitutions are made,
so that the identified company is returned as ‘ABC ZZx’, not ‘Abc Zzx’.

[Company]
ABC,ABC
ZZX,ZZx


[DualPattern] - List of dual name patterns.
This one is much more advanced than the others, and should not be edited without
contacting support. While editing the entries above affects only that particular word,
editing here could negatively affect your entire process.
Format is <Pattern>, <Counts>, <Name Types>, <Split Type>
P?&P?,> >,1,1 already exists and helps split ‘Mr. Smithhh and Mrs. Smithhh’
or ‘Mr. Johnnn Smithhh & Dr. Maryy Lynne Smithhhh’
?F&?PF,,6,2 already exists and helps split ‘Smithhhh, John and Dr. Mary’



Although you may not find the examples here practical, test them out on sample data
to see how this alternate config file changes NameObject results. And if you ever come up with
common edits we have overlooked, please let us know; we are always trying to make the
API even more accurate.