[Python-modules-commits] [python-nameparser] 01/04: Import python-nameparser_0.3.14.orig.tar.gz
Edward Betts
edward at moszumanska.debian.org
Tue Mar 22 08:03:49 UTC 2016
This is an automated email from the git hooks/post-receive script.
edward pushed a commit to branch master
in repository python-nameparser.
commit a4296bdca8fb9a08834f113f3aee3c1d25b2c6c7
Author: Edward Betts <edward at 4angle.com>
Date: Mon Mar 21 07:01:58 2016 +0000
Import python-nameparser_0.3.14.orig.tar.gz
---
LICENSE | 3 -
README.rst | 78 +++++++------
dev-requirements.txt | 12 +-
dist/nameparser-0.1.3.tar.gz | Bin 0 -> 5247 bytes
dist/nameparser-0.1.4.tar.gz | Bin 0 -> 5744 bytes
dist/nameparser-0.2.0.tar.gz | Bin 0 -> 7206 bytes
dist/nameparser-0.2.10.tar.gz | Bin 0 -> 13655 bytes
dist/nameparser-0.2.2.tar.gz | Bin 0 -> 7571 bytes
dist/nameparser-0.2.3.tar.gz | Bin 0 -> 7781 bytes
dist/nameparser-0.2.4.tar.gz | Bin 0 -> 10240 bytes
dist/nameparser-0.2.5.tar.gz | Bin 0 -> 11074 bytes
dist/nameparser-0.2.6.tar.gz | Bin 0 -> 11129 bytes
dist/nameparser-0.2.7.tar.gz | Bin 0 -> 11625 bytes
dist/nameparser-0.2.8.tar.gz | Bin 0 -> 11777 bytes
dist/nameparser-0.2.9.tar.gz | Bin 0 -> 12394 bytes
dist/nameparser-0.3.0.tar.gz | Bin 0 -> 9438 bytes
dist/nameparser-0.3.1.tar.gz | Bin 0 -> 13904 bytes
dist/nameparser-0.3.2.tar.gz | Bin 0 -> 14135 bytes
docs/conf.py | 23 +++-
docs/customize.rst | 256 +++++++++++++++++++++++-------------------
docs/index.rst | 49 ++++----
docs/release_log.rst | 10 +-
docs/usage.rst | 107 ++++++++++--------
nameparser/__init__.py | 2 +-
nameparser/config/__init__.py | 40 ++++++-
nameparser/config/suffixes.py | 22 ++--
nameparser/parser.py | 37 +++---
setup.cfg | 2 +
tests.py | 105 ++++++++++++++++-
29 files changed, 483 insertions(+), 263 deletions(-)
diff --git a/LICENSE b/LICENSE
index 9aaf664..f8454f6 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,9 +1,6 @@
Copyright Derek Gulbranson <derek73 at gmail>.
http://derekgulbranson.com/
-Parser logic based on PHP nameParser.php by G. Miernicki
-http://code.google.com/p/nameparser/
-
-----
LGPL
diff --git a/README.rst b/README.rst
index c42aa1c..facbede 100644
--- a/README.rst
+++ b/README.rst
@@ -6,19 +6,53 @@ Name Parser
.. image:: https://badge.fury.io/py/nameparser.svg
:target: http://badge.fury.io/py/nameparser
-A simple Python (3.2+ & 2.6+) module for parsing human names into their individual
-components. The HumanName class splits a name string up into name parts
-based on placement in the string and matches against known name pieces
-like titles. It joins name pieces on conjunctions and special prefixes to
-last names like "del". Titles can be chained together and include conjunctions
-to handle titles like "Asst Secretary of State". It can also try to
-correct capitalization of all upper or lowercase names.
+A simple Python (3.2+ & 2.6+) module for parsing human names into their
+individual components.
+
+* hn.title
+* hn.first
+* hn.middle
+* hn.last
+* hn.suffix
+* hn.nickname
+
+Supports 3 different comma placement variations in the input string.
+
+1. Title Firstname "Nickname" Middle Middle Lastname Suffix
+2. Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
+3. Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]
+
+Instantiating the `HumanName` class with a string splits on commas and then spaces,
+classifying name parts based on placement in the string and matches against known name
+pieces like titles and suffixes.
+
+It correctly handles some common conjunctions and special prefixes to last names
+like "del". Titles and conjunctions can be chained together to handle complex
+titles like "Asst Secretary of State". It can also try to correct capitalization
+of names that are entirely upper- or lowercase.
It attempts the best guess that can be made with a simple, rule-based approach.
-Unicode is supported, but the parser is not likely to be useful for languages
-that to not share the same structure as English names. It's not perfect, but it
+Its main use case is English and it is not likely to be useful for languages
+that do not share the same structure as English names. It's not perfect, but it
gets you pretty far.
+Installation
+------------
+
+::
+
+ pip install nameparser
+
+If you want to try out the latest code from GitHub you can
+install with pip using the command below.
+
+``pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser``
+
+If you're looking for a web service, check out
+`eyeseast's nameparse service <https://github.com/eyeseast/nameparse>`_, a
+simple Heroku-friendly Flask wrapper for this module.
+
+
Quick Start Example
-------------------
@@ -44,15 +78,10 @@ Quick Start Example
'Juan de la Vega'
-3 different comma placement variations are supported for the string that you pass.
-
-* Title Firstname "Nickname" Middle Middle Lastname Suffix
-* Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
-* Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]
-
-The parser does not make any attempt to clean the data. It mostly just splits on white
+The parser does not attempt to correct mistakes in the input. It mostly just splits on white
space and puts things in buckets based on their position in the string. This also means
-the difference between 'title' and 'suffix' is positional, not semantic. ("Pre-nominal"
+the difference between 'title' and 'suffix' is positional, not semantic. "Dr" is a title
+when it comes before the name and a suffix when it comes after. ("Pre-nominal"
and "post-nominal" would probably be better names.)
::
@@ -84,21 +113,6 @@ sets`_ of titles, prefixes, etc., or by subclassing the `HumanName` class. See t
.. _Full documentation: http://nameparser.readthedocs.org/en/latest/
-Installation
-------------
-
-``pip install nameparser``
-
-If you want to try out the latest code from GitHub you can
-install with pip using the command below.
-
-``pip install -e git+git://github.com/derek73/python-nameparser.git#egg=nameparser``
-
-If you're looking for a web service, check out
-`eyeseast's nameparse service <https://github.com/eyeseast/nameparse>`_, a
-simple Heroku-friendly Flask wrapper for this module.
-
-
Contributing
------------
diff --git a/dev-requirements.txt b/dev-requirements.txt
index cdd1ba4..0d116e5 100644
--- a/dev-requirements.txt
+++ b/dev-requirements.txt
@@ -1,7 +1,7 @@
-ipdb==0.8.1
+ipdb==0.9.0
nose==1.3.7
-Sphinx==1.3.1
-coverage==3.7.1
-ipython==4.0.0
-Pygments==2.0.2
-dill==0.2.4
+Sphinx==1.3.6
+coverage==4.0.3
+ipython==4.1.2
+Pygments==2.1.3
+dill==0.2.5
diff --git a/dist/nameparser-0.1.3.tar.gz b/dist/nameparser-0.1.3.tar.gz
new file mode 100644
index 0000000..bcecd88
Binary files /dev/null and b/dist/nameparser-0.1.3.tar.gz differ
diff --git a/dist/nameparser-0.1.4.tar.gz b/dist/nameparser-0.1.4.tar.gz
new file mode 100644
index 0000000..0d5bc52
Binary files /dev/null and b/dist/nameparser-0.1.4.tar.gz differ
diff --git a/dist/nameparser-0.2.0.tar.gz b/dist/nameparser-0.2.0.tar.gz
new file mode 100644
index 0000000..762e4b0
Binary files /dev/null and b/dist/nameparser-0.2.0.tar.gz differ
diff --git a/dist/nameparser-0.2.10.tar.gz b/dist/nameparser-0.2.10.tar.gz
new file mode 100644
index 0000000..9b16611
Binary files /dev/null and b/dist/nameparser-0.2.10.tar.gz differ
diff --git a/dist/nameparser-0.2.2.tar.gz b/dist/nameparser-0.2.2.tar.gz
new file mode 100644
index 0000000..2ea6b63
Binary files /dev/null and b/dist/nameparser-0.2.2.tar.gz differ
diff --git a/dist/nameparser-0.2.3.tar.gz b/dist/nameparser-0.2.3.tar.gz
new file mode 100644
index 0000000..9c3a582
Binary files /dev/null and b/dist/nameparser-0.2.3.tar.gz differ
diff --git a/dist/nameparser-0.2.4.tar.gz b/dist/nameparser-0.2.4.tar.gz
new file mode 100644
index 0000000..e69c585
Binary files /dev/null and b/dist/nameparser-0.2.4.tar.gz differ
diff --git a/dist/nameparser-0.2.5.tar.gz b/dist/nameparser-0.2.5.tar.gz
new file mode 100644
index 0000000..4763b8b
Binary files /dev/null and b/dist/nameparser-0.2.5.tar.gz differ
diff --git a/dist/nameparser-0.2.6.tar.gz b/dist/nameparser-0.2.6.tar.gz
new file mode 100644
index 0000000..48ca49f
Binary files /dev/null and b/dist/nameparser-0.2.6.tar.gz differ
diff --git a/dist/nameparser-0.2.7.tar.gz b/dist/nameparser-0.2.7.tar.gz
new file mode 100644
index 0000000..d8fb242
Binary files /dev/null and b/dist/nameparser-0.2.7.tar.gz differ
diff --git a/dist/nameparser-0.2.8.tar.gz b/dist/nameparser-0.2.8.tar.gz
new file mode 100644
index 0000000..0db8a4b
Binary files /dev/null and b/dist/nameparser-0.2.8.tar.gz differ
diff --git a/dist/nameparser-0.2.9.tar.gz b/dist/nameparser-0.2.9.tar.gz
new file mode 100644
index 0000000..5c5d467
Binary files /dev/null and b/dist/nameparser-0.2.9.tar.gz differ
diff --git a/dist/nameparser-0.3.0.tar.gz b/dist/nameparser-0.3.0.tar.gz
new file mode 100644
index 0000000..f2a5672
Binary files /dev/null and b/dist/nameparser-0.3.0.tar.gz differ
diff --git a/dist/nameparser-0.3.1.tar.gz b/dist/nameparser-0.3.1.tar.gz
new file mode 100644
index 0000000..3e290d0
Binary files /dev/null and b/dist/nameparser-0.3.1.tar.gz differ
diff --git a/dist/nameparser-0.3.2.tar.gz b/dist/nameparser-0.3.2.tar.gz
new file mode 100644
index 0000000..5e9812d
Binary files /dev/null and b/dist/nameparser-0.3.2.tar.gz differ
diff --git a/docs/conf.py b/docs/conf.py
index 0595889..09c29c1 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -104,7 +104,28 @@ pygments_style = 'sphinx'
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
-html_theme = 'default'
+html_theme = 'alabaster'
+
+import alabaster
+
+html_theme_path = [alabaster.get_path()]
+extensions += ['alabaster']
+html_theme = 'alabaster'
+html_sidebars = {
+ '**': [
+ 'about.html',
+ 'navigation.html',
+ 'relations.html',
+ 'searchbox.html',
+ 'donate.html',
+ ]
+}
+html_theme_options = {
+ 'github_user': 'derek73',
+ 'github_repo': 'python-nameparser',
+ 'travis_button': True,
+ 'analytics_id': 'UA-339019-11',
+}
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
diff --git a/docs/customize.rst b/docs/customize.rst
index 68d0bd0..7c92aca 100644
--- a/docs/customize.rst
+++ b/docs/customize.rst
@@ -1,62 +1,13 @@
-Pre-processing
-=================
-
-
-Name buckets
-++++++++++++++
-
-Each attribute has a corresponding ordered list of name pieces.
-
-* o.title_list
-* o.first_list
-* o.middle_list
-* o.last_list
-* o.suffix_list
-* o.nickname_list
-
-If you're doing pre- or post-processing you may wish to manipulate these lists directly.
-The strings returned by the attribute names just join these lists with spaces.
-
-::
-
- >>> hn = HumanName("Juan Q. Xavier Velasquez y Garcia, Jr.")
- >>> hn.middle_list
- [u'Q.', u'Xavier']
- >>> hn.middle_list += ["Ricardo"]
- >>> hn.middle_list
- [u'Q.', u'Xavier', 'Ricardo']
-
-
-You can also replace any name bucket's contents by assigning a string or a list
-directly to the attribute.
-
-::
-
- >>> hn = HumanName("Dr. John A. Kenneth Doe")
- >>> hn.title = ["Associate","Professor"]
- >>> hn.suffix = "Md."
- >>> hn.suffix
- <HumanName : [
- title: 'Associate Processor'
- first: 'John'
- middle: 'A. Kenneth'
- last: 'Doe'
- suffix: 'Md.'
- nickname: ''
- ]>
-
-
Customizing the Parser with Your Own Configuration
==================================================
-Recognition of titles, prefixes, suffixes and conjunctions is provided by
+Recognition of titles, prefixes, suffixes and conjunctions is handled by
matching the lower case characters of a name piece with pre-defined sets
-of strings located in :py:mod:`nameparser.config`. You can easily adjust
+of strings located in :py:mod:`nameparser.config`. You can adjust
these predefined sets to help fine tune the parser for your dataset.
-
-Changing the Predefined Variables
-+++++++++++++++++++++++++++++++++
+Changing the Parser Constants
+-----------------------------
There are a few ways to adjust the parser configuration depending on your
needs. The config is available in two places.
@@ -79,22 +30,47 @@ The other is the ``C`` attribute of a ``HumanName`` instance, e.g.
>>> hn.C
<Constants() instance>
-Both places are usually a reference to the same shared module-level
-:py:class:`~nameparser.config.Constants` instance, depending on how you
+Both places are usually a reference to the same shared module-level
+:py:class:`~nameparser.config.CONSTANTS` instance, depending on how you
instantiate the :py:class:`~nameparser.parser.HumanName` class (see below).
-Take a look at the :py:mod:`nameparser.config` documentation to see what's
-in the constants. Here's a quick walk through of some examples where you
-might want to adjust them.
+
+
+Editable attributes of nameparser.config.CONSTANTS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* :py:attr:`~nameparser.config.Constants.titles` - Pieces that come before the name. Cannot include things that may be first names
+* :py:attr:`~nameparser.config.Constants.first_name_titles` - Titles that, when followed by a single name, mark that name as a first name, e.g. "King David"
+* :py:attr:`~nameparser.config.Constants.suffix_acronyms` - Pieces that come at the end of the name that may or may not have periods separating the letters, e.g. "m.d."
+* :py:attr:`~nameparser.config.Constants.suffix_not_acronyms` - Pieces that come at the end of the name that never have periods separating the letters, e.g. "Jr."
+* :py:attr:`~nameparser.config.Constants.conjunctions` - Connectors like "and" that join the preceding piece to the following piece.
+* :py:attr:`~nameparser.config.Constants.prefixes` - Connectors like "del" and "bin" that join to the following piece but not the preceding one.
+* :py:attr:`~nameparser.config.Constants.capitalization_exceptions` - Dictionary of pieces that do not capitalize the first letter, e.g. "Ph.D"
+* :py:attr:`~nameparser.config.Constants.regexes` - Regular expressions used to find words, initials, nicknames, etc.
+
+Each set of constants comes with `add()` and `remove()` methods for tuning
+the constants for your project. These methods automatically lower case and
+remove punctuation to normalize them for comparison.
+
+Other editable attributes
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+* :py:attr:`~nameparser.config.Constants.string_format`
+* :py:attr:`~nameparser.config.Constants.empty_attribute_default`
+
Parser Customization Examples
-+++++++++++++++++++++++++++++
+-----------------------------
+
+Take a look at the :py:mod:`nameparser.config` documentation to see what's
+in the constants. Here's a quick walk through of some examples where you
+might want to adjust them.
"Hon" is a common abbreviation for "Honorable", a title used when
addressing judges, and is included in the default tiles constants. This
means it will never be considered a first name, because titles are the
-pieces before first names.
+pieces before first names.
But "Hon" is also sometimes a first name. If your dataset contains more
"Hon"s than "Honorable"s, you may wish to remove it from the titles
@@ -107,25 +83,25 @@ constant so that "Hon" can be parsed as a first name.
>>> hn = HumanName("Hon Solo")
>>> hn
<HumanName : [
- title: 'Hon'
- first: ''
- middle: ''
- last: 'Solo'
- suffix: ''
- nickname: ''
+ title: 'Hon'
+ first: ''
+ middle: ''
+ last: 'Solo'
+ suffix: ''
+ nickname: ''
]>
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.titles.remove('hon')
- SetManager(set([u'msgt', ..., u'adjutant']))
+ SetManager({'right', ..., 'tax'})
>>> hn = HumanName("Hon Solo")
>>> hn
<HumanName : [
- title: ''
- first: 'Hon'
- middle: ''
- last: 'Solo'
- suffix: ''
- nickname: ''
+ title: ''
+ first: 'Hon'
+ middle: ''
+ last: 'Solo'
+ suffix: ''
+ nickname: ''
]>
@@ -133,7 +109,8 @@ constant so that "Hon" can be parsed as a first name.
constant. But in some contexts it is more common as a title. If you would
like "Dean" to be parsed as a title, simply add it to the titles constant.
-You can pass multiple strings to both the ``add()`` and ``remove()``
+You can pass multiple strings to both the :py:func:`~nameparser.config.SetManager.add`
+and :py:func:`~nameparser.config.SetManager.remove`
methods and each string will be added or removed. Both functions
automatically normalize the strings for the parser's comparison method by
making them lower case and removing periods.
@@ -144,20 +121,20 @@ making them lower case and removing periods.
>>> from nameparser import HumanName
>>> from nameparser.config import CONSTANTS
>>> CONSTANTS.titles.add('dean', 'Chemistry')
- SetManager(set([u'msgt', ..., u'adjutant']))
+ SetManager({'right', ..., 'tax'})
>>> hn = HumanName("Assoc Dean of Chemistry Robert Johns")
>>> hn
<HumanName : [
- title: 'Assoc Dean of Chemistry'
- first: 'Robert'
- middle: ''
- last: 'Johns'
- suffix: ''
- nickname: ''
+ title: 'Assoc Dean of Chemistry'
+ first: 'Robert'
+ middle: ''
+ last: 'Johns'
+ suffix: ''
+ nickname: ''
]>
-Parser Customizations Are Module-Wide
+Parser Customizations Are Module-Wide
+++++++++++++++++++++++++++++++++++++
When you modify the configuration, by default this will modify the behavior all
@@ -171,16 +148,16 @@ the config on one instance could modify the behavior of another instance.
>>> from nameparser import HumanName
>>> instance = HumanName("")
>>> instance.C.titles.add('dean')
- SetManager(set([u'msgt', ..., u'adjutant']))
+ SetManager({'right', ..., 'tax'})
>>> other_instance = HumanName("Dean Robert Johns")
>>> other_instance # Dean parses as title
<HumanName : [
- title: 'Dean'
- first: 'Robert'
- middle: ''
- last: 'Johns'
- suffix: ''
- nickname: ''
+ title: 'Dean'
+ first: 'Robert'
+ middle: ''
+ last: 'Johns'
+ suffix: ''
+ nickname: ''
]>
@@ -198,16 +175,16 @@ reference to the module-level config values with the behavior described above.
>>> instance.has_own_config
False
>>> instance.C.titles.add('dean')
- SetManager(set([u'msgt', ..., u'adjutant']))
+ SetManager({'right', ..., 'tax'})
>>> other_instance = HumanName("Dean Robert Johns", None) # <-- pass None for per-instance config
>>> other_instance
<HumanName : [
- title: ''
- first: 'Dean'
- middle: 'Robert'
- last: 'Johns'
- suffix: ''
- nickname: ''
+ title: ''
+ first: 'Dean'
+ middle: 'Robert'
+ last: 'Johns'
+ suffix: ''
+ nickname: ''
]>
>>> other_instance.has_own_config
True
@@ -217,9 +194,9 @@ Config Changes May Need Parse Refresh
+++++++++++++++++++++++++++++++++++++
The full name is parsed upon assignment to the ``full_name`` attribute or
-instantiation. Sometimes after making changes to configuration or other inner
+instantiation. Sometimes after making changes to configuration or other inner
data after assigning the full name, the name will need to be re-parsed with the
-:py:func:`~nameparser.parser.HumanName.parse_full_name()` method before you see
+:py:func:`~nameparser.parser.HumanName.parse_full_name()` method before you see
those changes with ``repr()``.
::
@@ -229,33 +206,78 @@ those changes with ``repr()``.
>>> hn = HumanName("Dean Robert Johns")
>>> hn
<HumanName : [
- title: ''
- first: 'Dean'
- middle: 'Robert'
- last: 'Johns'
- suffix: ''
- nickname: ''
+ title: ''
+ first: 'Dean'
+ middle: 'Robert'
+ last: 'Johns'
+ suffix: ''
+ nickname: ''
]>
>>> CONSTANTS.titles.add('dean')
- SetManager(set([u'msgt', ..., u'adjutant']))
+ SetManager({'right', ..., 'tax'})
>>> hn
<HumanName : [
- title: ''
- first: 'Dean'
- middle: 'Robert'
- last: 'Johns'
- suffix: ''
- nickname: ''
+ title: ''
+ first: 'Dean'
+ middle: 'Robert'
+ last: 'Johns'
+ suffix: ''
+ nickname: ''
]>
>>> hn.parse_full_name()
>>> hn
<HumanName : [
- title: 'Dean'
- first: 'Robert'
- middle: ''
- last: 'Johns'
- suffix: ''
- nickname: ''
+ title: 'Dean'
+ first: 'Robert'
+ middle: ''
+ last: 'Johns'
+ suffix: ''
+ nickname: ''
]>
+Adjusting names after parsing them
+===================================
+
+Each attribute has a corresponding ordered list of name pieces. If you're doing
+pre- or post-processing you may wish to manipulate these lists directly.
+The strings returned by the attribute names just join these lists with spaces.
+
+
+* o.title_list
+* o.first_list
+* o.middle_list
+* o.last_list
+* o.suffix_list
+* o.nickname_list
+
+::
+
+ >>> hn = HumanName("Juan Q. Xavier Velasquez y Garcia, Jr.")
+ >>> hn.middle_list
+ [u'Q.', u'Xavier']
+ >>> hn.middle_list += ["Ricardo"]
+ >>> hn.middle_list
+ [u'Q.', u'Xavier', 'Ricardo']
+
+
+You can also replace any name bucket's contents by assigning a string or a list
+directly to the attribute.
+
+::
+
+ >>> hn = HumanName("Dr. John A. Kenneth Doe")
+ >>> hn.title = ["Associate","Professor"]
+ >>> hn.suffix = "Md."
+ >>> hn
+ <HumanName : [
+ title: 'Associate Professor'
+ first: 'John'
+ middle: 'A. Kenneth'
+ last: 'Doe'
+ suffix: 'Md.'
+ nickname: ''
+ ]>
+
+
+
diff --git a/docs/index.rst b/docs/index.rst
index 4eba9d9..f644d8f 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -9,42 +9,39 @@ Python Human Name Parser
Version |release|
A simple Python module for parsing human names into their individual
-components. It attempts the best guess that can be made with a simple, rule-based
+components.
+
+* hn.title
+* hn.first
+* hn.middle
+* hn.last
+* hn.suffix
+* hn.nickname
+
+Supports 3 different comma placement variations in the input string.
+
+1. Title Firstname "Nickname" Middle Middle Lastname Suffix
+2. Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
+3. Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]
+
+
+It attempts the best guess that can be made with a simple, rule-based
approach. It's not perfect, but it gets you pretty far.
Its main use case is English, but it may be useful for other latin-based languages, especially
-if you are willing to `customize it`_. Unicode is supported, but the
-parser is not likely to be useful for languages that to not share the same structure
-as English names.
+if you are willing to `customize it`_, but it is not likely to be useful for languages
+that do not share the same structure as English names.
.. _customize it: customize.html
-The HumanName class splits a name string up into name parts based
-on placement in the string and matches against known name pieces like titles.
-It joins name pieces on conjunctions and special prefixes to last names like
+Instantiating the `HumanName` class with a string splits on commas and then spaces,
+classifying name parts based on placement in the string and matches against known name
+pieces like titles. It joins name pieces on conjunctions and special prefixes to last names like
"del". Titles can be chained together and include conjunctions to handle
titles like "Asst Secretary of State". It can also try to correct
capitalization.
-
-
-HumanName Instance Attributes
------------------------------
-
-* o.title
-* o.first
-* o.middle
-* o.last
-* o.suffix
-* o.nickname
-
-Supports 3 different comma placement variations in the input string.
-
-* Title Firstname "Nickname" Middle Middle Lastname Suffix
-* Lastname [Suffix], Title Firstname (Nickname) Middle Middle[,] Suffix [, Suffix]
-* Title Firstname M Lastname [Suffix], Suffix [Suffix] [, Suffix]
-
-When there is ambiguity that cannot be resolved by a rule-based approach,
+It does not attempt to correct input mistakes. When there is ambiguity that cannot be resolved by a rule-based approach,
HumanName prefers to handle the most common cases correctly. For example,
"Dean" is not parsed as title because it is more common as a first name
(You can customize this behavior though, see `Parser Customization Examples`_).
diff --git a/docs/release_log.rst b/docs/release_log.rst
index aba299a..f4f7e39 100644
--- a/docs/release_log.rst
+++ b/docs/release_log.rst
@@ -1,5 +1,13 @@
Release Log
===========
+* 0.3.14 - March 18, 2016
+ - Add `CONSTANTS.empty_attribute_default` to customize value returned for empty attributes (#44)
+* 0.3.13 - March 14, 2016
+ - Improve string format handling (#41)
+* 0.3.12 - March 13, 2016
+ - Fix first name clash with suffixes (#42)
+ - Fix encoding of constants added via the python shell
+ - Add "MSC" to suffixes, fix #41
* 0.3.11 - October 17, 2015
- Fix bug capitalization exceptions (#39)
* 0.3.10 - September 19, 2015
@@ -30,7 +38,7 @@ Release Log
* 0.3.3 - Aug 4, 2014
- Allow suffixes to be chained (#8)
- Handle trailing suffix in last name comma format (#3). Removes support for titles
- with periods but no spaces in them, e.g. "Lt.Gen.". (#21)
+ with periods but no spaces in them, e.g. "Lt.Gen.". (#21)
* 0.3.2 - July 16, 2014
- Retain original string in "original" attribute.
- Collapse white space when using custom string format.
diff --git a/docs/usage.rst b/docs/usage.rst
index fcf5376..af33ae4 100644
--- a/docs/usage.rst
+++ b/docs/usage.rst
@@ -1,8 +1,10 @@
Using the HumanName Parser
==========================
-Example
--------
+Example Usage
+-------------
+
+The examples use Python 3, but Python 2.6+ is supported.
.. doctest::
:options: +NORMALIZE_WHITESPACE
@@ -10,17 +12,17 @@ Example
>>> from nameparser import HumanName
>>> name = HumanName("Dr. Juan Q. Xavier de la Vega III")
>>> name.title
- u'Dr.'
+ 'Dr.'
>>> name["title"]
- u'Dr.'
+ 'Dr.'
>>> name.first
- u'Juan'
+ 'Juan'
>>> name.middle
- u'Q. Xavier'
+ 'Q. Xavier'
>>> name.last
- u'de la Vega'
+ 'de la Vega'
>>> name.suffix
- u'III'
+ 'III'
>>> name.full_name = "Juan Q. Xavier Velasquez y Garcia, Jr."
>>> name
<HumanName : [
@@ -33,7 +35,7 @@ Example
]>
>>> name.middle = "Jason Alexander"
>>> name.middle
- u'Jason Alexander'
+ 'Jason Alexander'
>>> name
<HumanName : [
title: ''
@@ -43,14 +45,14 @@ Example
suffix: 'Jr.'
nickname: ''
]>
- >>> name.suffix = ["custom","values"]
- >>> name.suffix
- u'custom values'
+ >>> name.middle = ["custom","values"]
+ >>> name.middle
+ 'custom values'
>>> name.full_name = 'Doe-Ray, Jonathan "John" A. Harris'
>>> name.as_dict()
- {u'last': u'Doe-Ray', u'suffix': u'', u'title': u'', u'middle': u'A. Harris', u'nickname': u'John', u'first': u'Jonathan'}
+ {'last': 'Doe-Ray', 'suffix': '', 'title': '', 'middle': 'A. Harris', 'nickname': 'John', 'first': 'Jonathan'}
>>> name.as_dict(False) # add False to hide keys with empty values
- {u'middle': u'A. Harris', u'nickname': u'John', u'last': u'Doe-Ray', u'first': u'Jonathan'}
+ {'middle': 'A. Harris', 'nickname': 'John', 'last': 'Doe-Ray', 'first': 'Jonathan'}
>>> name = HumanName("Dr. Juan Q. Xavier de la Vega III")
>>> name2 = HumanName("de la vega, dr. juan Q. xavier III")
>>> name == name2
@@ -58,25 +60,24 @@ Example
>>> len(name)
5
>>> list(name)
- [u'Dr.', u'Juan', u'Q. Xavier', u'de la Vega', u'III']
+ ['Dr.', 'Juan', 'Q. Xavier', 'de la Vega', 'III']
>>> name[1:-2]
- [u'Juan', u'Q. Xavier', u'de la Vega']
+ ['Juan', 'Q. Xavier', 'de la Vega']
>>> name = HumanName('bob v. de la macdole-eisenhower phd')
>>> name.capitalize()
- >>> unicode(name)
- u'Bob V. de la MacDole-Eisenhower Ph.D.'
+ >>> str(name)
+ 'Bob V. de la MacDole-Eisenhower Ph.D.'
>>> # Don't touch mixed case names
>>> name = HumanName('Shirley Maclaine')
>>> name.capitalize()
- >>> unicode(name)
- u'Shirley Maclaine'
-
+ >>> str(name)
+ 'Shirley Maclaine'
Capitalization Support
----------------------
The HumanName class can try to guess the correct capitalization of name
-entered in all upper or lower case.
+entered in all upper or lower case.
Capitalize the name.
@@ -87,17 +88,17 @@ entered in all upper or lower case.
>>> name = HumanName("bob v. de la macdole-eisenhower phd")
>>> name.capitalize()
- >>> unicode(name)
- u'Bob V. de la MacDole-Eisenhower Ph.D.'
+ >>> str(name)
+ 'Bob V. de la MacDole-Eisenhower Ph.D.'
It will not adjust the case of mixed case names.
-Handling Nicknames
+Nickname Handling
------------------
The content of parentheses or double quotes in the name will be
-available from the nickname attribute. (Added in v0.2.9)
+available from the nickname attribute.
.. doctest:: nicknames
:options: +NORMALIZE_WHITESPACE
@@ -105,31 +106,45 @@ available from the nickname attribute. (Added in v0.2.9)
>>> name = HumanName('Jonathan "John" A. Smith')
>>> name
<HumanName : [
- title: ''
- first: 'Jonathan'
- middle: 'A.'
- last: 'Smith'
- suffix: ''
- nickname: 'John'
+ title: ''
+ first: 'Jonathan'
+ middle: 'A.'
+ last: 'Smith'
+ suffix: ''
+ nickname: 'John'
]>
+Change the output string with string formatting
+-----------------------------------------------
-String Format
--------------
+The string representation of a `HumanName` instance is controlled by its `string_format` attribute. The default value, "{title} {first} {middle} {last} {suffix} ({nickname})", includes parentheses around nicknames. Trailing commas and empty quotes and parentheses are automatically removed if the name has no nickname pieces.
-The format of the strings returned with ``str()`` or ``unicode()`` can be adjusted
-using standard python string formatting. The string's ``format()``
-method will be passed a dictionary of names.
+You can change the default formatting for all `HumanName` instances by setting a new
+:py:attr:`~nameparser.config.Constants.string_format` value on the shared
+:py:class:`~nameparser.config.CONSTANTS` configuration instance.
.. doctest:: string format
- >>> name = HumanName("Rev John A. Kenneth Doe III")
- >>> str(name)
- 'Rev John A. Kenneth Doe III'
- >>> name.string_format = "{last}, {title} {first} {middle}, {suffix}"
- >>> str(name)
- 'Doe, Rev John A. Kenneth, III'
- >>> name.string_format = "{first} {last}"
- >>> str(name)
- 'John Doe'
+ >>> from nameparser.config import CONSTANTS
+ >>> CONSTANTS.string_format = "{title} {first} ({nickname}) {middle} {last} {suffix}"
+ >>> name = HumanName('Robert Johnson')
+ >>> str(name)
+ 'Robert Johnson'
+ >>> name = HumanName('Robert "Rob" Johnson')
+ >>> str(name)
+ 'Robert (Rob) Johnson'
+
+You can control the order and presence of any name fields by changing the
+:py:attr:`~nameparser.config.Constants.string_format` attribute of the shared CONSTANTS instance.
+Don't want to include nicknames in your output? No problem. Just omit that keyword from the
+`string_format` attribute.
+
+.. doctest:: string format
+
+ >>> from nameparser.config import CONSTANTS
+ >>> CONSTANTS.string_format = "{title} {first} {last}"
+ >>> name = HumanName("Dr. Juan Ruiz de la Vega III (Doc Vega)")
+ >>> str(name)
+ 'Dr. Juan de la Vega'
+
diff --git a/nameparser/__init__.py b/nameparser/__init__.py
index 56ecebe..0c57706 100644
--- a/nameparser/__init__.py
+++ b/nameparser/__init__.py
@@ -1,4 +1,4 @@
-VERSION = (0, 3, 11)
+VERSION = (0, 3, 14)
__version__ = '.'.join(map(str, VERSION))
__author__ = "Derek Gulbranson"
__author_email__ = 'derek73 at gmail.com'
diff --git a/nameparser/config/__init__.py b/nameparser/config/__init__.py
index b48b82c..d358504 100644
--- a/nameparser/config/__init__.py
+++ b/nameparser/config/__init__.py
@@ -30,7 +30,9 @@ unexpected results. See `Customizing the Parser <customize.html>`_.
"""
from __future__ import unicode_literals
import collections
+import sys
+from nameparser.util import binary_type
from nameparser.util import lc
from nameparser.config.prefixes import PREFIXES
from nameparser.config.capitalization import CAPITALIZATION_EXCEPTIONS
@@ -88,7 +90,10 @@ class SetManager(collections.Set):
Add the lower case and no-period version of the string arguments to the set.
Returns ``self`` for chaining.
"""
- [self.elements.add(lc(s)) for s in strings]
+ for s in strings:
+ if type(s) == binary_type:
+ s = s.decode(sys.stdin.encoding)
+ self.elements.add(lc(s))
return self
def remove(self, *strings):
@@ -128,18 +133,43 @@ class Constants(object):
:param set titles:
:py:attr:`titles` wrapped with :py:class:`SetManager`.
:param set first_name_titles:
- :py:attr:`first_name_titles` wrapped with :py:class:`SetManager`.
- :param set suffixes:
- :py:attr:`suffixes` wrapped with :py:class:`SetManager`.
+ :py:attr:`~titles.FIRST_NAME_TITLES` wrapped with :py:class:`SetManager`.
+ :param set suffix_acronyms:
+ :py:attr:`~suffixes.SUFFIX_ACRONYMS` wrapped with :py:class:`SetManager`.
+ :param set suffix_not_acronyms:
+ :py:attr:`~suffixes.SUFFIX_NOT_ACRONYMS` wrapped with :py:class:`SetManager`.
:param set conjunctions:
:py:attr:`conjunctions` wrapped with :py:class:`SetManager`.
:type capitalization_exceptions: tuple or dict
:param capitalization_exceptions:
- :py:attr:`capitalization_exceptions` wrapped with :py:class:`TupleManager`.
+ :py:attr:`~capitalization.CAPITALIZATION_EXCEPTIONS` wrapped with :py:class:`TupleManager`.
:type regexes: tuple or dict
:param regexes:
:py:attr:`regexes` wrapped with :py:class:`TupleManager`.
"""
+
+ string_format = "{title} {first} {middle} {last} {suffix} ({nickname})"
+ """
+ The default string format used for all new `HumanName` instances.
+ """
+ empty_attribute_default = ''
+ """
+ Default return value for empty attributes. Setting this to something other than empty
+ string will cause :py:attr:`string_format` not to work.
+
+ .. doctest::
+
+ >>> from nameparser.config import CONSTANTS
+ >>> CONSTANTS.empty_attribute_default = None
+ >>> name = HumanName("John Doe")
+ >>> name.title
+ None
+ >>> name.first
+ 'John'
+
+ """
... 327 lines suppressed ...
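
The SetManager.add() change in nameparser/config/__init__.py above can be illustrated with a minimal, self-contained sketch. It does not import nameparser. SetManagerSketch and the body of lc() are hypothetical stand-ins, written on the assumption that lc() lower-cases a string and drops periods, as the docstring in the diff describes. The fallback to "utf-8" when sys.stdin has no encoding is also an assumption and is not part of the commit.

```python
import sys


def lc(value):
    """Assumed normalization: lower-case the string and drop periods."""
    return value.lower().replace(".", "")


class SetManagerSketch:
    """Hypothetical stand-in for nameparser.config.SetManager."""

    def __init__(self, elements=()):
        self.elements = set(elements)

    def add(self, *strings):
        # Mirrors the loop added in the commit: decode byte strings
        # first, then store the normalized form.
        for s in strings:
            if isinstance(s, bytes):
                # Assumption: fall back to UTF-8 if stdin has no encoding.
                encoding = getattr(sys.stdin, "encoding", None) or "utf-8"
                s = s.decode(encoding)
            self.elements.add(lc(s))
        return self  # returned so that calls can be chained

    def remove(self, *strings):
        # discard() is used here so a missing entry is not an error;
        # the real method may behave differently.
        for s in strings:
            self.elements.discard(lc(s))
        return self

    def __contains__(self, value):
        return lc(value) in self.elements


titles = SetManagerSketch({"dr", "hon"})
titles.add("Dean", b"Ph.D.").remove("Hon")
```

Because add() and remove() return the instance, the two calls chain, and both text and byte strings end up stored in the same normalized form.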
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/python-modules/packages/python-nameparser.git