[Python-modules-commits] [python-parsimonious] 01/08: Imported Upstream version 0.6.2

ChangZhuo Chen czchen at moszumanska.debian.org
Fri Jul 8 09:39:59 UTC 2016


This is an automated email from the git hooks/post-receive script.

czchen pushed a commit to branch master
in repository python-parsimonious.

commit 948f0ed48c7ae4ffdb131ac69e5164eab3b583dd
Author: ChangZhuo Chen (陳昌倬) <czchen at debian.org>
Date:   Fri Jul 8 17:12:57 2016 +0800

    Imported Upstream version 0.6.2
---
 LICENSE                                    |  19 ++
 MANIFEST.in                                |   2 +
 PKG-INFO                                   | 450 +++++++++++++++++++++++++++++
 README.rst                                 | 424 +++++++++++++++++++++++++++
 parsimonious.egg-info/PKG-INFO             | 450 +++++++++++++++++++++++++++++
 parsimonious.egg-info/SOURCES.txt          |  20 ++
 parsimonious.egg-info/dependency_links.txt |   1 +
 parsimonious.egg-info/top_level.txt        |   1 +
 parsimonious/__init__.py                   |   9 +
 parsimonious/exceptions.py                 |  95 ++++++
 parsimonious/expressions.py                | 428 +++++++++++++++++++++++++++
 parsimonious/grammar.py                    | 450 +++++++++++++++++++++++++++++
 parsimonious/nodes.py                      | 316 ++++++++++++++++++++
 parsimonious/tests/__init__.py             |   0
 parsimonious/tests/benchmarks.py           |  93 ++++++
 parsimonious/tests/test_benchmarks.py      |  46 +++
 parsimonious/tests/test_expressions.py     | 261 +++++++++++++++++
 parsimonious/tests/test_grammar.py         | 383 ++++++++++++++++++++++++
 parsimonious/tests/test_nodes.py           | 145 ++++++++++
 parsimonious/utils.py                      |  18 ++
 setup.cfg                                  |   5 +
 setup.py                                   |  44 +++
 22 files changed, 3660 insertions(+)

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..0a523be
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,19 @@
+Copyright (c) 2012 Erik Rose
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 0000000..a5021c6
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1,2 @@
+include README.rst
+include LICENSE
diff --git a/PKG-INFO b/PKG-INFO
new file mode 100644
index 0000000..2944490
--- /dev/null
+++ b/PKG-INFO
@@ -0,0 +1,450 @@
+Metadata-Version: 1.1
+Name: parsimonious
+Version: 0.6.2
+Summary: (Soon to be) the fastest pure-Python PEG parser I could muster
+Home-page: https://github.com/erikrose/parsimonious
+Author: Erik Rose
+Author-email: erikrose at grinchcentral.com
+License: MIT
+Description: ============
+        Parsimonious
+        ============
+        
+        Parsimonious aims to be the fastest arbitrary-lookahead parser written in pure
+        Python—and the most usable. It's based on parsing expression grammars (PEGs),
+        which means you feed it a simplified sort of EBNF notation. Parsimonious was
+        designed to undergird a MediaWiki parser that wouldn't take 5 seconds or a GB
+        of RAM to do one page, but it's applicable to all sorts of languages.
+        
+        
+        Goals
+        =====
+        
+        * Speed
+        * Frugal RAM use
+        * Minimalistic, understandable, idiomatic Python code
+        * Readable grammars
+        * Extensible grammars
+        * Complete test coverage
+        * Separation of concerns. Some Python parsing kits mix recognition with
+          instructions about how to turn the resulting tree into some kind of other
+          representation. This is limiting when you want to do several different things
+          with a tree: for example, render wiki markup to HTML *or* to text.
+        * Good error reporting. I want the parser to work *with* me as I develop a
+          grammar.
+        
+        
+        Example Usage
+        =============
+        
+        Here's how to build a simple grammar::
+        
+            >>> from parsimonious.grammar import Grammar
+            >>> grammar = Grammar(
+            ...     """
+            ...     bold_text  = bold_open text bold_close
+            ...     text       = ~"[A-Z 0-9]*"i
+            ...     bold_open  = "(("
+            ...     bold_close = "))"
+            ...     """)
+        
+        You can have forward references and even right recursion; it's all taken care
+        of by the grammar compiler. The first rule is taken to be the default start
+        symbol, but you can override that.
+        
+        Next, let's parse something and get an abstract syntax tree::
+        
+            >>> print(grammar.parse('((bold stuff))'))
+            <Node called "bold_text" matching "((bold stuff))">
+                <Node called "bold_open" matching "((">
+                <RegexNode called "text" matching "bold stuff">
+                <Node called "bold_close" matching "))">
+        
+        You'd typically then use a ``nodes.NodeVisitor`` subclass (see below) to walk
+        the tree and do something useful with it.
+        
+        
+        Status
+        ======
+        
+        * Everything that exists works. Test coverage is good.
+        * I don't plan on making any backward-incompatible changes to the rule syntax
+          in the future, so you can write grammars with confidence.
+        * It may be slow and use a lot of RAM; I haven't measured either yet. However,
+          I have yet to begin optimizing in earnest.
+        * Error reporting is now in place. ``repr`` methods of expressions, grammars,
+          and nodes are clear and helpful as well. The ``Grammar`` ones are
+          even round-trippable!
+        * The grammar extensibility story is underdeveloped at the moment. You should
+          be able to extend a grammar by simply concatenating more rules onto the
+          existing ones; later rules of the same name should override previous ones.
+          However, this is untested and may not be the final story.
+        * Sphinx docs are coming, but the docstrings are quite useful now.
+        * Note that there may be API changes until we get to 1.0, so be sure to pin to
+          the version you're using.
+        
+        Coming Soon
+        -----------
+        
+        * Optimizations to make Parsimonious worthy of its name
+        * Tighter RAM use
+        * Better-thought-out grammar extensibility story
+        * Amazing grammar debugging
+        
+        
+        A Little About PEG Parsers
+        ==========================
+        
+        PEG parsers don't draw a distinction between lexing and parsing; everything is
+        done at once. As a result, there is no lookahead limit, as there is with, for
+        instance, Yacc. And, due to both of these properties, PEG grammars are easier
+        to write: they're basically just a more practical dialect of EBNF. With
+        caching, they take O(grammar size * text length) memory (though I plan to do
+        better), but they run in O(text length) time.
+        
+        More Technically
+        ----------------
+        
+        PEGs can describe a superset of *LL(k)* languages, any deterministic *LR(k)*
+        language, and many others—including some that aren't context-free
+        (http://www.brynosaurus.com/pub/lang/peg.pdf). They can also deal with what
+        would be ambiguous languages if described in canonical EBNF. They do this by
+        trading the ``|`` alternation operator for the ``/`` operator, which works the
+        same except that it makes priority explicit: ``a / b / c`` first tries matching
+        ``a``. If that fails, it tries ``b``, and, failing that, moves on to ``c``.
+        Thus, ambiguity is resolved by always yielding the first successful recognition.
+        
+        
+        Writing Grammars
+        ================
+        
+        Grammars are defined by a series of rules. The syntax should be familiar to
+        anyone who uses regexes or reads programming language manuals. An example will
+        serve best::
+        
+            my_grammar = Grammar(r"""
+                styled_text = bold_text / italic_text
+                bold_text   = "((" text "))"
+                italic_text = "''" text "''"
+                text        = ~"[A-Z 0-9]*"i
+                """)
+        
+        You can wrap a rule across multiple lines if you like; the syntax is very
+        forgiving.
+        
+        
+        Syntax Reference
+        ----------------
+        
+        ====================    ========================================================
+        ``"some literal"``      Used to quote literals. Backslash escaping and Python
+                                conventions for "raw" and Unicode strings help support
+                                fiddly characters.
+        
+        [space]                 Sequences are made out of space- or tab-delimited
+                                things. ``a b c`` matches spots where those 3
+                                terms appear in that order.
+        
+        ``a / b / c``           Alternatives. The first to succeed of ``a / b / c``
+                                wins.
+        
+        ``thing?``              An optional expression. This is greedy, always consuming
+                                ``thing`` if it exists.
+        
+        ``&thing``              A lookahead assertion. Ensures ``thing`` matches at the
+                                current position but does not consume it.
+        
+        ``!thing``              A negative lookahead assertion. Matches if ``thing``
+                                isn't found here. Doesn't consume any text.
+        
+        ``things*``             Zero or more things. This is greedy, always consuming as
+                                many repetitions as it can.
+        
+        ``things+``             One or more things. This is greedy, always consuming as
+                                many repetitions as it can.
+        
+        ``~r"regex"ilmsux``     Regexes have ``~`` in front and are quoted like
+                                literals. Any flags follow the end quotes as single
+                                chars. Regexes are good for representing character
+                                classes (``[a-z0-9]``) and optimizing for speed. The
+                                downside is that they won't be able to take advantage
+                                of our fancy debugging, once we get that working.
+                                Ultimately, I'd like to deprecate explicit regexes and
+                                instead have Parsimonious dynamically build them out of
+                                simpler primitives.
+        
+        ``(things)``            Parentheses are used for grouping, like in every other
+                                language.
+        ====================    ========================================================
+        
+        
+        Optimizing Grammars
+        ===================
+        
+        Don't Repeat Expressions
+        ------------------------
+        
+        If you need a ``~"[a-z0-9]"i`` at two points in your grammar, don't type it
+        twice. Make it a rule of its own, and reference it from wherever you need it. 
+        You'll get the most out of the caching this way, since cache lookups are by 
+        expression object identity (for speed). 
+        
+        Even if you have an expression that's very simple, not repeating it will 
+        save RAM, as there can, at worst, be a cached int for every char in the text 
+        you're parsing. In the future, we may identify repeated subexpressions 
+        automatically and factor them up while building the grammar.
+        
+        How much should you shove into one regex, versus how much should you
+        break it up to not repeat yourself? That's a fine balance and worthy of
+        benchmarking.
+        More stuff jammed into a regex will execute faster, because it doesn't have to
+        run any Python between pieces, but a broken-up one will give better cache
+        performance if the individual pieces are re-used elsewhere. If the pieces of a
+        regex aren't used anywhere else, by all means keep the whole thing together.
+        
+        
+        Quantifiers
+        -----------
+        
+        Bring your ``?`` and ``*`` quantifiers up to the highest level you
+        can. Otherwise, lower-level patterns could succeed but be empty and put a bunch
+        of useless nodes in your tree that didn't really match anything.
+        
+        
+        Processing Parse Trees
+        ======================
+        
+        A parse tree has a node for each expression matched, even if it matched a
+        zero-length string, like ``"thing"?`` might.
+        
+        The ``NodeVisitor`` class provides an inversion-of-control framework for
+        walking a tree and returning a new construct (tree, string, or whatever) based
+        on it. For now, have a look at its docstrings for more detail. There's also a
+        good example in ``grammar.RuleVisitor``. Notice how we take advantage of nodes'
+        iterability by using tuple unpacks in the formal parameter lists (a
+        Python 2 feature; under Python 3, unpack inside the method body)::
+        
+            def visit_or_term(self, or_term, (slash, _, term)):
+                ...
+        
+        For reference, here is the production the above unpacks::
+        
+            or_term = "/" _ term
+        
+        When something goes wrong in your visitor, you get a nice error like this::
+        
+            [normal traceback here...]
+            VisitationException: 'Node' object has no attribute 'foo'
+        
+            Parse tree:
+            <Node called "rules" matching "number = ~"[0-9]+"">  <-- *** We were here. ***
+                <Node matching "number = ~"[0-9]+"">
+                    <Node called "rule" matching "number = ~"[0-9]+"">
+                        <Node matching "">
+                        <Node called "label" matching "number">
+                        <Node matching " ">
+                            <Node called "_" matching " ">
+                        <Node matching "=">
+                        <Node matching " ">
+                            <Node called "_" matching " ">
+                        <Node called "rhs" matching "~"[0-9]+"">
+                            <Node called "term" matching "~"[0-9]+"">
+                                <Node called "atom" matching "~"[0-9]+"">
+                                    <Node called "regex" matching "~"[0-9]+"">
+                                        <Node matching "~">
+                                        <Node called "literal" matching ""[0-9]+"">
+                                        <Node matching "">
+                        <Node matching "">
+                        <Node called "eol" matching "
+                        ">
+                <Node matching "">
+        
+        The parse tree is tacked onto the exception, and the node whose visitor method
+        raised the error is pointed out.
+        
+        Why No Streaming Tree Processing?
+        ---------------------------------
+        
+        Some have asked why we don't process the tree as we go, SAX-style. There are
+        two main reasons:
+        
+        1. It wouldn't work. With a PEG parser, no parsing decision is final until the
+           whole text is parsed. If we had to change a decision, we'd have to backtrack
+           and redo the SAX-style interpretation as well, which would involve
+           reconstituting part of the AST and quite possibly scuttling whatever you
+           were doing with the streaming output. (Note that some bursty SAX-style
+           processing may be possible in the future if we use cuts.)
+        
+        2. It interferes with the ability to derive multiple representations from the
+           AST: for example, turning wiki markup first into HTML and then into text.
+        
+        
+        Future Directions
+        =================
+        
+        Rule Syntax Changes
+        -------------------
+        
+        * Maybe support left-recursive rules like PyMeta, if anybody cares.
+        * Ultimately, I'd like to get rid of explicit regexes and break them into more
+          atomic things like character classes. Then we can dynamically compile bits
+          of the grammar into regexes as necessary to boost speed.
+        
+        Optimizations
+        -------------
+        
+        * Make RAM use almost constant by automatically inserting "cuts", as described
+          in
+          http://ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf.
+          This would also improve error reporting, as we wouldn't backtrack out of
+          everything informative before finally failing.
+        * Find all the distinct subexpressions, and unify duplicates for a better cache
+          hit ratio.
+        * Think about having the user (optionally) provide some representative input
+          along with a grammar. We can then profile against it, see which expressions
+          are worth caching, and annotate the grammar. Perhaps there will even be
+          positions at which a given expression is more worth caching. Or we could keep
+          a count of how many times each cache entry has been used and evict the most
+          useless ones as RAM use grows.
+        * We could possibly compile the grammar into VM instructions, like in "A
+          parsing machine for PEGs" by Medeiros.
+        * If the recursion gets too deep in practice, use trampolining to dodge it.
+        
+        Niceties
+        --------
+        
+        * Pijnu has a raft of tree manipulators. I don't think I want all of them, but
+          a judicious subset might be nice. Don't get into mixing formatting with tree
+          manipulation.
+          https://github.com/erikrose/pijnu/blob/master/library/node.py#L333. PyPy's
+          parsing lib exposes a sane subset:
+          http://doc.pypy.org/en/latest/rlib.html#tree-transformations.
+        
+        
+        Version History
+        ===============
+        
+        0.6.2
+            * Make grammar compilation 100x faster. Thanks to dmoisset for the initial
+              patch.
+        
+        0.6.1
+            * Fix bug which made the default rule of a grammar invalid when it
+              contained a forward reference.
+        
+        0.6
+          .. warning::
+        
+              This release makes backward-incompatible changes:
+        
+              * The ``default_rule`` arg to Grammar's constructor has been replaced
+                with a method, ``some_grammar.default('rule_name')``, which returns a
+                new grammar just like the old except with its default rule changed.
+                This is to free up the constructor kwargs for custom rules.
+              * ``UndefinedLabel`` is no longer a subclass of ``VisitationError``. This
+                matters only in the unlikely case that you were catching
+                ``VisitationError`` exceptions and expecting to thus also catch
+                ``UndefinedLabel``.
+        
+          * Add support for "custom rules" in Grammars. These provide a hook for simple
+            custom parsing hooks spelled as Python lambdas. For heavy-duty needs,
+            you can put in Compound Expressions with LazyReferences as subexpressions,
+            and the Grammar will hook them up for optimal efficiency--no calling
+            ``__getitem__`` on Grammar at parse time.
+          * Allow grammars without a default rule (in cases where there are no string
+            rules), which leads to also allowing empty grammars. Perhaps someone
+            building up grammars dynamically will find that useful.
+          * Add ``@rule`` decorator, allowing grammars to be constructed out of
+            notations on ``NodeVisitor`` methods. This saves looking back and forth
+            between the visitor and the grammar when there is only one visitor per
+            grammar.
+          * Add ``parse()`` and ``match()`` convenience methods to ``NodeVisitor``.
+            This makes the common case of parsing a string and applying exactly one
+            visitor to the AST shorter and simpler.
+          * Improve exception message when you forget to declare a visitor method.
+          * Add ``unwrapped_exceptions`` attribute to ``NodeVisitor``, letting you
+            name certain exceptions which propagate out of visitors without being
+            wrapped by ``VisitationError`` exceptions.
+          * Expose much more of the library in ``__init__``, making your imports
+            shorter.
+          * Drastically simplify reference resolution machinery. (Vladimir Keleshev)
+        
+        0.5
+          .. warning::
+        
+              This release makes some backward-incompatible changes. See below.
+        
+          * Add alpha-quality error reporting. Now, rather than returning ``None``,
+            ``parse()`` and ``match()`` raise ``ParseError`` if they don't succeed.
+            This makes more sense, since you'd rarely attempt to parse something and
+            not care if it succeeds. It was too easy before to forget to check for a
+            ``None`` result. ``ParseError`` gives you a human-readable unicode
+            representation as well as some attributes that let you construct your own
+            custom presentation.
+          * Grammar construction now raises ``ParseError`` rather than ``BadGrammar``
+            if it can't parse your rules.
+          * ``parse()`` now takes an optional ``pos`` argument, like ``match()``.
+          * Make the ``__str__()`` method of ``UndefinedLabel`` return the right type.
+          * Support splitting rules across multiple lines, interleaving comments,
+            putting multiple rules on one line (but don't do that) and all sorts of
+            other horrific behavior.
+          * Tolerate whitespace after opening parens.
+          * Add support for single-quoted literals.
+        
+        0.4
+          * Support Python 3.
+          * Fix ``import *`` for ``parsimonious.expressions``.
+          * Rewrite grammar compiler so right-recursive rules can be compiled and
+            parsing no longer fails in some cases with forward rule references.
+        
+        0.3
+          * Support comments, the ``!`` ("not") operator, and parentheses in grammar
+            definition syntax.
+          * Change the ``&`` operator to a prefix operator to conform to the original
+            PEG syntax. The version in Parsing Techniques was infix, and that's what I
+            used as a reference. However, the unary version is more convenient, as it
+            lets you spell ``AB & A`` as simply ``A &B``.
+          * Take the ``print`` statements out of the benchmark tests.
+          * Give Node an evaluate-able ``__repr__``.
+        
+        0.2
+          * Support matching of prefixes and other not-to-the-end slices of strings by
+            making ``match()`` public and able to initialize a new cache. Add
+            ``match()`` callthrough method to ``Grammar``.
+          * Report a ``BadGrammar`` exception (rather than crashing) when there are
+            mistakes in a grammar definition.
+          * Simplify grammar compilation internals: get rid of superfluous visitor
+            methods and factor up repetitive ones. Simplify rule grammar as well.
+          * Add ``NodeVisitor.lift_child`` convenience method.
+          * Rename ``VisitationException`` to ``VisitationError`` for consistency with
+            the standard Python exception hierarchy.
+          * Rework ``repr`` and ``str`` values for grammars and expressions. Now they
+            both look like rule syntax. Grammars are even round-trippable! This fixes a
+            unicode encoding error when printing nodes that had parsed unicode text.
+          * Add tox for testing. Stop advertising Python 2.5 support, which never
+            worked (and won't unless somebody cares a lot, since it makes Python 3
+            support harder).
+          * Settle (hopefully) on the term "rule" to mean "the string representation of
+            a production". Get rid of the vague, mysterious "DSL".
+        
+        0.1
+          * A rough but usable preview release.
+        
+        Thanks to Wiki Loves Monuments Panama for showing their support with a generous
+        gift.
+        
+Keywords: parse,parser,parsing,peg,packrat,grammar,language
+Platform: UNKNOWN
+Classifier: Intended Audience :: Developers
+Classifier: Natural Language :: English
+Classifier: Development Status :: 3 - Alpha
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Programming Language :: Python :: 2
+Classifier: Programming Language :: Python :: 2.6
+Classifier: Programming Language :: Python :: 2.7
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.1
+Classifier: Programming Language :: Python :: 3.2
+Classifier: Programming Language :: Python :: 3.3
+Classifier: Topic :: Scientific/Engineering :: Information Analysis
+Classifier: Topic :: Software Development :: Libraries
+Classifier: Topic :: Text Processing :: General
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..02deebb
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,424 @@
+============
+Parsimonious
+============
+
+Parsimonious aims to be the fastest arbitrary-lookahead parser written in pure
+Python—and the most usable. It's based on parsing expression grammars (PEGs),
+which means you feed it a simplified sort of EBNF notation. Parsimonious was
+designed to undergird a MediaWiki parser that wouldn't take 5 seconds or a GB
+of RAM to do one page, but it's applicable to all sorts of languages.
+
+
+Goals
+=====
+
+* Speed
+* Frugal RAM use
+* Minimalistic, understandable, idiomatic Python code
+* Readable grammars
+* Extensible grammars
+* Complete test coverage
+* Separation of concerns. Some Python parsing kits mix recognition with
+  instructions about how to turn the resulting tree into some kind of other
+  representation. This is limiting when you want to do several different things
+  with a tree: for example, render wiki markup to HTML *or* to text.
+* Good error reporting. I want the parser to work *with* me as I develop a
+  grammar.
+
+
+Example Usage
+=============
+
+Here's how to build a simple grammar::
+
+    >>> from parsimonious.grammar import Grammar
+    >>> grammar = Grammar(
+    ...     """
+    ...     bold_text  = bold_open text bold_close
+    ...     text       = ~"[A-Z 0-9]*"i
+    ...     bold_open  = "(("
+    ...     bold_close = "))"
+    ...     """)
+
+You can have forward references and even right recursion; it's all taken care
+of by the grammar compiler. The first rule is taken to be the default start
+symbol, but you can override that.
+
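+The ``default()`` method, described in the 0.6 release notes below, returns a
+new grammar with a different default rule. A quick sketch, reusing the grammar
+above::
+
+    >>> text_grammar = grammar.default('text')
+    >>> tree = text_grammar.parse('JUST SOME TEXT')
+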
+Next, let's parse something and get an abstract syntax tree::
+
+    >>> print(grammar.parse('((bold stuff))'))
+    <Node called "bold_text" matching "((bold stuff))">
+        <Node called "bold_open" matching "((">
+        <RegexNode called "text" matching "bold stuff">
+        <Node called "bold_close" matching "))">
+
+You'd typically then use a ``nodes.NodeVisitor`` subclass (see below) to walk
+the tree and do something useful with it.
+
+
+Status
+======
+
+* Everything that exists works. Test coverage is good.
+* I don't plan on making any backward-incompatible changes to the rule syntax
+  in the future, so you can write grammars with confidence.
+* It may be slow and use a lot of RAM; I haven't measured either yet. However,
+  I have yet to begin optimizing in earnest.
+* Error reporting is now in place. ``repr`` methods of expressions, grammars,
+  and nodes are clear and helpful as well. The ``Grammar`` ones are
+  even round-trippable!
+* The grammar extensibility story is underdeveloped at the moment. You should
+  be able to extend a grammar by simply concatenating more rules onto the
+  existing ones; later rules of the same name should override previous ones.
+  However, this is untested and may not be the final story.
+* Sphinx docs are coming, but the docstrings are quite useful now.
+* Note that there may be API changes until we get to 1.0, so be sure to pin to
+  the version you're using.
+
+Coming Soon
+-----------
+
+* Optimizations to make Parsimonious worthy of its name
+* Tighter RAM use
+* Better-thought-out grammar extensibility story
+* Amazing grammar debugging
+
+
+A Little About PEG Parsers
+==========================
+
+PEG parsers don't draw a distinction between lexing and parsing; everything is
+done at once. As a result, there is no lookahead limit, as there is with, for
+instance, Yacc. And, due to both of these properties, PEG grammars are easier
+to write: they're basically just a more practical dialect of EBNF. With
+caching, they take O(grammar size * text length) memory (though I plan to do
+better), but they run in O(text length) time.
+
+More Technically
+----------------
+
+PEGs can describe a superset of *LL(k)* languages, any deterministic *LR(k)*
+language, and many others—including some that aren't context-free
+(http://www.brynosaurus.com/pub/lang/peg.pdf). They can also deal with what
+would be ambiguous languages if described in canonical EBNF. They do this by
+trading the ``|`` alternation operator for the ``/`` operator, which works the
+same except that it makes priority explicit: ``a / b / c`` first tries matching
+``a``. If that fails, it tries ``b``, and, failing that, moves on to ``c``.
+Thus, ambiguity is resolved by always yielding the first successful recognition.
+
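+To see priority in action, here's a sketch with a hypothetical rule where
+ordering matters::
+
+    >>> g = Grammar('''
+    ...     greeting = "hi" / "high"
+    ...     ''')
+    >>> g.parse('high')  # raises ParseError
+
+``"hi"`` succeeds first and consumes only two characters, so the parse can't
+consume the trailing ``gh``; writing ``"high" / "hi"`` instead makes the
+input parse.
+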
+
+Writing Grammars
+================
+
+Grammars are defined by a series of rules. The syntax should be familiar to
+anyone who uses regexes or reads programming language manuals. An example will
+serve best::
+
+    my_grammar = Grammar(r"""
+        styled_text = bold_text / italic_text
+        bold_text   = "((" text "))"
+        italic_text = "''" text "''"
+        text        = ~"[A-Z 0-9]*"i
+        """)
+
+You can wrap a rule across multiple lines if you like; the syntax is very
+forgiving.
+
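+Parsing with it looks just like the earlier example; the first alternative of
+``styled_text`` to succeed wins (a sketch reusing ``my_grammar`` from above)::
+
+    >>> tree = my_grammar.parse("''TASTY 2 TOADS''")
+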
+
+Syntax Reference
+----------------
+
+====================    ========================================================
+``"some literal"``      Used to quote literals. Backslash escaping and Python
+                        conventions for "raw" and Unicode strings help support
+                        fiddly characters.
+
+[space]                 Sequences are made out of space- or tab-delimited
+                        things. ``a b c`` matches spots where those 3
+                        terms appear in that order.
+
+``a / b / c``           Alternatives. The first to succeed of ``a / b / c``
+                        wins.
+
+``thing?``              An optional expression. This is greedy, always consuming
+                        ``thing`` if it exists.
+
+``&thing``              A lookahead assertion. Ensures ``thing`` matches at the
+                        current position but does not consume it.
+
+``!thing``              A negative lookahead assertion. Matches if ``thing``
+                        isn't found here. Doesn't consume any text.
+
+``things*``             Zero or more things. This is greedy, always consuming as
+                        many repetitions as it can.
+
+``things+``             One or more things. This is greedy, always consuming as
+                        many repetitions as it can.
+
+``~r"regex"ilmsux``     Regexes have ``~`` in front and are quoted like
+                        literals. Any flags follow the end quotes as single
+                        chars. Regexes are good for representing character
+                        classes (``[a-z0-9]``) and optimizing for speed. The
+                        downside is that they won't be able to take advantage
+                        of our fancy debugging, once we get that working.
+                        Ultimately, I'd like to deprecate explicit regexes and
+                        instead have Parsimonious dynamically build them out of
+                        simpler primitives.
+
+``(things)``            Parentheses are used for grouping, like in every other
+                        language.
+====================    ========================================================
+
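+These operators combine naturally. The sketch below (hypothetical rules) uses
+the negative lookahead to keep ``*`` from running past a closing quote::
+
+    quoted = '"' char* '"'
+    char   = !'"' ~"."
+
+Each ``char`` first asserts that the next character isn't the closing quote,
+then consumes one character.
+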
+
+Optimizing Grammars
+===================
+
+Don't Repeat Expressions
+------------------------
+
+If you need a ``~"[a-z0-9]"i`` at two points in your grammar, don't type it
+twice. Make it a rule of its own, and reference it from wherever you need it. 
+You'll get the most out of the caching this way, since cache lookups are by 
+expression object identity (for speed). 
+
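+For instance, rather than spelling the character class out twice
+(hypothetical rules)::
+
+    key   = ~"[a-z0-9]+"i
+    value = ~"[a-z0-9]+"i
+
+...give it a rule of its own, so both references share a single cached
+expression::
+
+    key   = ident
+    value = ident
+    ident = ~"[a-z0-9]+"i
+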
+Even if you have an expression that's very simple, not repeating it will 
+save RAM, as there can, at worst, be a cached int for every char in the text 
+you're parsing. In the future, we may identify repeated subexpressions 
+automatically and factor them up while building the grammar.
+
+How much should you shove into one regex, versus how much should you break it
+up to not repeat yourself? That's a fine balance and worthy of benchmarking.
+More stuff jammed into a regex will execute faster, because it doesn't have to
+run any Python between pieces, but a broken-up one will give better cache
+performance if the individual pieces are re-used elsewhere. If the pieces of a
+regex aren't used anywhere else, by all means keep the whole thing together.
+
+
+Quantifiers
+-----------
+
+Bring your ``?`` and ``*`` quantifiers up to the highest level you
+can. Otherwise, lower-level patterns could succeed but be empty and put a bunch
+of useless nodes in your tree that didn't really match anything.
+
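+One way to read this advice, as a sketch with hypothetical rules: prefer::
+
+    line = word (" " word)*
+
+...over::
+
+    line = (word " "?)*
+
+In the second form, the inner ``" "?`` succeeds with a zero-length match on
+every repetition that lacks a trailing space, littering the tree with empty
+nodes.
+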
+
+Processing Parse Trees
+======================
+
+A parse tree has a node for each expression matched, even if it matched a
+zero-length string, like ``"thing"?`` might.
+
+The ``NodeVisitor`` class provides an inversion-of-control framework for
+walking a tree and returning a new construct (tree, string, or whatever) based
+on it. For now, have a look at its docstrings for more detail. There's also a
+good example in ``grammar.RuleVisitor``. Notice how we take advantage of nodes'
+iterability by using tuple unpacks in the formal parameter lists (a Python 2
+feature; under Python 3, unpack inside the method body)::
+
+    def visit_or_term(self, or_term, (slash, _, term)):
+        ...
+
+For reference, here is the production the above unpacks::
+
+    or_term = "/" _ term
+
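+Here's what a complete visitor for the bold-text grammar from the Example
+Usage section might look like: an untested sketch, with ``BoldVisitor`` and
+``html`` being hypothetical names::
+
+    from parsimonious.nodes import NodeVisitor
+
+    class BoldVisitor(NodeVisitor):
+        def visit_bold_text(self, node, visited_children):
+            bold_open, text, bold_close = visited_children
+            return '<b>%s</b>' % text
+
+        def visit_text(self, node, visited_children):
+            return node.text
+
+        def generic_visit(self, node, visited_children):
+            # For rules we don't care about, just pass the node through.
+            return node
+
+    html = BoldVisitor().visit(grammar.parse('((bold stuff))'))
+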
+When something goes wrong in your visitor, you get a nice error like this::
+
+    [normal traceback here...]
+    VisitationException: 'Node' object has no attribute 'foo'
+
+    Parse tree:
+    <Node called "rules" matching "number = ~"[0-9]+"">  <-- *** We were here. ***
+        <Node matching "number = ~"[0-9]+"">
+            <Node called "rule" matching "number = ~"[0-9]+"">
+                <Node matching "">
+                <Node called "label" matching "number">
+                <Node matching " ">
+                    <Node called "_" matching " ">
+                <Node matching "=">
+                <Node matching " ">
+                    <Node called "_" matching " ">
+                <Node called "rhs" matching "~"[0-9]+"">
+                    <Node called "term" matching "~"[0-9]+"">
+                        <Node called "atom" matching "~"[0-9]+"">
+                            <Node called "regex" matching "~"[0-9]+"">
+                                <Node matching "~">
+                                <Node called "literal" matching ""[0-9]+"">
+                                <Node matching "">
+                <Node matching "">
+                <Node called "eol" matching "
+                ">
+        <Node matching "">
+
+The parse tree is tacked onto the exception, and the node whose visitor method
+raised the error is pointed out.
+
+Why No Streaming Tree Processing?
+---------------------------------
+
+Some have asked why we don't process the tree as we go, SAX-style. There are
+two main reasons:
+
+1. It wouldn't work. With a PEG parser, no parsing decision is final until the
+   whole text is parsed. If we had to change a decision, we'd have to backtrack
+   and redo the SAX-style interpretation as well, which would involve
+   reconstituting part of the AST and quite possibly scuttling whatever you
+   were doing with the streaming output. (Note that some bursty SAX-style
+   processing may be possible in the future if we use cuts.)
+
+2. It interferes with the ability to derive multiple representations from the
+   AST: for example, turning wiki markup first into HTML and then into text.
+
+
+Future Directions
+=================
+
+Rule Syntax Changes
+-------------------
+
+* Maybe support left-recursive rules like PyMeta, if anybody cares.
+* Ultimately, I'd like to get rid of explicit regexes and break them into more
+  atomic things like character classes. Then we can dynamically compile bits
+  of the grammar into regexes as necessary to boost speed.
+
+Optimizations
+-------------
+
+* Make RAM use almost constant by automatically inserting "cuts", as described
+  in
+  http://ialab.cs.tsukuba.ac.jp/~mizusima/publications/paste513-mizushima.pdf.
+  This would also improve error reporting, as we wouldn't backtrack out of
+  everything informative before finally failing.
+* Find all the distinct subexpressions, and unify duplicates for a better cache
+  hit ratio.
+* Think about having the user (optionally) provide some representative input
+  along with a grammar. We can then profile against it, see which expressions
+  are worth caching, and annotate the grammar. Perhaps there will even be
+  positions at which a given expression is more worth caching. Or we could keep
+  a count of how many times each cache entry has been used and evict the most
+  useless ones as RAM use grows.
+* We could possibly compile the grammar into VM instructions, like in "A
+  parsing machine for PEGs" by Medeiros.
+* If the recursion gets too deep in practice, use trampolining to dodge it.
+
+Niceties
+--------
+
+* Pijnu has a raft of tree manipulators. I don't think I want all of them, but
+  a judicious subset might be nice. Don't get into mixing formatting with tree
+  manipulation.
+  https://github.com/erikrose/pijnu/blob/master/library/node.py#L333. PyPy's
+  parsing lib exposes a sane subset:
+  http://doc.pypy.org/en/latest/rlib.html#tree-transformations.
+
+
+Version History
+===============
+
+0.6.2
+    * Make grammar compilation 100x faster. Thanks to dmoisset for the initial
+      patch.
+
+0.6.1
+    * Fix bug which made the default rule of a grammar invalid when it
+      contained a forward reference.
+
+0.6
+  .. warning::
+
+      This release makes backward-incompatible changes:
+
+      * The ``default_rule`` arg to Grammar's constructor has been replaced
+        with a method, ``some_grammar.default('rule_name')``, which returns a
+        new grammar just like the old except with its default rule changed.
+        This is to free up the constructor kwargs for custom rules.
+      * ``UndefinedLabel`` is no longer a subclass of ``VisitationError``. This
+        matters only in the unlikely case that you were catching
+        ``VisitationError`` exceptions and expecting to thus also catch
+        ``UndefinedLabel``.
+
+  * Add support for "custom rules" in Grammars. These provide a hook for simple
+    custom parsing hooks spelled as Python lambdas. For heavy-duty needs,
+    you can put in Compound Expressions with LazyReferences as subexpressions,
+    and the Grammar will hook them up for optimal efficiency--no calling
+    ``__getitem__`` on Grammar at parse time.
+  * Allow grammars without a default rule (in cases where there are no string
+    rules), which leads to also allowing empty grammars. Perhaps someone
+    building up grammars dynamically will find that useful.
+  * Add ``@rule`` decorator, allowing grammars to be constructed out of
+    notations on ``NodeVisitor`` methods. This saves looking back and forth
+    between the visitor and the grammar when there is only one visitor per
+    grammar.
+  * Add ``parse()`` and ``match()`` convenience methods to ``NodeVisitor``.
+    This makes the common case of parsing a string and applying exactly one
+    visitor to the AST shorter and simpler.
+  * Improve exception message when you forget to declare a visitor method.
+  * Add ``unwrapped_exceptions`` attribute to ``NodeVisitor``, letting you
+    name certain exceptions which propagate out of visitors without being
+    wrapped by ``VisitationError`` exceptions.
+  * Expose much more of the library in ``__init__``, making your imports
+    shorter.
+  * Drastically simplify reference resolution machinery. (Vladimir Keleshev)
+
+0.5
+  .. warning::
+
+      This release makes some backward-incompatible changes. See below.
+
+  * Add alpha-quality error reporting. Now, rather than returning ``None``,
+    ``parse()`` and ``match()`` raise ``ParseError`` if they don't succeed.
+    This makes more sense, since you'd rarely attempt to parse something and
+    not care if it succeeds. It was too easy before to forget to check for a
+    ``None`` result. ``ParseError`` gives you a human-readable unicode
+    representation as well as some attributes that let you construct your own
+    custom presentation.
+  * Grammar construction now raises ``ParseError`` rather than ``BadGrammar``
+    if it can't parse your rules.
+  * ``parse()`` now takes an optional ``pos`` argument, like ``match()``.
+  * Make the ``__str__()`` method of ``UndefinedLabel`` return the right type.
+  * Support splitting rules across multiple lines, interleaving comments,
+    putting multiple rules on one line (but don't do that) and all sorts of
+    other horrific behavior.
+  * Tolerate whitespace after opening parens.
+  * Add support for single-quoted literals.
+
+0.4
+  * Support Python 3.
+  * Fix ``import *`` for ``parsimonious.expressions``.
+  * Rewrite grammar compiler so right-recursive rules can be compiled and
+    parsing no longer fails in some cases with forward rule references.
+
+0.3
+  * Support comments, the ``!`` ("not") operator, and parentheses in grammar
+    definition syntax.
+  * Change the ``&`` operator to a prefix operator to conform to the original
+    PEG syntax. The version in Parsing Techniques was infix, and that's what I
+    used as a reference. However, the unary version is more convenient, as it
+    lets you spell ``AB & A`` as simply ``A &B``.
+  * Take the ``print`` statements out of the benchmark tests.
+  * Give Node an evaluate-able ``__repr__``.
+
+0.2
+  * Support matching of prefixes and other not-to-the-end slices of strings by
+    making ``match()`` public and able to initialize a new cache. Add
+    ``match()`` callthrough method to ``Grammar``.
+  * Report a ``BadGrammar`` exception (rather than crashing) when there are
+    mistakes in a grammar definition.
+  * Simplify grammar compilation internals: get rid of superfluous visitor
+    methods and factor up repetitive ones. Simplify rule grammar as well.
+  * Add ``NodeVisitor.lift_child`` convenience method.
+  * Rename ``VisitationException`` to ``VisitationError`` for consistency with
+    the standard Python exception hierarchy.
+  * Rework ``repr`` and ``str`` values for grammars and expressions. Now they
+    both look like rule syntax. Grammars are even round-trippable! This fixes a
+    unicode encoding error when printing nodes that had parsed unicode text.
+  * Add tox for testing. Stop advertising Python 2.5 support, which never
+    worked (and won't unless somebody cares a lot, since it makes Python 3
+    support harder).
+  * Settle (hopefully) on the term "rule" to mean "the string representation of
+    a production". Get rid of the vague, mysterious "DSL".
+
+0.1
+  * A rough but usable preview release.
+
+Thanks to Wiki Loves Monuments Panama for showing their support with a generous
+gift.
diff --git a/parsimonious.egg-info/PKG-INFO b/parsimonious.egg-info/PKG-INFO
new file mode 100644
index 0000000..2944490
--- /dev/null
+++ b/parsimonious.egg-info/PKG-INFO
@@ -0,0 +1,450 @@
+Metadata-Version: 1.1
+Name: parsimonious
+Version: 0.6.2
+Summary: (Soon to be) the fastest pure-Python PEG parser I could muster
+Home-page: https://github.com/erikrose/parsimonious
+Author: Erik Rose
+Author-email: erikrose at grinchcentral.com
+License: MIT
+Description: ============
+        Parsimonious
+        ============
+        
+        Parsimonious aims to be the fastest arbitrary-lookahead parser written in pure
+        Python—and the most usable. It's based on parsing expression grammars (PEGs),
+        which means you feed it a simplified sort of EBNF notation. Parsimonious was
+        designed to undergird a MediaWiki parser that wouldn't take 5 seconds or a GB
+        of RAM to do one page, but it's applicable to all sorts of languages.
+        
+        
+        Goals
+        =====
+        
+        * Speed
+        * Frugal RAM use
+        * Minimalistic, understandable, idiomatic Python code
+        * Readable grammars
+        * Extensible grammars
+        * Complete test coverage
+        * Separation of concerns. Some Python parsing kits mix recognition with
+          instructions about how to turn the resulting tree into some kind of other
+          representation. This is limiting when you want to do several different things
+          with a tree: for example, render wiki markup to HTML *or* to text.
+        * Good error reporting. I want the parser to work *with* me as I develop a
+          grammar.
+        
+        
+        Example Usage
+        =============
+        
+        Here's how to build a simple grammar::
+        
+            >>> from parsimonious.grammar import Grammar
+            >>> grammar = Grammar(
+            ...     """
+            ...     bold_text  = bold_open text bold_close
... 2820 lines suppressed ...

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/python-modules/packages/python-parsimonious.git


