[Python-modules-commits] [python-rebulk] 01/17: Import python-rebulk_0.9.0.orig.tar.gz

Etienne Millon emillon-guest at moszumanska.debian.org
Thu Sep 28 20:39:57 UTC 2017


This is an automated email from the git hooks/post-receive script.

emillon-guest pushed a commit to branch master
in repository python-rebulk.

commit d74f79ae4f614ea6b9d04ebb6bc564cb3b38cd23
Author: Etienne Millon <me at emillon.org>
Date:   Thu Sep 28 21:12:21 2017 +0200

    Import python-rebulk_0.9.0.orig.tar.gz
---
 .coveragerc                          |   10 +
 LICENSE                              |   22 +
 MANIFEST.in                          |    6 +
 PKG-INFO                             |  535 ++++++
 README.rst                           |  511 +++++
 dev-requirements.txt                 |    1 +
 pylintrc                             |  387 ++++
 pytest.ini                           |    2 +
 rebulk.egg-info/PKG-INFO             |  535 ++++++
 rebulk.egg-info/SOURCES.txt          |   49 +
 rebulk.egg-info/dependency_links.txt |    1 +
 rebulk.egg-info/requires.txt         |   14 +
 rebulk.egg-info/top_level.txt        |    1 +
 rebulk.egg-info/zip-safe             |    1 +
 rebulk/__init__.py                   |   10 +
 rebulk/__version__.py                |    7 +
 rebulk/chain.py                      |  467 +++++
 rebulk/debug.py                      |   56 +
 rebulk/formatters.py                 |   23 +
 rebulk/introspector.py               |  126 ++
 rebulk/loose.py                      |  198 ++
 rebulk/match.py                      |  868 +++++++++
 rebulk/pattern.py                    |  489 +++++
 rebulk/processors.py                 |  107 ++
 rebulk/rebulk.py                     |  363 ++++
 rebulk/remodule.py                   |   17 +
 rebulk/rules.py                      |  375 ++++
 rebulk/test/__init__.py              |    3 +
 rebulk/test/default_rules_module.py  |   79 +
 rebulk/test/rebulk_rules_module.py   |   38 +
 rebulk/test/rules_module.py          |   54 +
 rebulk/test/test_chain.py            |  411 ++++
 rebulk/test/test_debug.py            |   83 +
 rebulk/test/test_introspector.py     |  138 ++
 rebulk/test/test_loose.py            |   83 +
 rebulk/test/test_match.py            |  568 ++++++
 rebulk/test/test_pattern.py          |  858 +++++++++
 rebulk/test/test_processors.py       |  215 +++
 rebulk/test/test_rebulk.py           |  419 ++++
 rebulk/test/test_rules.py            |  197 ++
 rebulk/test/test_toposort.py         |  111 ++
 rebulk/test/test_validators.py       |   64 +
 rebulk/toposort.py                   |   84 +
 rebulk/utils.py                      |  153 ++
 rebulk/validators.py                 |   70 +
 requirements.txt                     |    2 +
 runtests.py                          | 3487 ++++++++++++++++++++++++++++++++++
 setup.cfg                            |   11 +
 setup.py                             |   65 +
 tox.ini                              |   18 +
 50 files changed, 12392 insertions(+)

diff --git a/.coveragerc b/.coveragerc
new file mode 100644
index 0000000..7bca8b9
--- /dev/null
+++ b/.coveragerc
@@ -0,0 +1,10 @@
+# .coveragerc to control coverage.py
+[run]
+include =
+    rebulk/*
+omit =
+    rebulk/__version__.py
+    rebulk/test/*
+[report]
+exclude_lines =
+    pragma: no cover
\ No newline at end of file
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..445ff90
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,22 @@
+The MIT License (MIT)
+
+Copyright (c) 2015 Rémi Alvergnat
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
diff --git a/MANIFEST.in b/MANIFEST.in
new file mode 100644
index 0000000..f7ff8e4
--- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1,6 @@
+include *.py
+include *.txt
+include *.ini
+include .coveragerc
+include LICENSE
+include pylintrc
diff --git a/PKG-INFO b/PKG-INFO
new file mode 100644
index 0000000..18c768c
--- /dev/null
+++ b/PKG-INFO
@@ -0,0 +1,535 @@
+Metadata-Version: 1.1
+Name: rebulk
+Version: 0.9.0
+Summary: Rebulk - Define simple search patterns in bulk to perform advanced matching on any string.
+Home-page: https://github.com/Toilal/rebulk/
+Author: Rémi Alvergnat
+Author-email: toilal.dev at gmail.com
+License: MIT
+Download-URL: https://pypi.python.org/packages/source/r/rebulk/rebulk-0.9.0.tar.gz
+Description: ReBulk
+        =======
+        
+        .. image:: http://img.shields.io/pypi/v/rebulk.svg
+            :target: https://pypi.python.org/pypi/rebulk
+            :alt: Latest Version
+        
+        .. image:: http://img.shields.io/badge/license-MIT-blue.svg
+            :target: https://pypi.python.org/pypi/rebulk
+            :alt: MIT License
+        
+        .. image:: http://img.shields.io/travis/Toilal/rebulk.svg
+            :target: http://travis-ci.org/Toilal/rebulk?branch=master
+            :alt: Build Status
+        
+        .. image:: http://img.shields.io/coveralls/Toilal/rebulk.svg
+            :target: https://coveralls.io/r/Toilal/rebulk?branch=master
+            :alt: Coveralls
+        
+        ReBulk is a python library that performs advanced searches in strings that would be hard to implement using
+        `re module`_ or `String methods`_ only.
+        
+        It includes features such as ``Patterns``, ``Match`` and ``Rule`` that allow developers to build a
+        custom and complex string matcher using a readable and extendable API.
+        
+        This project is hosted on GitHub: `<https://github.com/Toilal/rebulk>`_
+        
+        Install
+        -------
+        .. code-block:: sh
+        
+            $ pip install rebulk
+        
+        Usage
+        ------
+        Regular expression, string and function based patterns are declared in a ``Rebulk`` object. It uses a fluent API to
+        chain ``string``, ``regex``, and ``functional`` methods to define various pattern types.
+        
+        .. code-block:: python
+        
+            >>> from rebulk import Rebulk
+            >>> bulk = Rebulk().string('brown').regex(r'qu\w+').functional(lambda s: (20, 25))
+        
+        When the ``Rebulk`` object is fully configured, call its ``matches`` method with an input string to retrieve all
+        ``Match`` objects found by the registered patterns.
+        
+        .. code-block:: python
+        
+            >>> bulk.matches("The quick brown fox jumps over the lazy dog")
+            [<brown:(10, 15)>, <quick:(4, 9)>, <jumps:(20, 25)>]
+        
+        If multiple ``Match`` objects are found at the same position, only the longer one is kept.
+        
+        .. code-block:: python
+        
+            >>> bulk = Rebulk().string('lakers').string('la')
+            >>> bulk.matches("the lakers are from la")
+            [<lakers:(4, 10)>, <la:(20, 22)>]
+        
+        String Patterns
+        ---------------
+        String patterns are based on the `str.find`_ method, but return all matches in the string. ``ignore_case``
+        can be enabled to ignore case.
+        
+        .. code-block:: python
+        
+            >>> Rebulk().string('la').matches("lalalilala")
+            [<la:(0, 2)>, <la:(2, 4)>, <la:(6, 8)>, <la:(8, 10)>]
+        
+            >>> Rebulk().string('la').matches("LalAlilAla")
+            [<la:(8, 10)>]
+        
+            >>> Rebulk().string('la', ignore_case=True).matches("LalAlilAla")
+            [<La:(0, 2)>, <lA:(2, 4)>, <lA:(6, 8)>, <la:(8, 10)>]
+        
+        You can define several patterns with a single ``string`` method call.
+        
+        .. code-block:: python
+        
+            >>> Rebulk().string('Winter', 'coming').matches("Winter is coming...")
+            [<Winter:(0, 6)>, <coming:(10, 16)>]
+        
+        Regular Expression Patterns
+        ---------------------------
+        Regular Expression patterns are based on a compiled regular expression.
+        `re.finditer`_ method is used to find matches.
+        
+        If `regex module`_ is available, it will be used by rebulk instead of default `re module`_.
+        
+        .. code-block:: python
+        
+            >>> Rebulk().regex(r'l\w').matches("lolita")
+            [<lo:(0, 2)>, <li:(2, 4)>]
+        
+        You can define several patterns with a single ``regex`` method call.
+        
+        .. code-block:: python
+        
+            >>> Rebulk().regex(r'Wint\wr', r'com\w{3}').matches("Winter is coming...")
+            [<Winter:(0, 6)>, <coming:(10, 16)>]
+        
+        All keyword arguments from `re.compile`_ are supported.
+        
+        .. code-block:: python
+        
+            >>> import re  # import required for flags constant
+            >>> Rebulk().regex('L[A-Z]KERS', flags=re.IGNORECASE) \
+            ...         .matches("The LaKeRs are from La")
+            [<LaKeRs:(4, 10)>]
+        
+            >>> Rebulk().regex('L[A-Z]', 'L[A-Z]KERS', flags=re.IGNORECASE) \
+            ...         .matches("The LaKeRs are from La")
+            [<La:(20, 22)>, <LaKeRs:(4, 10)>]
+        
+            >>> Rebulk().regex(('L[A-Z]', re.IGNORECASE), ('L[a-z]KeRs')) \
+            ...         .matches("The LaKeRs are from La")
+            [<La:(20, 22)>, <LaKeRs:(4, 10)>]
+        
+        If `regex module`_ is available, it automatically supports repeated captures.
+        
+        .. code-block:: python
+        
+            >>> # If regex module is available, repeated_captures is True by default.
+            >>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+').matches("01-02-03-04")
+            >>> matches[0].children # doctest:+SKIP
+            [<01:(0, 2)>, <02:(3, 5)>, <03:(6, 8)>, <04:(9, 11)>]
+        
+            >>> # If regex module is not available, or if repeated_captures is forced to False.
+            >>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+', repeated_captures=False) \
+            ...                   .matches("01-02-03-04")
+            >>> matches[0].children
+            [<01:(0, 2)+initiator=01-02-03-04>, <04:(9, 11)+initiator=01-02-03-04>]
+        
+        - ``abbreviations``
+        
+          Defined as a list of 2-tuples, each tuple being an abbreviation. It simply replaces ``tuple[0]`` with ``tuple[1]`` in the
+          expression.
+        
+          >>> Rebulk().regex(r'Custom-separators', abbreviations=[("-", r"[\W_]+")])\
+          ...         .matches("Custom_separators using-abbreviations")
+          [<Custom_separators:(0, 17)>]
+        
+        
+        Functional Patterns
+        -------------------
+        Functional Patterns are based on the evaluation of a function.
+        
+        The function should have the same parameters as ``Rebulk.matches`` method, that is the input string,
+        and must return at least start index and end index of the ``Match`` object.
+        
+        .. code-block:: python
+        
+            >>> def func(string):
+            ...     index = string.find('?')
+            ...     if index > -1:
+            ...         return 0, index - 11
+            >>> Rebulk().functional(func).matches("Why do simple ? Forget about it ...")
+            [<Why:(0, 3)>]
+        
+        You can also return a dict of keyword arguments for the ``Match`` object.
+        
+        You can define several patterns with a single ``functional`` method call, and each function can return multiple
+        matches.
+        
+        Chain Patterns
+        --------------
+        Chain patterns are an ordered composition of string, functional and regex patterns. A repeater can be set on each
+        chain part to define its repetition.
+        
+        .. code-block:: python
+        
+            >>> r = Rebulk().chain(children=True, formatter={'episode': int, 'version': int}, flags=re.IGNORECASE)\
+            ...             .regex(r'e(?P<episode>\d{1,4})').repeater(1)\
+            ...             .regex(r'v(?P<version>\d+)').repeater('?')\
+            ...             .regex(r'[ex-](?P<episode>\d{1,4})').repeater('*')\
+            ...             .close() # .repeater(1) could be omitted as it's the default behavior
+            >>> r.matches("This is E14v2-15-16-17").to_dict()  # converts matches to dict
+            MatchesDict([('episode', [14, 15, 16, 17]), ('version', 2)])
+        
+        Patterns parameters
+        -------------------
+        
+        All patterns have options that can be given as keyword arguments.
+        
+        - ``validator``
+        
+          Function to validate ``Match`` value given by the pattern. Can also be a ``dict``, to use ``validator`` with pattern
+          named with key.
+        
+          .. code-block:: python
+        
+              >>> def check_leap_year(match):
+              ...     return int(match.value) in [1980, 1984, 1988]
+              >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
+              ...                   .matches("In year 1982 ...")
+              >>> len(matches)
+              0
+              >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
+              ...                   .matches("In year 1984 ...")
+              >>> len(matches)
+              1
+        
+        Some base validator functions are available in the ``rebulk.validators`` module. Most of those functions have to be
+        configured using ``functools.partial`` to map them to a function accepting a single ``match`` argument.
+        
+        - ``formatter``
+        
+          Function to convert ``Match`` value given by the pattern. Can also be a ``dict``, to use ``formatter`` with matches
+          named with key.
+        
+          .. code-block:: python
+        
+              >>> def year_formatter(value):
+              ...     return int(value)
+              >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
+              ...                   .matches("In year 1982 ...")
+              >>> isinstance(matches[0].value, int)
+              True
+        
+        - ``post_processor``
+        
+          Function to change the default output of the pattern. Function parameters are Matches list and Pattern object.
+        
+        - ``name``
+        
+          The name of the pattern. It is automatically passed to ``Match`` objects generated by this pattern.
+        
+        - ``tags``
+        
+          A list of strings that qualify this pattern.
+        
+        - ``value``
+        
+          Override value property for generated ``Match`` objects. Can also be a ``dict``, to use ``value`` with pattern
+          named with key.
+        
+        - ``validate_all``
+        
+          By default, validator is called for returned ``Match`` objects only. Enable this option to validate them all, parent
+          and children included.
+        
+        - ``format_all``
+        
+          By default, formatter is called for returned ``Match`` values only. Enable this option to format them all, parent and
+          children included.
+        
+        - ``disabled``
+        
+          A ``function(context)`` to disable the pattern if returning ``True``.
+        
+        - ``children``
+        
+          If ``True``, all children ``Match`` objects will be retrieved instead of a single parent ``Match`` object.
+        
+        - ``private``
+        
+          If ``True``, ``Match`` objects generated from this pattern are available internally only. They will be removed at
+          the end of ``Rebulk.matches`` method call.
+        
+        - ``private_parent``
+        
+          Force parent matches to be returned and flag them as private.
+        
+        - ``private_children``
+        
+          Force children matches to be returned and flag them as private.
+        
+        - ``private_names``
+        
+          Match names that will be declared as private.
+        
+        - ``ignore_names``
+        
+          Matches names that will be ignored from the pattern output, after validation.
+        
+        - ``marker``
+        
+          If ``True``, ``Match`` objects generated from this pattern will be marker matches instead of standard matches.
+          They won't be included in the ``Matches`` sequence, but will be available in the ``Matches.markers`` sequence (see
+          ``Markers`` section).
+        
+        
+        Match
+        -----
+        
+        A ``Match`` object is the result created by a registered pattern.
+        
+        It has a ``value`` property defined, and position indices are available through ``start``, ``end`` and ``span``
+        properties.
+        
+        In some cases, it contains child ``Match`` objects in its ``children`` property, and each child ``Match`` object
+        references its parent in its ``parent`` property. Also, a ``name`` property can be defined for the match.
+        
+        If groups are defined in a Regular Expression pattern, each group match will be converted to a
+        single ``Match`` object. If a group has a name defined (``(?P<name>group)``), it is set as ``name`` property in a child
+        ``Match`` object. The whole regexp match (``re.group(0)``) will be converted to the main ``Match`` object,
+        and all subgroups (1, 2, ... n) will be converted to ``children`` matches of the main ``Match`` object.
+        
+        .. code-block:: python
+        
+            >>> matches = Rebulk() \
+            ...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)") \
+            ...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
+            >>> matches
+            [<One, 1, Two, 2, Three, 3:(9, 33)>]
+            >>> for child in matches[0].children:
+            ...     '%s = %s' % (child.name, child.value)
+            'one = 1'
+            'two = 2'
+            'three = 3'
+        
+        It's possible to retrieve only children by using the ``children`` parameter. You can also customize the way the
+        structure is generated with the ``every``, ``private_parent`` and ``private_children`` parameters.
+        
+        .. code-block:: python
+        
+            >>> matches = Rebulk() \
+            ...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)", children=True) \
+            ...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
+            >>> matches
+            [<1:(14, 15)+name=one+initiator=One, 1, Two, 2, Three, 3>, <2:(22, 23)+name=two+initiator=One, 1, Two, 2, Three, 3>, <3:(32, 33)+name=three+initiator=One, 1, Two, 2, Three, 3>]
+        
+        A ``Match`` object has the following properties, which can be given to pattern objects:
+        
+        - ``formatter``
+        
+          Function to convert ``Match`` value given by the pattern. Can also be a ``dict``, to use ``formatter`` with matches
+          named with key.
+        
+          .. code-block:: python
+        
+              >>> def year_formatter(value):
+              ...     return int(value)
+              >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
+              ...                   .matches("In year 1982 ...")
+              >>> isinstance(matches[0].value, int)
+              True
+        
+        - ``format_all``
+        
+          By default, formatter is called for returned ``Match`` values only. Enable this option to format them all, parent and
+          children included.
+        
+        - ``conflict_solver``
+        
+          A ``function(match, conflicting_match)`` used to solve conflicts. The returned object will be removed from matches by
+          the ``ConflictSolver`` default rule. If the ``__default__`` string is returned, it falls back to the default behavior
+          of keeping the longer match.
+        
+        
+        Matches
+        -------
+        
+        A ``Matches`` object holds the result of ``Rebulk.matches`` method call. It's a sequence of ``Match`` objects and
+        it behaves like a list.
+        
+        All methods accept a ``predicate`` function to filter ``Match`` objects using a callable, and an ``index`` int to
+        retrieve a single element from the matches returned by default.
+        
+        It has the following additional methods and properties on it.
+        
+        - ``starting(index, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects that start at the given index.
+        
+        - ``ending(index, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects that end at the given index.
+        
+        - ``previous(match, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects that are previous and nearest to match.
+        
+        - ``next(match, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects that are next and nearest to match.
+        
+        - ``tagged(tag, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects that have the given tag defined.
+        
+        - ``named(name, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects that have the given name.
+        
+        - ``range(start=0, end=None, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects for given range, sorted from start to end.
+        
+        - ``holes(start=0, end=None, formatter=None, ignore=None, predicate=None, index=None)``
+        
+          Retrieves a list of *hole* ``Match`` objects for given range. A hole match is created for each range where no match
+          is available.
+        
+        - ``conflicting(match, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects that conflict with the given match.
+        
+        - ``chain_before(position, seps, start=0, predicate=None, index=None)``
+        
+          Retrieves a list of chained matches, before position, matching predicate and separated by characters from seps only.
+        
+        - ``chain_after(position, seps, end=None, predicate=None, index=None)``
+        
+          Retrieves a list of chained matches, after position, matching predicate and separated by characters from seps only.
+        
+        - ``at_match(match, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects at the same position as match.
+        
+        - ``at_span(span, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects from given (start, end) tuple.
+        
+        - ``at_index(pos, predicate=None, index=None)``
+        
+          Retrieves a list of ``Match`` objects from given position.
+        
+        - ``names``
+        
+          Retrieves a sequence of all ``Match.name`` properties.
+        
+        - ``tags``
+        
+          Retrieves a sequence of all ``Match.tags`` properties.
+        
+        - ``to_dict(details=False, first_value=False, enforce_list=False)``
+        
+          Convert to an ordered dict, with ``Match.name`` as key and ``Match.value`` as value.
+        
+          It's a subclass of `OrderedDict`_, that contains a ``matches`` property which is a dict with ``Match.name`` as key
+          and list of ``Match`` objects as value.
+        
+          If ``first_value`` is ``False`` (the default) and distinct values are found for the same name, the values are wrapped
+          in a list. If ``True``, only the first value is kept, and the lists of values can be retrieved with ``values_list``,
+          a dict with ``Match.name`` as key and a list of ``Match.value`` as value.
+        
+          If ``enforce_list`` is ``True``, all values will be wrapped in a list, even if a single value is found.
+        
+          If ``details`` is True, ``Match.value`` objects are replaced with complete ``Match`` object.
+        
+        - ``markers``
+        
+          A custom ``Matches`` sequence specialized for marker matches (see below).
+        
+        Markers
+        -------
+        
+        If you have defined some patterns with the ``marker`` option, then ``Matches.markers`` points to a special ``Matches``
+        sequence that contains only marker matches. This sequence supports all methods from ``Matches``.
+        
+        Marker matches are not intended to be used in the final result, but can be used to implement a ``Rule``.
+        
+        Rules
+        -----
+        Rules are a convenient and readable way to implement advanced conditional logic involving several ``Match`` objects.
+        When a rule is triggered, it can perform an action on ``Matches`` object, like filtering out, adding additional tags or
+        renaming.
+        
+        Rules are implemented by extending the abstract ``Rule`` class. They are registered using the ``Rebulk.rules`` method by
+        giving either a ``Rule`` instance, a ``Rule`` class or a module containing ``Rule`` classes only.
+        
+        For a rule to be triggered, ``Rule.when`` method must return ``True``, or a non empty list of ``Match``
+        objects, or any other truthy object. When triggered, ``Rule.then`` method is called to perform the action with
+        ``when_response`` parameter defined as the response of ``Rule.when`` call.
+        
+        Instead of implementing the ``Rule.then`` method, you can define a ``consequence`` class property with a Consequence class
+        or instance, like ``RemoveMatch``, ``RenameMatch`` or ``AppendMatch``. You can also use a list of consequences when
+        required: ``when_response`` must then be iterable, and elements of this iterable will be given to each consequence in
+        the same order.
+        
+        When many rules are registered, it can be useful to set ``priority`` class variable to define a priority integer
+        between all rule executions (higher priorities will be executed first). You can also define ``dependency`` to declare
+        another Rule class as dependency for the current rule, meaning that it will be executed before.
+        
+        Among rules sharing the same ``priority`` value, every ``when`` is called first; the ``then`` calls follow afterwards.
+        
+        .. code-block:: python
+        
+            >>> from rebulk import Rule, RemoveMatch
+        
+            >>> class FirstOnlyRule(Rule):
+            ...     consequence = RemoveMatch
+            ...
+            ...     def when(self, matches, context):
+            ...         grabbed = matches.named("grabbed", 0)
+            ...         if grabbed and matches.previous(grabbed):
+            ...             return grabbed
+        
+            >>> rebulk = Rebulk()
+        
+            >>> rebulk.regex("This match(.*?)grabbed", name="grabbed")
+            <...Rebulk object ...>
+            >>> rebulk.regex("if it's(.*?)first match", private=True)
+            <...Rebulk object at ...>
+            >>> rebulk.rules(FirstOnlyRule)
+            <...Rebulk object at ...>
+        
+            >>> rebulk.matches("This match is grabbed only if it's the first match")
+            [<This match is grabbed:(0, 21)+name=grabbed>]
+            >>> rebulk.matches("if it's NOT the first match, This match is NOT grabbed")
+            []
+        
+        .. _re module: https://docs.python.org/3/library/re.html
+        .. _regex module: https://pypi.python.org/pypi/regex
+        .. _String methods: https://docs.python.org/3/library/stdtypes.html#str
+        .. _str.find: https://docs.python.org/3/library/stdtypes.html#str.find
+        .. _re.finditer: https://docs.python.org/3/library/re.html#re.finditer
+        .. _re.compile: https://docs.python.org/3/library/re.html#re.compile
+        .. _OrderedDict: https://docs.python.org/2/library/collections.html#collections.OrderedDict
+        
+        
+Keywords: re regexp regular expression search pattern string match
+Platform: UNKNOWN
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Classifier: Intended Audience :: Developers
+Classifier: Programming Language :: Python :: 2
+Classifier: Programming Language :: Python :: 2.6
+Classifier: Programming Language :: Python :: 2.7
+Classifier: Programming Language :: Python :: 3
+Classifier: Programming Language :: Python :: 3.3
+Classifier: Programming Language :: Python :: 3.4
+Classifier: Programming Language :: Python :: 3.5
+Classifier: Topic :: Software Development :: Libraries :: Python Modules
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..6e1e589
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,511 @@
+ReBulk
+=======
+
+.. image:: http://img.shields.io/pypi/v/rebulk.svg
+    :target: https://pypi.python.org/pypi/rebulk
+    :alt: Latest Version
+
+.. image:: http://img.shields.io/badge/license-MIT-blue.svg
+    :target: https://pypi.python.org/pypi/rebulk
+    :alt: MIT License
+
+.. image:: http://img.shields.io/travis/Toilal/rebulk.svg
+    :target: http://travis-ci.org/Toilal/rebulk?branch=master
+    :alt: Build Status
+
+.. image:: http://img.shields.io/coveralls/Toilal/rebulk.svg
+    :target: https://coveralls.io/r/Toilal/rebulk?branch=master
+    :alt: Coveralls
+
+ReBulk is a python library that performs advanced searches in strings that would be hard to implement using
+`re module`_ or `String methods`_ only.
+
+It includes features such as ``Patterns``, ``Match`` and ``Rule`` that allow developers to build a
+custom and complex string matcher using a readable and extendable API.
+
+This project is hosted on GitHub: `<https://github.com/Toilal/rebulk>`_
+
+Install
+-------
+.. code-block:: sh
+
+    $ pip install rebulk
+
+Usage
+------
+Regular expression, string and function based patterns are declared in a ``Rebulk`` object. It uses a fluent API to
+chain ``string``, ``regex``, and ``functional`` methods to define various pattern types.
+
+.. code-block:: python
+
+    >>> from rebulk import Rebulk
+    >>> bulk = Rebulk().string('brown').regex(r'qu\w+').functional(lambda s: (20, 25))
+
+When the ``Rebulk`` object is fully configured, call its ``matches`` method with an input string to retrieve all
+``Match`` objects found by the registered patterns.
+
+.. code-block:: python
+
+    >>> bulk.matches("The quick brown fox jumps over the lazy dog")
+    [<brown:(10, 15)>, <quick:(4, 9)>, <jumps:(20, 25)>]
+
+If multiple ``Match`` objects are found at the same position, only the longer one is kept.
+
+.. code-block:: python
+
+    >>> bulk = Rebulk().string('lakers').string('la')
+    >>> bulk.matches("the lakers are from la")
+    [<lakers:(4, 10)>, <la:(20, 22)>]
+
+String Patterns
+---------------
+String patterns are based on the `str.find`_ method, but return all matches in the string. ``ignore_case``
+can be enabled to ignore case.
+
+.. code-block:: python
+
+    >>> Rebulk().string('la').matches("lalalilala")
+    [<la:(0, 2)>, <la:(2, 4)>, <la:(6, 8)>, <la:(8, 10)>]
+
+    >>> Rebulk().string('la').matches("LalAlilAla")
+    [<la:(8, 10)>]
+
+    >>> Rebulk().string('la', ignore_case=True).matches("LalAlilAla")
+    [<La:(0, 2)>, <lA:(2, 4)>, <lA:(6, 8)>, <la:(8, 10)>]
+
+You can define several patterns with a single ``string`` method call.
+
+.. code-block:: python
+
+    >>> Rebulk().string('Winter', 'coming').matches("Winter is coming...")
+    [<Winter:(0, 6)>, <coming:(10, 16)>]
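
As a rough picture of how repeated ``str.find`` calls can yield every occurrence, here is a minimal sketch using only the standard library. ``find_all`` is a hypothetical name, not a rebulk function, and the sketch assumes that the search resumes after the end of each hit (so hits never overlap) and that case folding via ``str.lower`` keeps string lengths unchanged, which holds for ASCII input.

```python
def find_all(needle, haystack, ignore_case=False):
    """Return (start, end) spans of every non-overlapping occurrence of needle.

    Hypothetical helper for illustration only; not part of rebulk.
    """
    if not needle:
        return []  # an empty needle would otherwise loop forever
    if ignore_case:
        # Assumes lower() preserves lengths (true for ASCII text).
        needle, search_in = needle.lower(), haystack.lower()
    else:
        search_in = haystack
    spans = []
    start = search_in.find(needle)
    while start != -1:
        end = start + len(needle)
        spans.append((start, end))
        start = search_in.find(needle, end)  # resume after the previous hit
    return spans


plain = find_all("la", "lalalilala")
sensitive = find_all("la", "LalAlilAla")
insensitive = find_all("la", "LalAlilAla", ignore_case=True)
print(plain, sensitive, insensitive)
```

The three calls mirror the ``'la'`` examples in this section and produce the same spans as the ``Match`` objects shown there.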
+
+Regular Expression Patterns
+---------------------------
+Regular Expression patterns are based on a compiled regular expression.
+The `re.finditer`_ method is used to find matches.
+
+If the `regex module`_ is available, it will be used by rebulk instead of the default `re module`_.
+
+.. code-block:: python
+
+    >>> Rebulk().regex(r'l\w').matches("lolita")
+    [<lo:(0, 2)>, <li:(2, 4)>]
+
+You can define several patterns with a single ``regex`` method call.
+
+.. code-block:: python
+
+    >>> Rebulk().regex(r'Wint\wr', 'com\w{3}').matches("Winter is coming...")
+    [<Winter:(0, 6)>, <coming:(10, 16)>]
+
+All keyword arguments from `re.compile`_ are supported.
+
+.. code-block:: python
+
+    >>> import re  # import required for flags constant
+    >>> Rebulk().regex('L[A-Z]KERS', flags=re.IGNORECASE) \
+    ...         .matches("The LaKeRs are from La")
+    [<LaKeRs:(4, 10)>]
+
+    >>> Rebulk().regex('L[A-Z]', 'L[A-Z]KERS', flags=re.IGNORECASE) \
+    ...         .matches("The LaKeRs are from La")
+    [<La:(20, 22)>, <LaKeRs:(4, 10)>]
+
+    >>> Rebulk().regex(('L[A-Z]', re.IGNORECASE), ('L[a-z]KeRs')) \
+    ...         .matches("The LaKeRs are from La")
+    [<La:(20, 22)>, <LaKeRs:(4, 10)>]
+
+If `regex module`_ is available, it automatically supports repeated captures.
+
+.. code-block:: python
+
+    >>> # If regex module is available, repeated_captures is True by default.
+    >>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+').matches("01-02-03-04")
+    >>> matches[0].children # doctest:+SKIP
+    [<01:(0, 2)>, <02:(3, 5)>, <03:(6, 8)>, <04:(9, 11)>]
+
+    >>> # If regex module is not available, or if repeated_captures is forced to False.
+    >>> matches = Rebulk().regex(r'(\d+)(?:-(\d+))+', repeated_captures=False) \
+    ...                   .matches("01-02-03-04")
+    >>> matches[0].children
+    [<01:(0, 2)+initiator=01-02-03-04>, <04:(9, 11)+initiator=01-02-03-04>]
+
+- ``abbreviations``
+
+  Defined as a list of 2-tuples, where each tuple is an abbreviation. It simply replaces ``tuple[0]`` with ``tuple[1]`` in the
+  expression.
+
+  >>> Rebulk().regex(r'Custom-separators', abbreviations=[("-", r"[\W_]+")])\
+  ...         .matches("Custom_separators using-abbreviations")
+  [<Custom_separators:(0, 17)>]
+
+
+Functional Patterns
+-------------------
+Functional Patterns are based on the evaluation of a function.
+
+The function should have the same parameters as the ``Rebulk.matches`` method, that is the input string,
+and must return at least the start index and end index of the ``Match`` object.
+
+.. code-block:: python
+
+    >>> def func(string):
+    ...     index = string.find('?')
+    ...     if index > -1:
+    ...         return 0, index - 11
+    >>> Rebulk().functional(func).matches("Why do simple ? Forget about it ...")
+    [<Why:(0, 3)>]
+
+You can also return a dict of keyword arguments for the ``Match`` object.
+
+You can define several patterns with a single ``functional`` method call, and each function can return multiple
+matches.
+
+Chain Patterns
+--------------
+Chain Patterns are an ordered composition of string, functional and regex patterns. A repeater can be set to define
+the repetition of a chain part.
+
+.. code-block:: python
+
+    >>> r = Rebulk().chain(children=True, formatter={'episode': int, 'version': int}, flags=re.IGNORECASE)\
+    ...             .regex(r'e(?P<episode>\d{1,4})').repeater(1)\
+    ...             .regex(r'v(?P<version>\d+)').repeater('?')\
+    ...             .regex(r'[ex-](?P<episode>\d{1,4})').repeater('*')\
+    ...             .close() # .repeater(1) could be omitted as it's the default behavior
+    >>> r.matches("This is E14v2-15-16-17").to_dict()  # converts matches to dict
+    MatchesDict([('episode', [14, 15, 16, 17]), ('version', 2)])
+
+Patterns parameters
+-------------------
+
+All patterns have options that can be given as keyword arguments.
+
+- ``validator``
+
+  Function to validate ``Match`` value given by the pattern. Can also be a ``dict``, to use ``validator`` with pattern
+  named with key.
+
+  .. code-block:: python
+
+      >>> def check_leap_year(match):
+      ...     return int(match.value) in [1980, 1984, 1988]
+      >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
+      ...                   .matches("In year 1982 ...")
+      >>> len(matches)
+      0
+      >>> matches = Rebulk().regex(r'\d{4}', validator=check_leap_year) \
+      ...                   .matches("In year 1984 ...")
+      >>> len(matches)
+      1
+
+Some base validator functions are available in the ``rebulk.validators`` module. Most of those functions have to be
+configured using ``functools.partial`` to map them to a function accepting a single ``match`` argument.
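
The ``functools.partial`` idea can be sketched without rebulk at all. In the snippet below, ``value_in_range`` and ``FakeMatch`` are hypothetical stand-ins invented for this illustration (they are not names from ``rebulk.validators``); the only assumption is that a validator ultimately has to be callable with a single ``match`` argument.

```python
from collections import namedtuple
from functools import partial

# Stand-in for a rebulk Match; only the attribute used here is modelled.
FakeMatch = namedtuple("FakeMatch", ["value"])


def value_in_range(low, high, match):
    """Hypothetical validator taking two configuration arguments plus the match."""
    return low <= int(match.value) <= high


# partial() binds low and high, leaving a callable that takes only `match`,
# which is the shape expected for a `validator` keyword argument.
is_eighties = partial(value_in_range, 1980, 1989)

ok = is_eighties(FakeMatch("1984"))
rejected = is_eighties(FakeMatch("1992"))
print(ok, rejected)
```

Binding the configuration up front keeps the validator itself reusable with different bounds.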
+
+- ``formatter``
+
+  Function to convert ``Match`` value given by the pattern. Can also be a ``dict``, to use ``formatter`` with matches
+  named with key.
+
+  .. code-block:: python
+
+      >>> def year_formatter(value):
+      ...     return int(value)
+      >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
+      ...                   .matches("In year 1982 ...")
+      >>> isinstance(matches[0].value, int)
+      True
+
+- ``post_processor``
+
+  Function to change the default output of the pattern. Its parameters are the matches list and the ``Pattern`` object.
+
+- ``name``
+
+  The name of the pattern. It is automatically passed to ``Match`` objects generated by this pattern.
+
+- ``tags``
+
+  A list of strings that qualify this pattern.
+
+- ``value``
+
+  Override value property for generated ``Match`` objects. Can also be a ``dict``, to use ``value`` with pattern
+  named with key.
+
+- ``validate_all``
+
+  By default, validator is called for returned ``Match`` objects only. Enable this option to validate them all, parent
+  and children included.
+
+- ``format_all``
+
+  By default, formatter is called for returned ``Match`` values only. Enable this option to format them all, parent and
+  children included.
+
+- ``disabled``
+
+  A ``function(context)`` to disable the pattern if returning ``True``.
+
+- ``children``
+
+  If ``True``, all children ``Match`` objects will be retrieved instead of a single parent ``Match`` object.
+
+- ``private``
+
+  If ``True``, ``Match`` objects generated from this pattern are available internally only. They will be removed at
+  the end of ``Rebulk.matches`` method call.
+
+- ``private_parent``
+
+  Force parent matches to be returned and flag them as private.
+
+- ``private_children``
+
+  Force children matches to be returned and flag them as private.
+
+- ``private_names``
+
+  Matches names that will be declared as private.
+
+- ``ignore_names``
+
+  Matches names that will be ignored from the pattern output, after validation.
+
+- ``marker``
+
+  If ``True``, ``Match`` objects generated from this pattern will be marker matches instead of standard matches.
+  They won't be included in ``Matches`` sequence, but will be available in ``Matches.markers`` sequence (see
+  ``Markers`` section).
+
+
+Match
+-----
+
+A ``Match`` object is the result created by a registered pattern.
+
+It has a ``value`` property defined, and position indices are available through ``start``, ``end`` and ``span``
+properties.
+
+In some cases, it contains children ``Match`` objects in its ``children`` property, and each child ``Match`` object
+references its parent in its ``parent`` property. Also, a ``name`` property can be defined for the match.
+
+If groups are defined in a Regular Expression pattern, each group match will be converted to a
+single ``Match`` object. If a group has a name defined (``(?P<name>group)``), it is set as ``name`` property in a child
+``Match`` object. The whole regexp match (``re.group(0)``) will be converted to the main ``Match`` object,
+and all subgroups (1, 2, ... n) will be converted to ``children`` matches of the main ``Match`` object.
+
+.. code-block:: python
+
+    >>> matches = Rebulk() \
+    ...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)") \
+    ...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
+    >>> matches
+    [<One, 1, Two, 2, Three, 3:(9, 33)>]
+    >>> for child in matches[0].children:
+    ...     '%s = %s' % (child.name, child.value)
+    'one = 1'
+    'two = 2'
+    'three = 3'
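
For comparison, the spans behind this parent/children structure can be read directly from the standard ``re`` module. The sketch below does not use rebulk; it only shows that group 0 corresponds to the parent span and that each named group corresponds to one child span.

```python
import re

pattern = re.compile(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)")
found = pattern.search("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")

# Group 0 is the whole regexp match: the span of the parent Match.
parent_span = found.span(0)

# Each named group has its own span: one child Match per group.
child_spans = {name: found.span(name) for name in pattern.groupindex}

print(parent_span)
print(child_spans)
```

These spans line up with the positions rebulk reports for the parent and, when ``children=True`` is used, for the child matches.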
+
+It's possible to retrieve only children by using the ``children`` parameter. You can also customize the way the structure
+is generated with the ``every``, ``private_parent`` and ``private_children`` parameters.
+
+.. code-block:: python
+
+    >>> matches = Rebulk() \
+    ...         .regex(r"One, (?P<one>\w+), Two, (?P<two>\w+), Three, (?P<three>\w+)", children=True) \
+    ...         .matches("Zero, 0, One, 1, Two, 2, Three, 3, Four, 4")
+    >>> matches
+    [<1:(14, 15)+name=one+initiator=One, 1, Two, 2, Three, 3>, <2:(22, 23)+name=two+initiator=One, 1, Two, 2, Three, 3>, <3:(32, 33)+name=three+initiator=One, 1, Two, 2, Three, 3>]
+
+A ``Match`` object has the following properties, which can be given to ``Pattern`` objects:
+
+- ``formatter``
+
+  Function to convert ``Match`` value given by the pattern. Can also be a ``dict``, to use ``formatter`` with matches
+  named with key.
+
+  .. code-block:: python
+
+      >>> def year_formatter(value):
+      ...     return int(value)
+      >>> matches = Rebulk().regex(r'\d{4}', formatter=year_formatter) \
+      ...                   .matches("In year 1982 ...")
+      >>> isinstance(matches[0].value, int)
+      True
+
... 11752 lines suppressed ...

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/python-modules/packages/python-rebulk.git


