[Python-modules-commits] [python-zxcvbn] 01/01: new upstream version

Sat May 6 17:48:37 UTC 2017

This is an automated email from the git hooks/post-receive script.

sprab-guest pushed a commit to branch upstream
in repository python-zxcvbn.

commit 874defbdaede2ed33c25e75b480c311a220dd9f8
Author: sprab-guest <sprab at onenetbeyond.org>
Date:   Wed Apr 19 23:01:12 2017 +0200

    new upstream version
---
 LICENSE.txt                                      |  20 -
 PKG-INFO                                         |  12 +
 README                                           |  72 ---
 README.rst                                       | 138 +++++
 setup.py                                         |  22 +-
 tests.txt                                        |  34 --
 zxcvbn/__init__.py                               |  34 +-
 zxcvbn/adjacency_graphs.py                       |   7 +
 zxcvbn/scoring.py                                | 613 +++++++++++++----------
 zxcvbn/scripts/build_frequency_lists.py          | 188 -------
 zxcvbn/scripts/build_keyboard_adjacency_graph.py |  92 ----
 zxcvbn/time_estimates.py                         |  77 +++
 12 files changed, 612 insertions(+), 697 deletions(-)

diff --git a/LICENSE.txt b/LICENSE.txt
deleted file mode 100644
index 6613022..0000000
--- a/LICENSE.txt
+++ /dev/null
@@ -1,20 +0,0 @@
-Copyright (c) 2012 Dropbox, Inc.
-
-Permission is hereby granted, free of charge, to any person obtaining
-a copy of this software and associated documentation files (the
-"Software"), to deal in the Software without restriction, including
-without limitation the rights to use, copy, modify, merge, publish,
-distribute, sublicense, and/or sell copies of the Software, and to
-permit persons to whom the Software is furnished to do so, subject to
-the following conditions:
-
-The above copyright notice and this permission notice shall be
-included in all copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
-EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
-MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
-NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
-LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
-OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/PKG-INFO b/PKG-INFO
new file mode 100644
index 0000000..561a1c3
--- /dev/null
+++ b/PKG-INFO
@@ -0,0 +1,12 @@
+Metadata-Version: 1.1
+Name: zxcvbn-python
+Version: 4.4.14
+Summary: Python implementation of Dropbox's realistic password strength estimator, zxcvbn
+Home-page: https://github.com/dwolfhub/zxcvbn-python
+Author: Daniel Wolf
+Author-email: danielrwolf5 at gmail.com
+License: MIT
+Download-URL: https://github.com/dwolfhub/zxcvbn-python/tarball/v4.4.14
+Description: UNKNOWN
+Keywords: zxcvbn,password,security
+Platform: UNKNOWN
diff --git a/README b/README
deleted file mode 100644
index 0bf744d..0000000
--- a/README
+++ /dev/null
@@ -1,72 +0,0 @@
-This is a python port of zxcvbn, which is a JavaScript password strength
-generator. zxcvbn attempts to give sound password advice through pattern
-matching and conservative entropy calculations. It finds 10k common passwords,
-common American names and surnames, common English words, and common patterns
-like dates, repeats (aaa), sequences (abcd), and QWERTY patterns.
-
-Please refer to http://tech.dropbox.com/?p=165 for the full details and
-motivation behind zxcbvn. The source code for the original JavaScript (well,
-actually CoffeeScript) implementation can be found at:
-
-https://github.com/lowe/zxcvbn
-
-
-For full motivation, see:
-
-http://tech.dropbox.com/?p=165
-
-------------------------------------------------------------------------
-Use
-------------------------------------------------------------------------
-
-The zxcvbn module exports the password_strength() function. Import zxcvbn, and
-call password_strength(password, user_inputs=[]).  The function will return a
-result dictionary with the following keys:
-
-entropy            # bits
-
-crack_time         # estimation of actual crack time, in seconds.
-
-crack_time_display # same crack time, as a friendlier string:
-                   # "instant", "6 minutes", "centuries", etc.
-
-score              # [0,1,2,3,4] if crack time is less than
-                   # [10**2, 10**4, 10**6, 10**8, Infinity].
-                   # (useful for implementing a strength bar.)
-
-match_sequence     # the list of patterns that zxcvbn based the
-                   # entropy calculation on.
-
-calculation_time   # how long it took to calculate an answer,
-                   # in milliseconds. usually only a few ms.
-
-The optional user_inputs argument is an array of strings that zxcvbn
-will add to its internal dictionary. This can be whatever list of
-strings you like, but is meant for user inputs from other fields of the
-form, like name and email. That way a password that includes the user's
-personal info can be heavily penalized. This list is also good for
-site-specific vocabulary.
-
-Bug reports and pull requests welcome!
-
-------------------------------------------------------------------------
-Acknowledgments
-------------------------------------------------------------------------
-
-Dropbox, thank you again for supporting independent projects both inside and
-outside of hackweek.
-
-Thanks to Dan Wheeler (https://github.com/lowe) for the CoffeeScript implementation
-(see above.) To repeat his outside acknowledgements (which remain useful, as always):
-
-Many thanks to Mark Burnett for releasing his 10k top passwords list:
-http://xato.net/passwords/more-top-worst-passwords
-and for his 2006 book,
-"Perfect Passwords: Selection, Protection, Authentication"
-
-Huge thanks to Wiktionary contributors for building a frequency list
-of English as used in television and movies:
-http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists
-
-Last but not least, big thanks to xkcd :)
-https://xkcd.com/936/
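The removed README describes the old result in terms of entropy, crack time and a 0-4 score. A minimal sketch of how those three relate, assuming the constants and thresholds from the removed lines of zxcvbn/scoring.py further down in this diff (10 ms per guess, 100 parallel attackers); the use of `bisect` is a substitution made here for brevity, not the original code:

```python
import bisect

# Constants taken from the removed scoring.py lines in this commit.
SINGLE_GUESS = 0.010      # seconds per guess
NUM_ATTACKERS = 100       # cores guessing in parallel
SECONDS_PER_GUESS = SINGLE_GUESS / NUM_ATTACKERS

# Upper bounds in seconds for scores 0..3; anything at or above the last is 4.
THRESHOLDS = [10**2, 10**4, 10**6, 10**8]


def entropy_to_crack_time(entropy):
    # Average case: half of the 2**entropy search space is tried.
    return 0.5 * (2 ** entropy) * SECONDS_PER_GUESS


def crack_time_to_score(seconds):
    # Equivalent to the chain of "if seconds < 10**k" checks in the old code.
    return bisect.bisect_right(THRESHOLDS, seconds)


score = crack_time_to_score(entropy_to_crack_time(20))
```

The new upstream version drops this entropy model in favour of guess counts, so the sketch only documents what the deleted README was describing.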
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..fc92a0c
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,138 @@
+|Build Status|
+
+zxcvbn-python
+=============
+
+A realistic password strength estimator.
+
+This is a Python implementation of the library created by the team at Dropbox.
+The original library, written for JavaScript, can be found
+`here <https://github.com/dropbox/zxcvbn>`__.
+
+While there may be other Python ports available, this one is the most up
+to date and is recommended by the original developers of zxcvbn at this
+time.
+
+
+Features
+--------
+- **Tested in Python versions 2.6-2.7, 3.3-3.6**
+- Accepts user data to be added to the dictionaries that are tested against (name, birthdate, etc)
+- Gives a score to the password, from 0 (terrible) to 4 (great)
+- Provides feedback on the password and ways to improve it
+- Returns time estimates on how long it would take to guess the password in different situations
+
+Installation
+------------
+
+Install the package using pip: ``pip install zxcvbn-python``
+
+Usage
+-----
+
+Pass a password as the first parameter, and a list of user-provided
+inputs as the ``user_inputs`` parameter (optional).
+
+.. code:: python
+
+    from zxcvbn import zxcvbn
+
+    results = zxcvbn('JohnSmith123', user_inputs=['John', 'Smith'])
+
+    print(results)
+
+Output:
+
+::
+
+    {
+        'password': 'JohnSmith123', 
+        'score': 2, 
+        'guesses': 2567800, 
+        'guesses_log10': 6.409561194521849, 
+        'calc_time': datetime.timedelta(0, 0, 5204),
+        'feedback': {
+            'warning': '', 
+            'suggestions': [
+                'Add another word or two. Uncommon words are better.', 
+                "Capitalization doesn't help very much"
+            ]
+        }, 
+        'crack_times_display': {
+            'offline_fast_hashing_1e10_per_second': 'less than a second',
+            'offline_slow_hashing_1e4_per_second': '4 minutes', 
+            'online_no_throttling_10_per_second': '3 days', 
+            'online_throttling_100_per_hour': '3 years', 
+        }, 
+        'crack_times_seconds': {
+            'offline_fast_hashing_1e10_per_second': 0.00025678, 
+            'offline_slow_hashing_1e4_per_second': 256.78,
+            'online_no_throttling_10_per_second': 256780.0, 
+            'online_throttling_100_per_hour': 92440800.0, 
+        }, 
+        'sequence': [{
+            'matched_word': 'john', 
+            'rank': 2, 
+            'pattern': 'dictionary', 
+            'reversed': False, 
+            'token': 'John', 
+            'l33t': False, 
+            'uppercase_variations': 2, 
+            'i': 0, 
+            'guesses': 50, 
+            'l33t_variations': 1, 
+            'dictionary_name': 'male_names', 
+            'base_guesses': 2, 
+            'guesses_log10': 1.6989700043360185, 
+            'j': 3
+        }, {
+            'matched_word': 'smith123', 
+            'rank': 12789, 
+            'pattern': 'dictionary', 
+            'reversed': False, 
+            'token': 'Smith123', 
+            'l33t': False, 
+            'uppercase_variations': 2, 
+            'i': 4, 
+            'guesses': 25578, 
+            'l33t_variations': 1, 
+            'dictionary_name': 'passwords', 
+            'base_guesses': 12789, 
+            'guesses_log10': 4.407866583030775, 
+            'j': 11
+        }], 
+    }
+
+
+Custom Ranked Dictionaries
+--------------------------
+
+In order to support more languages or just add password dictionaries of your own, there is a helper function you may use.
+
+.. code:: python
+
+    from zxcvbn.matching import add_frequency_lists
+
+    add_frequency_lists({
+        'my_list': ['foo', 'bar'],
+        'another_list': ['baz']
+    })
+
+These lists will be added to the current ones, but you can also overwrite the current ones if you wish.
+Order each list by how common its words are, with the most common words first.
+
+
+Contribute
+----------
+
+- Report an Issue: https://github.com/dwolfhub/zxcvbn-python/issues
+- Submit a Pull Request: https://github.com/dwolfhub/zxcvbn-python/pulls
+
+License
+-------
+
+The project is licensed under the MIT license.
+
+
+.. |Build Status| image:: https://travis-ci.org/dwolfhub/zxcvbn-python.svg?branch=master
+   :target: https://travis-ci.org/dwolfhub/zxcvbn-python
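The `crack_times_seconds` values in the sample output above are consistent with dividing `guesses` by the guess rate named in each scenario key. A minimal sketch of that arithmetic, assuming the rates can be read directly off the key names; `RATES_PER_SECOND` and `crack_times_seconds` are hypothetical names for illustration, not the contents of zxcvbn/time_estimates.py:

```python
# Hypothetical helper: rates are inferred from the scenario names in the
# README sample output, not copied from the library.
RATES_PER_SECOND = {
    'online_throttling_100_per_hour': 100 / 3600.0,
    'online_no_throttling_10_per_second': 10.0,
    'offline_slow_hashing_1e4_per_second': 1e4,
    'offline_fast_hashing_1e10_per_second': 1e10,
}


def crack_times_seconds(guesses):
    # Seconds needed to exhaust `guesses` attempts at each rate.
    return {name: guesses / rate for name, rate in RATES_PER_SECOND.items()}


times = crack_times_seconds(2567800)  # 'guesses' from the sample output
```

With 2567800 guesses this gives about 256.78 seconds for the slow-hashing case and about 92440800 seconds for the throttled online case, matching the README sample.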
diff --git a/setup.py b/setup.py
index 24af8dc..5550a37 100644
--- a/setup.py
+++ b/setup.py
@@ -1,11 +1,15 @@
 from distutils.core import setup
 
-setup(name='zxcvbn',
-      version='1.0',
-      description='Password strength estimator',
-      author='Ryan Pearl',
-      author_email='rpearl at dropbox.com',
-      url='https://www.github.com/rpearl/python-zxcvbn',
-      packages=['zxcvbn'],
-      package_data={'zxcvbn': ['generated/frequency_lists.json', 'generated/adjacency_graphs.json']}
-     )
+setup(
+    name='zxcvbn-python',
+    version='4.4.14',
+    packages=['zxcvbn'],
+    url='https://github.com/dwolfhub/zxcvbn-python',
+    download_url='https://github.com/dwolfhub/zxcvbn-python/tarball/v4.4.14',
+    license='MIT',
+    author='Daniel Wolf',
+    author_email='danielrwolf5 at gmail.com',
+    description='Python implementation of Dropbox\'s realistic password '
+                'strength estimator, zxcvbn',
+    keywords=['zxcvbn', 'password', 'security'],
+)
diff --git a/tests.txt b/tests.txt
deleted file mode 100644
index c7c2d60..0000000
--- a/tests.txt
+++ /dev/null
@@ -1,34 +0,0 @@
-zxcvbn
-qwER43@!
-Tr0ub4dour&3
-correcthorsebatterystaple
-coRrecth0rseba++ery9.23.2007staple$
-D0g..................
-abcdefghijk987654321
-neverforget13/3/1997
-1qaz2wsx3edc
-temppass22
-briansmith
-briansmith4mayor
-password1
-viking
-thx1138
-ScoRpi0ns
-do you know
-ryanhunter2000
-rianhunter2000
-asdfghju7654rewq
-AOEUIDHG&*()LS_
-12345678
-defghi6789
-rosebud
-Rosebud
-ROSEBUD
-rosebuD
-ros3bud99
-r0s3bud99
-R0$38uD99
-verlineVANDERMARK
-eheuczkqyq
-rWibMFACxAUGZmxhVncy
-Ba9ZyWABu99[BK#6MBgbH88Tofv)vs$
diff --git a/zxcvbn/__init__.py b/zxcvbn/__init__.py
index 7b444a4..11964eb 100644
--- a/zxcvbn/__init__.py
+++ b/zxcvbn/__init__.py
@@ -1,18 +1,26 @@
-from zxcvbn import main
+from datetime import datetime
 
-__all__ = ['password_strength']
+from . import matching, scoring, time_estimates, feedback
 
-password_strength = main.password_strength
 
+def zxcvbn(password, user_inputs=None):
+    if user_inputs is None:
+        user_inputs = []
 
-if __name__ == '__main__':
-    import fileinput
-    ignored = ('match_sequence', 'password')
+    start = datetime.now()
 
-    for line in fileinput.input():
-        pw = line.strip()
-        print "Password: " + pw
-        out = password_strength(pw)
-        for key, value in out.iteritems():
-            if key not in ignored:
-                print "\t%s: %s" % (key, value)
+    sanitized_inputs = [str(arg).lower() for arg in user_inputs]
+    matching.set_user_input_dictionary(sanitized_inputs)
+
+    matches = matching.omnimatch(password)
+    result = scoring.most_guessable_match_sequence(password, matches)
+    result['calc_time'] = datetime.now() - start
+
+    attack_times = time_estimates.estimate_attack_times(result['guesses'])
+    for prop, val in attack_times.items():
+        result[prop] = val
+
+    result['feedback'] = feedback.get_feedback(result['score'],
+                                               result['sequence'])
+
+    return result
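The `scoring.most_guessable_match_sequence` call above picks the match sequence that minimizes g = l! * Product(m.guesses) + D^(l - 1), as described in the comment block of the scoring.py changes in this commit. A minimal sketch of that metric alone (not the dynamic-programming search), assuming D is the `MIN_GUESSES_BEFORE_GROWING_SEQUENCE` value of 10000 from the same diff; `sequence_metric` is a hypothetical name used only here:

```python
from math import factorial

# Assumed to equal MIN_GUESSES_BEFORE_GROWING_SEQUENCE in the scoring.py diff.
D = 10000


def sequence_metric(match_guesses, exclude_additive=False):
    """Return l! * product(guesses) + D**(l - 1) for one candidate sequence."""
    l = len(match_guesses)
    product = 1
    for guesses in match_guesses:
        product *= guesses
    g = factorial(l) * product
    if not exclude_additive:
        g += D ** (l - 1)
    return g


# Per-match guesses from the README sample: 'John' (50) and 'Smith123' (25578).
total = sequence_metric([50, 25578])
```

For the README sample this evaluates to 2 * 50 * 25578 + 10000 = 2567800, which agrees with the `guesses` value shown in the README output.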
diff --git a/zxcvbn/adjacency_graphs.py b/zxcvbn/adjacency_graphs.py
new file mode 100644
index 0000000..6fd08d2
--- /dev/null
+++ b/zxcvbn/adjacency_graphs.py
@@ -0,0 +1,7 @@
+# generated by scripts/build_keyboard_adjacency_graphs.py
+ADJACENCY_GRAPHS = {
+    "qwerty": {"!": ["`~", None, None, "2@", "qQ", None], "\"": [";:", "[{", "]}", None, None, "/?"], "#": ["2@", None, None, "4$", "eE", "wW"], "$": ["3#", None, None, "5%", "rR", "eE"], "%": ["4$", None, None, "6^", "tT", "rR"], "&": ["6^", None, None, "8*", "uU", "yY"], "'": [";:", "[{", "]}", None, None, "/?"], "(": ["8*", None, None, "0)", "oO", "iI"], ")": ["9(", None, None, "-_", "pP", "oO"], "*": ["7&", None, None, "9(", "iI", "uU"], "+": ["-_", None, None, None, "]}", "[{"], "," [...]
+    "dvorak": {"!": ["`~", None, None, "2@", "'\"", None], "\"": [None, "1!", "2@", ",<", "aA", None], "#": ["2@", None, None, "4$", ".>", ",<"], "$": ["3#", None, None, "5%", "pP", ".>"], "%": ["4$", None, None, "6^", "yY", "pP"], "&": ["6^", None, None, "8*", "gG", "fF"], "'": [None, "1!", "2@", ",<", "aA", None], "(": ["8*", None, None, "0)", "rR", "cC"], ")": ["9(", None, None, "[{", "lL", "rR"], "*": ["7&", None, None, "9(", "cC", "gG"], "+": ["/?", "]}", None, "\\|", None, "-_"], " [...]
+    "keypad": {"*": ["/", None, None, None, "-", "+", "9", "8"], "+": ["9", "*", "-", None, None, None, None, "6"], "-": ["*", None, None, None, None, None, "+", "9"], ".": ["0", "2", "3", None, None, None, None, None], "/": [None, None, None, None, "*", "9", "8", "7"], "0": [None, "1", "2", "3", ".", None, None, None], "1": [None, None, "4", "5", "2", "0", None, None], "2": ["1", "4", "5", "6", "3", ".", "0", None], "3": ["2", "5", "6", None, None, None, ".", "0"], "4": [None, None, "7" [...]
+    "mac_keypad": {"*": ["/", None, None, None, None, None, "-", "9"], "+": ["6", "9", "-", None, None, None, None, "3"], "-": ["9", "/", "*", None, None, None, "+", "6"], ".": ["0", "2", "3", None, None, None, None, None], "/": ["=", None, None, None, "*", "-", "9", "8"], "0": [None, "1", "2", "3", ".", None, None, None], "1": [None, None, "4", "5", "2", "0", None, None], "2": ["1", "4", "5", "6", "3", ".", "0", None], "3": ["2", "5", "6", "+", None, None, ".", "0"], "4": [None, None, " [...]
+}
\ No newline at end of file
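Each entry in `ADJACENCY_GRAPHS` maps a key to a fixed-length list of neighbor slots, with `None` marking an empty slot. The `calc_average_degree` helper added to scoring.py in this commit averages the number of filled slots per key. A minimal restatement on a toy graph, where `TOY_GRAPH` is invented for illustration and is not a real keyboard layout:

```python
def calc_average_degree(graph):
    # Count only real neighbors; None marks an empty slot.
    total = sum(len([n for n in neighbors if n])
                for neighbors in graph.values())
    return total / float(len(graph))


# Invented three-key layout: 'b' sits between 'a' and 'c'.
TOY_GRAPH = {
    'a': ['b', None, None],
    'b': ['a', 'c', None],
    'c': ['b', None, None],
}

avg = calc_average_degree(TOY_GRAPH)  # (1 + 2 + 1) / 3 filled slots per key
```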
diff --git a/zxcvbn/scoring.py b/zxcvbn/scoring.py
index 8903f53..250ab49 100644
--- a/zxcvbn/scoring.py
+++ b/zxcvbn/scoring.py
@@ -1,343 +1,418 @@
-import math
+from math import log, factorial
+
 import re
 
-from zxcvbn.matching import (KEYBOARD_STARTING_POSITIONS, KEYBOARD_AVERAGE_DEGREE,
-                             KEYPAD_STARTING_POSITIONS, KEYPAD_AVERAGE_DEGREE)
+from .adjacency_graphs import ADJACENCY_GRAPHS
+
+
+def calc_average_degree(graph):
+    average = 0
+
+    for key, neighbors in graph.items():
+        average += len([n for n in neighbors if n])
+    average /= float(len(graph.items()))
+
+    return average
+
+
+BRUTEFORCE_CARDINALITY = 10
+MIN_GUESSES_BEFORE_GROWING_SEQUENCE = 10000
+MIN_SUBMATCH_GUESSES_SINGLE_CHAR = 10
+MIN_SUBMATCH_GUESSES_MULTI_CHAR = 50
+
+MIN_YEAR_SPACE = 20
+REFERENCE_YEAR = 2017
 
-def binom(n, k):
-    """
-    Returns binomial coefficient (n choose k).
-    """
-    # http://blog.plover.com/math/choose.html
+
+def nCk(n, k):
+    """http://blog.plover.com/math/choose.html"""
     if k > n:
         return 0
     if k == 0:
         return 1
-    result = 1
-    for denom in range(1, k + 1):
-        result *= n
-        result /= denom
+
+    r = 1
+    for d in range(1, k + 1):
+        r *= n
+        r /= d
         n -= 1
-    return result
 
+    return r
 
-def lg(n):
-    """
-    Returns logarithm of n in base 2.
-    """
-    return math.log(n, 2)
 
 # ------------------------------------------------------------------------------
-# minimum entropy search -------------------------------------------------------
+# search --- most guessable match sequence -------------------------------------
 # ------------------------------------------------------------------------------
 #
-# takes a list of overlapping matches, returns the non-overlapping sublist with
-# minimum entropy. O(nm) dp alg for length-n password with m candidate matches.
+# takes a sequence of overlapping matches, returns the non-overlapping sequence with
+# minimum guesses. the following is a O(l_max * (n + m)) dynamic programming algorithm
+# for a length-n password with m candidate matches. l_max is the maximum optimal
+# sequence length spanning each prefix of the password. In practice it rarely exceeds 5 and the
+# search terminates rapidly.
+#
+# the optimal "minimum guesses" sequence is here defined to be the sequence that
+# minimizes the following function:
+#
+#    g = l! * Product(m.guesses for m in sequence) + D^(l - 1)
+#
+# where l is the length of the sequence.
+#
+# the factorial term is the number of ways to order l patterns.
+#
+# the D^(l-1) term is another length penalty, roughly capturing the idea that an
+# attacker will try lower-length sequences first before trying length-l sequences.
+#
+# for example, consider a sequence that is date-repeat-dictionary.
+#  - an attacker would need to try other date-repeat-dictionary combinations,
+#    hence the product term.
+#  - an attacker would need to try repeat-date-dictionary, dictionary-repeat-date,
+#    ..., hence the factorial term.
+#  - an attacker would also likely try length-1 (dictionary) and length-2 (dictionary-date)
+#    sequences before length-3. assuming at minimum D guesses per pattern type,
+#    D^(l-1) approximates Sum(D^i for i in [1..l-1])
+#
 # ------------------------------------------------------------------------------
-def get(a, i):
-    if i < 0 or i >= len(a):
-        return 0
-    return a[i]
-
-
-def minimum_entropy_match_sequence(password, matches):
-    """
-    Returns minimum entropy
-
-    Takes a list of overlapping matches, returns the non-overlapping sublist with
-    minimum entropy. O(nm) dp alg for length-n password with m candidate matches.
-    """
-    bruteforce_cardinality = calc_bruteforce_cardinality(password) # e.g. 26 for lowercase
-    up_to_k = [0] * len(password) # minimum entropy up to k.
-    # for the optimal sequence of matches up to k, holds the final match (match['j'] == k). null means the sequence ends
-    # without a brute-force character.
-    backpointers = []
-    for k in range(0, len(password)):
-        # starting scenario to try and beat: adding a brute-force character to the minimum entropy sequence at k-1.
-        up_to_k[k] = get(up_to_k, k-1) + lg(bruteforce_cardinality)
-        backpointers.append(None)
-        for match in matches:
-            if match['j'] != k:
-                continue
-            i, j = match['i'], match['j']
-            # see if best entropy up to i-1 + entropy of this match is less than the current minimum at j.
-            up_to = get(up_to_k, i-1)
-            candidate_entropy = up_to + calc_entropy(match)
-            if candidate_entropy < up_to_k[j]:
-                #print "New minimum: using " + str(match)
-                #print "Entropy: " + str(candidate_entropy)
-                up_to_k[j] = candidate_entropy
-                backpointers[j] = match
-
-    # walk backwards and decode the best sequence
-    match_sequence = []
-    k = len(password) - 1
-    while k >= 0:
-        match = backpointers[k]
-        if match:
-            match_sequence.append(match)
-            k = match['i'] - 1
-        else:
-            k -= 1
-    match_sequence.reverse()
+def most_guessable_match_sequence(password, matches, _exclude_additive=False):
+    n = len(password)
+
+    # partition matches into sublists according to ending index j
+    matches_by_j = [[] for _ in range(n)]
+    try:
+        for m in matches:
+            matches_by_j[m['j']].append(m)
+    except TypeError:
+        pass
+    # small detail: for deterministic output, sort each sublist by i.
+    for lst in matches_by_j:
+        lst.sort(key=lambda m1: m1['i'])
+
+    optimal = {
+        # optimal.m[k][l] holds final match in the best length-l match sequence
+        # covering the password prefix up to k, inclusive.
+        # if there is no length-l sequence that scores better (fewer guesses)
+        # than a shorter match sequence spanning the same prefix,
+        # optimal.m[k][l] is undefined.
+        'm': [{} for _ in range(n)],
+
+        # same structure as optimal.m -- holds the product term Prod(m.guesses
+        # for m in sequence). optimal.pi allows for fast (non-looping) updates
+        # to the minimization function.
+        'pi': [{} for _ in range(n)],
+
+        # same structure as optimal.m -- holds the overall metric.
+        'g': [{} for _ in range(n)],
+    }
 
-    # fill in the blanks between pattern matches with bruteforce "matches"
-    # that way the match sequence fully covers the password: match1.j == match2.i - 1 for every adjacent match1, match2.
+    # helper: considers whether a length-l sequence ending at match m is better
+    # (fewer guesses) than previously encountered sequences, updating state if
+    # so.
+    def update(m, l):
+        k = m['j']
+        pi = estimate_guesses(m, password)
+        if l > 1:
+            # we're considering a length-l sequence ending with match m:
+            # obtain the product term in the minimization function by
+            # multiplying m's guesses by the product of the length-(l-1)
+            # sequence ending just before m, at m.i - 1.
+            pi *= optimal['pi'][m['i'] - 1][l - 1]
+        # calculate the minimization func
+        g = factorial(l) * pi
+        if not _exclude_additive:
+            g += MIN_GUESSES_BEFORE_GROWING_SEQUENCE ** (l - 1)
+
+        # update state if new best.
+        # first see if any competing sequences covering this prefix, with l or
+        # fewer matches, fare better than this sequence. if so, skip it and
+        # return.
+        for competing_l, competing_g in optimal['g'][k].items():
+            if competing_l > l:
+                continue
+            if competing_g <= g:
+                return
+
+        # this sequence might be part of the final optimal sequence.
+        optimal['g'][k][l] = g
+        optimal['m'][k][l] = m
+        optimal['pi'][k][l] = pi
+
+    # helper: evaluate bruteforce matches ending at k.
+    def bruteforce_update(k):
+        # see if a single bruteforce match spanning the k-prefix is optimal.
+        m = make_bruteforce_match(0, k)
+        update(m, 1)
+        for i in range(1, k):
+            # generate k bruteforce matches, spanning from (i=1, j=k) up to
+            # (i=k, j=k). see if adding these new matches to any of the
+            # sequences in optimal[i-1] leads to new bests.
+            m = make_bruteforce_match(i, k)
+            for l, last_m in optimal['m'][i - 1].items():
+                l = int(l)
+
+                # corner: an optimal sequence will never have two adjacent
+                # bruteforce matches. it is strictly better to have a single
+                # bruteforce match spanning the same region: same contribution
+                # to the guess product with a lower length.
+                # --> safe to skip those cases.
+                if last_m.get('pattern', False) == 'bruteforce':
+                    continue
+
+                # try adding m to this length-l sequence.
+                update(m, l + 1)
+
+    # helper: make bruteforce match objects spanning i to j, inclusive.
     def make_bruteforce_match(i, j):
         return {
             'pattern': 'bruteforce',
+            'token': password[i:j + 1],
             'i': i,
             'j': j,
-            'token': password[i:j+1],
-            'entropy': lg(math.pow(bruteforce_cardinality, j - i + 1)),
-            'cardinality': bruteforce_cardinality,
         }
-    k = 0
-    match_sequence_copy = []
-    for match in match_sequence:
-        i, j = match['i'], match['j']
-        if i - k > 0:
-            match_sequence_copy.append(make_bruteforce_match(k, i - 1))
-        k = j + 1
-        match_sequence_copy.append(match)
-
-    if k < len(password):
-        match_sequence_copy.append(make_bruteforce_match(k, len(password) - 1))
-    match_sequence = match_sequence_copy
-
-    min_entropy = 0 if len(password) == 0 else up_to_k[len(password) - 1] # corner case is for an empty password ''
-    crack_time = entropy_to_crack_time(min_entropy)
+
+    # helper: step backwards through optimal.m starting at the end,
+    # constructing the final optimal match sequence.
+    def unwind(n):
+        optimal_match_sequence = []
+        k = n - 1
+        # find the final best sequence length and score
+        l = None
+        g = float('inf')
+        for candidate_l, candidate_g in optimal['g'][k].items():
+            if candidate_g < g:
+                l = candidate_l
+                g = candidate_g
+
+        while k >= 0:
+            m = optimal['m'][k][l]
+            optimal_match_sequence.insert(0, m)
+            k = m['i'] - 1
+            l -= 1
+
+        return optimal_match_sequence
+
+    for k in range(n):
+        for m in matches_by_j[k]:
+            if m['i'] > 0:
+                for l in optimal['m'][m['i'] - 1]:
+                    l = int(l)
+                    update(m, l + 1)
+            else:
+                update(m, 1)
+        bruteforce_update(k)
+
+    optimal_match_sequence = unwind(n)
+    optimal_l = len(optimal_match_sequence)
+
+    # corner: empty password
+    if len(password) == 0:
+        guesses = 1
+    else:
+        guesses = optimal['g'][n - 1][optimal_l]
 
     # final result object
     return {
         'password': password,
-        'entropy': round_to_x_digits(min_entropy, 3),
-        'match_sequence': match_sequence,
-        'crack_time': round_to_x_digits(crack_time, 3),
-        'crack_time_display': display_time(crack_time),
-        'score': crack_time_to_score(crack_time),
+        'guesses': guesses,
+        'guesses_log10': log(guesses, 10),
+        'sequence': optimal_match_sequence,
     }
 
 
-def round_to_x_digits(number, digits):
-    """
-    Returns 'number' rounded to 'digits' digits.
-    """
-    return round(number * math.pow(10, digits)) / math.pow(10, digits)
+def estimate_guesses(match, password):
+    if match.get('guesses', False):
+        return match['guesses']
 
-# ------------------------------------------------------------------------------
-# threat model -- stolen hash catastrophe scenario -----------------------------
-# ------------------------------------------------------------------------------
-#
-# assumes:
-# * passwords are stored as salted hashes, different random salt per user.
-#   (making rainbow attacks infeasable.)
-# * hashes and salts were stolen. attacker is guessing passwords at max rate.
-# * attacker has several CPUs at their disposal.
-# ------------------------------------------------------------------------------
+    min_guesses = 1
+    if len(match['token']) < len(password):
+        if len(match['token']) == 1:
+            min_guesses = MIN_SUBMATCH_GUESSES_SINGLE_CHAR
+        else:
+            min_guesses = MIN_SUBMATCH_GUESSES_MULTI_CHAR
+
+    estimation_functions = {
+        'bruteforce': bruteforce_guesses,
+        'dictionary': dictionary_guesses,
+        'spatial': spatial_guesses,
+        'repeat': repeat_guesses,
+        'sequence': sequence_guesses,
+        'regex': regex_guesses,
+        'date': date_guesses,
+    }
 
-# for a hash function like bcrypt/scrypt/PBKDF2, 10ms per guess is a safe lower bound.
-# (usually a guess would take longer -- this assumes fast hardware and a small work factor.)
-# adjust for your site accordingly if you use another hash function, possibly by
-# several orders of magnitude!
-SINGLE_GUESS = .010
-NUM_ATTACKERS = 100 # number of cores guessing in parallel.
+    guesses = estimation_functions[match['pattern']](match)
+    match['guesses'] = max(guesses, min_guesses)
+    match['guesses_log10'] = log(match['guesses'], 10)
 
-SECONDS_PER_GUESS = SINGLE_GUESS / NUM_ATTACKERS
+    return match['guesses']
 
 
-def entropy_to_crack_time(entropy):
-    return (0.5 * math.pow(2, entropy)) * SECONDS_PER_GUESS # average, not total
+def bruteforce_guesses(match):
+    guesses = BRUTEFORCE_CARDINALITY ** len(match['token'])
+    # small detail: make bruteforce matches at minimum one guess bigger than
+    # smallest allowed submatch guesses, such that non-bruteforce submatches
+    # over the same [i..j] take precedence.
+    if len(match['token']) == 1:
+        min_guesses = MIN_SUBMATCH_GUESSES_SINGLE_CHAR + 1
+    else:
+        min_guesses = MIN_SUBMATCH_GUESSES_MULTI_CHAR + 1
 
+    return max(guesses, min_guesses)
 
-def crack_time_to_score(seconds):
-    if seconds < math.pow(10, 2):
-        return 0
-    if seconds < math.pow(10, 4):
-        return 1
-    if seconds < math.pow(10, 6):
-        return 2
-    if seconds < math.pow(10, 8):
-        return 3
-    return 4
 
-# ------------------------------------------------------------------------------
-# entropy calcs -- one function per match pattern ------------------------------
-# ------------------------------------------------------------------------------
+def dictionary_guesses(match):
+    # keep these as properties for display purposes
+    match['base_guesses'] = match['rank']
+    match['uppercase_variations'] = uppercase_variations(match)
+    match['l33t_variations'] = l33t_variations(match)
+    reversed_variations = match.get('reversed', False) and 2 or 1
+
+    return match['base_guesses'] * match['uppercase_variations'] * \
+           match['l33t_variations'] * reversed_variations
+
 
-def calc_entropy(match):
-    if 'entropy' in match: return match['entropy']
-
-    if match['pattern'] == 'repeat':
-        entropy_func = repeat_entropy
-    elif match['pattern'] == 'sequence':
-        entropy_func = sequence_entropy
-    elif match['pattern'] == 'digits':
-        entropy_func = digits_entropy
-    elif match['pattern'] == 'year':
-        entropy_func = year_entropy
-    elif match['pattern'] == 'date':
-        entropy_func = date_entropy
-    elif match['pattern'] == 'spatial':
-        entropy_func = spatial_entropy
-    elif match['pattern'] == 'dictionary':
-        entropy_func = dictionary_entropy
-    match['entropy'] = entropy_func(match)
-    return match['entropy']
-
-
-def repeat_entropy(match):
-    cardinality = calc_bruteforce_cardinality(match['token'])
-    return lg(cardinality * len(match['token']))
-
-
-def sequence_entropy(match):
-    first_chr = match['token'][0]
-    if first_chr in ['a', '1']:
-        base_entropy = 1
+def repeat_guesses(match):
+    return match['base_guesses'] * match['repeat_count']
+
+
+def sequence_guesses(match):
+    first_chr = match['token'][:1]
+    # lower guesses for obvious starting points
+    if first_chr in ['a', 'A', 'z', 'Z', '0', '1', '9']:
+        base_guesses = 4
     else:
-        if first_chr.isdigit():
-            base_entropy = lg(10) # digits
-        elif first_chr.isalpha():
-            base_entropy = lg(26) # lower
+        if re.compile(r'\d').match(first_chr):
+            base_guesses = 10  # digits
         else:
-            base_entropy = lg(26) + 1 # extra bit for uppercase
+            # could give a higher base for uppercase, but
+            # assigning 26 to both upper and lower sequences is more
+            # conservative.
+            base_guesses = 26
     if not match['ascending']:
-        base_entropy += 1 # extra bit for descending instead of ascending
-    return base_entropy + lg(len(match['token']))
+        base_guesses *= 2
 
+    return base_guesses * len(match['token'])
 
-def digits_entropy(match):
-    return lg(math.pow(10, len(match['token'])))
 
+def regex_guesses(match):
+    char_class_bases = {
+        'alpha_lower': 26,
+        'alpha_upper': 26,
+        'alpha': 52,
+        'alphanumeric': 62,
+        'digits': 10,
+        'symbols': 33,
+    }
+    if match['regex_name'] in char_class_bases:
+        return char_class_bases[match['regex_name']] ** len(match['token'])
+    elif match['regex_name'] == 'recent_year':
+        # conservative estimate of year space: num years from REFERENCE_YEAR.
+        # if year is close to REFERENCE_YEAR, estimate a year space of
+        # MIN_YEAR_SPACE.
+        year_space = abs(int(match['regex_match'].group(0)) - REFERENCE_YEAR)
+        year_space = max(year_space, MIN_YEAR_SPACE)
 
-NUM_YEARS = 119 # years match against 1900 - 2019
-NUM_MONTHS = 12
-NUM_DAYS = 31
+        return year_space
 
 
-def year_entropy(match):
-    return lg(NUM_YEARS)
+def date_guesses(match):
+    year_space = max(abs(match['year'] - REFERENCE_YEAR), MIN_YEAR_SPACE)
+    guesses = year_space * 365
+    if match.get('separator', False):
+        guesses *= 4
 
+    return guesses
 
-def date_entropy(match):
-    if match['year'] < 100:
-        entropy = lg(NUM_DAYS * NUM_MONTHS * 100) # two-digit year
-    else:
-        entropy = lg(NUM_DAYS * NUM_MONTHS * NUM_YEARS) # four-digit year
 
-    if match['separator']:
-        entropy += 2 # add two bits for separator selection [/,-,.,etc]
-    return entropy
+KEYBOARD_AVERAGE_DEGREE = calc_average_degree(ADJACENCY_GRAPHS['qwerty'])
+# slightly different for keypad/mac keypad, but close enough
+KEYPAD_AVERAGE_DEGREE = calc_average_degree(ADJACENCY_GRAPHS['keypad'])
 
+KEYBOARD_STARTING_POSITIONS = len(ADJACENCY_GRAPHS['qwerty'].keys())
+KEYPAD_STARTING_POSITIONS = len(ADJACENCY_GRAPHS['keypad'].keys())
 
-def spatial_entropy(match):
+
+def spatial_guesses(match):
     if match['graph'] in ['qwerty', 'dvorak']:
         s = KEYBOARD_STARTING_POSITIONS
         d = KEYBOARD_AVERAGE_DEGREE
     else:
         s = KEYPAD_STARTING_POSITIONS
         d = KEYPAD_AVERAGE_DEGREE
-    possibilities = 0
+    guesses = 0
     L = len(match['token'])
     t = match['turns']
-    # estimate the number of possible patterns w/ length L or less with t turns or less.
+    # estimate the number of possible patterns w/ length L or less with t turns
+    # or less.
     for i in range(2, L + 1):
-        possible_turns = min(t, i - 1)
-        for j in range(1, possible_turns+1):
-            x =  binom(i - 1, j - 1) * s * math.pow(d, j)
-            possibilities += x
-    entropy = lg(possibilities)
-    # add extra entropy for shifted keys. (% instead of 5, A instead of a.)
-    # math is similar to extra entropy from uppercase letters in dictionary matches.
-    if 'shifted_count' in match:
+        possible_turns = min(t, i - 1) + 1
+        for j in range(1, possible_turns):
+            guesses += nCk(i - 1, j - 1) * s * pow(d, j)
+    # add extra guesses for shifted keys. (% instead of 5, A instead of a.)
+    # math is similar to extra guesses of l33t substitutions in dictionary
+    # matches.
+    if match['shifted_count']:
         S = match['shifted_count']
-        U = L - S # unshifted count
-        possibilities = sum(binom(S + U, i) for i in xrange(0, min(S, U) + 1))
-        entropy += lg(possibilities)
-    return entropy
-
+        U = len(match['token']) - match['shifted_count']  # unshifted count
+        if S == 0 or U == 0:
+            guesses *= 2
+        else:
+            shifted_variations = 0
+            for i in range(1, min(S, U) + 1):
+                shifted_variations += nCk(S + U, i)
+            guesses *= shifted_variations
 
-def dictionary_entropy(match):
-    match['base_entropy'] = lg(match['rank']) # keep these as properties for display purposes
-    match['uppercase_entropy'] = extra_uppercase_entropy(match)
-    match['l33t_entropy'] = extra_l33t_entropy(match)
-    ret = match['base_entropy'] + match['uppercase_entropy'] + match['l33t_entropy']
-    return ret
+    return guesses
 
 
 START_UPPER = re.compile('^[A-Z][^A-Z]+$')
 END_UPPER = re.compile('^[^A-Z]+[A-Z]$')
-ALL_UPPER = re.compile('^[A-Z]+$')
+ALL_UPPER = re.compile('^[^a-z]+$')
+ALL_LOWER = re.compile('^[^A-Z]+$')
 
 
-def extra_uppercase_entropy(match):
+def uppercase_variations(match):
     word = match['token']
-    if word.islower():
-        return 0
-    # a capitalized word is the most common capitalization scheme,
-    # so it only doubles the search space (uncapitalized + capitalized): 1 extra bit of entropy.
... 480 lines suppressed ...
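
Note on the hunks above: the patch replaces the log2 entropy estimates with direct guess counts, so the per-pattern factors are multiplied rather than added. The following is a minimal standalone sketch of the new sequence_guesses and date_guesses, useful for checking the arithmetic by hand. It is not the packaged module. The values given for REFERENCE_YEAR and MIN_YEAR_SPACE are assumptions chosen for illustration, since the diff references both constants without showing their definitions.

```python
import re

# Assumed values for illustration only: the diff uses these names but
# does not show where or how they are defined.
REFERENCE_YEAR = 2017
MIN_YEAR_SPACE = 20


def sequence_guesses(match):
    """Guess count for a sequence match such as 'abcdef' or '9876'."""
    first_chr = match['token'][:1]
    if first_chr in ['a', 'A', 'z', 'Z', '0', '1', '9']:
        base_guesses = 4   # obvious starting points are cheap to guess
    elif re.match(r'\d', first_chr):
        base_guesses = 10  # any other digit
    else:
        base_guesses = 26  # any other letter, upper or lower
    if not match['ascending']:
        base_guesses *= 2  # descending doubles the space
    return base_guesses * len(match['token'])


def date_guesses(match):
    """Guess count for a date match: year distance times days per year."""
    year_space = max(abs(match['year'] - REFERENCE_YEAR), MIN_YEAR_SPACE)
    guesses = year_space * 365
    if match.get('separator', False):
        guesses *= 4       # extra factor when a separator is present
    return guesses


if __name__ == '__main__':
    print(sequence_guesses({'token': 'abcdef', 'ascending': True}))
    print(date_guesses({'year': 1990, 'separator': '/'}))
```

With the assumed constants, an ascending 'abcdef' costs 4 * 6 guesses, and a 1990 date with a separator costs 27 * 365 * 4 guesses. The old code would have added the corresponding bit counts instead.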

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/python-modules/packages/python-zxcvbn.git