[Git][debian-gis-team/python-stetl][master] 4 commits: New upstream version 1.3+ds

Bas Couwenberg gitlab at salsa.debian.org
Wed Mar 20 14:32:28 GMT 2019


Bas Couwenberg pushed to branch master at Debian GIS Project / python-stetl


Commits:
cd33cc5e by Bas Couwenberg at 2019-03-20T14:24:11Z
New upstream version 1.3+ds
- - - - -
56f7a2ed by Bas Couwenberg at 2019-03-20T14:24:47Z
Merge tag 'upstream/1.3+ds'

Upstream version 1.3+ds

- - - - -
3e483f18 by Bas Couwenberg at 2019-03-20T14:25:03Z
New upstream release.

- - - - -
e3468483 by Bas Couwenberg at 2019-03-20T14:26:14Z
Set distribution to experimental.

- - - - -


21 changed files:

- CHANGES.txt
- CONTRIBUTING.md
- CREDITS.txt
- PKG-INFO
- VERSION.txt
- debian/changelog
- docs/install.rst
- examples/basics/10_jinja2_templating/output/cities-gjson.gml
- examples/basics/12_gdal_ogr/output/cities.dbf
- examples/basics/3_shape/output/gmlcities.dbf
- examples/basics/runall.log
- setup.py
- stetl/etl.py
- stetl/filters/xsltfilter.py
- stetl/inputs/ogrinput.py
- stetl/outputs/execoutput.py
- stetl/outputs/ogroutput.py
- stetl/util.py
- stetl/utils/apachelog.py
- stetl/version.py
- + tests/test_util.py


Changes:

=====================================
CHANGES.txt
=====================================
@@ -1,6 +1,25 @@
 Changes
 =======
 
+v2.0 - TO BE RELEASED
+---------------------
+
+FIRST VERSION SUPPORTING PYTHON3-ONLY!
+
+See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
+These are all related to the Py2 to Py3 migration. Other issues arevmoved to later Milestones/releases.
+
+Main is the PR worked on for the Py2 to Py3 migration:
+https://github.com/geopython/stetl/pull/81
+
+v1.3 - march 20, 2019
+---------------------
+
+LAST VERSION SUPPORTING PYTHON2!
+See closed issues in Milestone 1.3: https://github.com/geopython/stetl/milestone/9?closed=1
+
+Very few changes, this release is mainly to make a baseline for v2.0 (Python3).
+
 v1.2 - july 7, 2018
 -------------------
 


=====================================
CONTRIBUTING.md
=====================================
@@ -87,6 +87,8 @@ project's developers might not want to merge into the project.
 
 Please adhere to the coding conventions used throughout a project (indentation,
 accurate comments, etc.) and any other requirements (such as test coverage).
+You can run the `nose` and `flake8` tools to check your code with respect to
+unit tests and coding style.
 
 Follow this process if you'd like your work considered for inclusion in the
 project:
@@ -144,4 +146,4 @@ license your work under the same license as that used by the project.
 ## Thanks
 
 This doc copied and adapted from original at:
-https://github.com/necolas/issue-guidelines/blob/master/CONTRIBUTING.md
\ No newline at end of file
+https://github.com/necolas/issue-guidelines/blob/master/CONTRIBUTING.md


=====================================
CREDITS.txt
=====================================
@@ -10,6 +10,8 @@ Stetl is developed by:
 
 Bas Couwenberg is providing Debian/Ubuntu packaging.
 
+Rob van Loon preparing Python3 migration and other.
+
 This project would not be possible without the great work of Frank Warmerdam and other
 GDAL/OGR developers (http://gdal.org).
 


=====================================
PKG-INFO
=====================================
@@ -1,6 +1,6 @@
-Metadata-Version: 1.2
+Metadata-Version: 2.1
 Name: Stetl
-Version: 1.2
+Version: 1.3
 Summary: Transformation and conversion framework (ETL) mainly for geospatial data
 Home-page: http://github.com/geopython/stetl
 Author: Just van den Broecke
@@ -98,6 +98,25 @@ Description: # Stetl - Streaming ETL
         Changes
         =======
         
+        v2.0 - TO BE RELEASED
+        ---------------------
+        
+        FIRST VERSION SUPPORTING PYTHON3-ONLY!
+        
+        See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
+        These are all related to the Py2 to Py3 migration. Other issues arevmoved to later Milestones/releases.
+        
+        Main is the PR worked on for the Py2 to Py3 migration:
+        https://github.com/geopython/stetl/pull/81
+        
+        v1.3 - march 20, 2019
+        ---------------------
+        
+        LAST VERSION SUPPORTING PYTHON2!
+        See closed issues in Milestone 1.3: https://github.com/geopython/stetl/milestone/9?closed=1
+        
+        Very few changes, this release is mainly to make a baseline for v2.0 (Python3).
+        
         v1.2 - july 7, 2018
         -------------------
         
@@ -211,6 +230,8 @@ Description: # Stetl - Streaming ETL
         
         Bas Couwenberg is providing Debian/Ubuntu packaging.
         
+        Rob van Loon preparing Python3 migration and other.
+        
         This project would not be possible without the great work of Frank Warmerdam and other
         GDAL/OGR developers (http://gdal.org).
         
@@ -231,3 +252,4 @@ Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Operating System :: OS Independent
 Classifier: Programming Language :: Python :: 2
 Classifier: Topic :: Scientific/Engineering :: GIS
+Description-Content-Type: text/markdown


=====================================
VERSION.txt
=====================================
@@ -1 +1 @@
-1.2
\ No newline at end of file
+1.3
\ No newline at end of file


=====================================
debian/changelog
=====================================
@@ -1,11 +1,12 @@
-python-stetl (1.2+ds-2) UNRELEASED; urgency=medium
+python-stetl (1.3+ds-1~exp1) experimental; urgency=medium
 
+  * New upstream release.
   * Bump Standards-Version to 4.3.0, no changes.
   * Drop autopkgtests to test installability & module import.
   * Add lintian override for testsuite-autopkgtest-missing.
   * Remove package name from lintian overrides.
 
- -- Bas Couwenberg <sebastic at debian.org>  Sun, 05 Aug 2018 20:54:36 +0200
+ -- Bas Couwenberg <sebastic at debian.org>  Wed, 20 Mar 2019 15:25:59 +0100
 
 python-stetl (1.2+ds-1) unstable; urgency=medium
 


=====================================
docs/install.rst
=====================================
@@ -3,10 +3,11 @@
 Installation
 ============
 
-Stetl currently only runs with Python 2 (2.7+). `Work is underway <https://github.com/geopython/stetl/pull/27>`_ for Python3 support.
+Stetl up to and including version 1.3 only runs with Python 2 (2.7+).
+Starting with Stetl v2.0 only Python 3 (3.4.2+) will be supported.
 
 Easiest is to first install the Stetl-dependencies (see below) and then
-install and maintain Stetl on your system as a Python package (pip is preferred). ::
+install and maintain Stetl on your system as a Python package (`pip` is preferred). ::
 
     (sudo) pip install stetl
     or
@@ -106,12 +107,16 @@ choose to install the same packages via `pip` to have more recent versions like
 
 	apt-get install python-jinja2
 
-
 Mac OSX
 ~~~~~~~
 
 Dependencies can best be installed via `Homebrew <http://brew.sh/>`_.
 
+Tip: sometimes installing GDAL Python bindings can be tricky as the
+installed GDAL binaries must be compatible. To install the right version you may use: ::
+
+	pip install GDAL==`gdalinfo --version | cut -d' ' -f2 | cut -d',' -f1`
+
 Windows
 ~~~~~~~
 
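Note on the GDAL tip added above: the shell pipeline takes the second space-separated field of the `gdalinfo --version` output and cuts it at the first comma. A minimal Python sketch of the same extraction follows, assuming the output looks like `GDAL 2.4.0, released 2018/12/14`. The function name `gdal_version_from_banner` and the sample string are illustrative only and are not part of Stetl.

```python
def gdal_version_from_banner(banner):
    """Mimic `cut -d' ' -f2 | cut -d',' -f1` for one line of text."""
    # cut -d' ' -f2: take the second space-delimited field
    fields = banner.strip().split(' ')
    second = fields[1] if len(fields) > 1 else ''
    # cut -d',' -f1: keep everything before the first comma
    return second.split(',')[0]


# Assumed example output; the real text comes from `gdalinfo --version`.
sample = "GDAL 2.4.0, released 2018/12/14"
print("pip install GDAL==%s" % gdal_version_from_banner(sample))
```

Pinning the Python bindings to the version of the installed binaries is the point of the tip. The sketch only shows how that version string is derived.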


=====================================
examples/basics/10_jinja2_templating/output/cities-gjson.gml
=====================================
@@ -42,7 +42,7 @@
                 <cities:name>Amsterdam</cities:name>
                 <cities:population>779808</cities:population>
                 <cities:geometry>
-                    <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-1"><gml:pos>52.3730454545455 4.89483636363636</gml:pos></gml:Point>
+                    <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-1"><gml:pos>52.3730454554572 4.89483636363636</gml:pos></gml:Point>
                 </cities:geometry>
             </cities:City>
         </gml:featureMember>
@@ -51,7 +51,7 @@
                 <cities:name>Bonn</cities:name>
                 <cities:population>327913</cities:population>
                 <cities:geometry>
-                    <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-2"><gml:pos>50.7345545454545 7.09981818181818</gml:pos></gml:Point>
+                    <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-2"><gml:pos>50.7345545463786 7.09981818181818</gml:pos></gml:Point>
                 </cities:geometry>
             </cities:City>
         </gml:featureMember>
@@ -60,7 +60,7 @@
                 <cities:name>Rome</cities:name>
                 <cities:population>2753000</cities:population>
                 <cities:geometry>
-                    <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-3"><gml:pos>41.88 12.52</gml:pos></gml:Point>
+                    <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-3"><gml:pos>41.8800000009378 12.52</gml:pos></gml:Point>
                 </cities:geometry>
             </cities:City>
         </gml:featureMember>


=====================================
examples/basics/12_gdal_ogr/output/cities.dbf
=====================================
Binary files a/examples/basics/12_gdal_ogr/output/cities.dbf and b/examples/basics/12_gdal_ogr/output/cities.dbf differ


=====================================
examples/basics/3_shape/output/gmlcities.dbf
=====================================
Binary files a/examples/basics/3_shape/output/gmlcities.dbf and b/examples/basics/3_shape/output/gmlcities.dbf differ


=====================================
examples/basics/runall.log
=====================================
The diff for this file was not included because it is too large.

=====================================
setup.py
=====================================
@@ -47,6 +47,7 @@ setup(
     maintainer_email='justb4 at gmail.com',
     url='http://github.com/geopython/stetl',
     long_description=readme + "\n" + changes + "\n" + credits,
+    long_description_content_type="text/markdown",
     packages=find_packages(exclude=['tests']),
     namespace_packages=['stetl'],
     include_package_data=True,


=====================================
stetl/etl.py
=====================================
@@ -69,7 +69,7 @@ class ETL:
 
             # Parse unique list of argument names from config file string.
             # https://www.machinelearningplus.com/python/python-regex-tutorial-examples/
-            args_names = list(set(re.findall('{[A-Z|a-z]\w+}', config_str)))
+            args_names = list(set(re.findall(r'{[A-Z|a-z]\w+}', config_str)))
             args_names = [name.split('{')[1].split('}')[0] for name in args_names]
 
             # Optional: expand from equivalent env vars
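
Note on the change above: only the `r` prefix is added, the pattern itself is unchanged. In a non-raw literal, `\w` is an unrecognized string escape, which style checkers and newer Python 3 versions warn about. A small sketch of what the pattern extracts is shown below, using a made-up config string (the contents of `config_str` are an assumption for illustration).

```python
import re

# Hypothetical config text containing {placeholder} arguments.
config_str = (
    "[input]\nfile_path = {in_file}\n"
    "[output]\nhost = {db_host}\nbackup = {in_file}\n"
)

# Same pattern as in the diff: '{', one letter (or a literal '|'),
# one or more word characters, then '}'.
names = set(re.findall(r'{[A-Z|a-z]\w+}', config_str))

# Strip the braces, as the following line in etl.py does.
names = sorted(name.split('{')[1].split('}')[0] for name in names)
print(names)
```

Because of the `\w+` after the character class, the pattern only matches names of at least two characters. The `set` removes duplicates such as the repeated `{in_file}`.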


=====================================
stetl/filters/xsltfilter.py
=====================================
@@ -5,6 +5,7 @@
 #
 # Author:Just van den Broecke
 
+from stetl.component import Config
 from stetl.util import Util, etree
 from stetl.filter import Filter
 from stetl.packet import FORMAT
@@ -19,12 +20,19 @@ class XsltFilter(Filter):
     consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc
     """
 
+    @Config(ptype=str, required=True)
+    def script(self):
+        """
+        Path to XSLT script file.
+        """
+        pass
+
     # Constructor
     def __init__(self, configdict, section):
         Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
 
-        self.xslt_file_path = self.cfg.get('script')
-        self.xslt_file = open(self.xslt_file_path, 'r')
+        self.xslt_file = open(self.script, 'r')
+
         # Parse XSLT file only once
         self.xslt_doc = etree.parse(self.xslt_file)
         self.xslt_obj = etree.XSLT(self.xslt_doc)


=====================================
stetl/inputs/ogrinput.py
=====================================
@@ -109,7 +109,7 @@ class OgrInput(Input):
 
         # Report failure if failed
         if self.data_source_p is None:
-            log.error("Cannot open OGR datasource: %s with the following drivers." % self.data_source)
+            log.error("Cannot open OGR datasource: %s with the following drivers." % Util.safe_string_value(self.data_source))
 
             for iDriver in range(self.ogr.GetDriverCount()):
                 log.info("  ->  " + self.ogr.GetDriver(iDriver).GetName())
@@ -126,11 +126,11 @@ class OgrInput(Input):
                 self.layer_count = self.data_source_p.GetLayerCount()
                 self.layer_idx = 0
 
-            log.info("Opened OGR source ok: %s layer count=%d" % (self.data_source, self.layer_count))
+            log.info("Opened OGR source ok: %s layer count=%d" % (Util.safe_string_value(self.data_source), self.layer_count))
 
     def read(self, packet):
         if not self.data_source_p:
-            log.info("End reading from: %s" % self.data_source)
+            log.info("End reading from: %s" % Util.safe_string_value(self.data_source))
             return packet
 
         if self.layer is None:
@@ -145,11 +145,11 @@ class OgrInput(Input):
                 if self.layer is None:
                     log.error("Could not fetch layer %d" % 0)
                     raise Exception()
-                log.info("Start reading from OGR Source: %s, Layer: %s" % (self.data_source, self.layer.GetName()))
+                log.info("Start reading from OGR Source: %s, Layer: %s" % (Util.safe_string_value(self.data_source), self.layer.GetName()))
             else:
                 # No more Layers left: cleanup
                 packet.set_end_of_stream()
-                log.info("Closing OGR source: %s" % self.data_source)
+                log.info("Closing OGR source: %s" % Util.safe_string_value(self.data_source))
                 # Destroy not required anymore: http://trac.osgeo.org/gdal/wiki/PythonGotchas
                 # self.data_source_p.Destroy()
                 self.data_source_p = None
@@ -314,7 +314,7 @@ class OgrPostgisInput(Input):
         self.cmd = self.cmd.split('|')
 
     def exec_cmd(self):
-        log.info("start ogr2ogr cmd = %s" % repr(self.cmd))
+        log.info("start ogr2ogr cmd = %s" % Util.safe_string_value(repr(self.cmd)))
         self.ogr_process = subprocess.Popen(self.cmd,
                                             shell=False,
                                             stdout=subprocess.PIPE,


=====================================
stetl/outputs/execoutput.py
=====================================
@@ -48,7 +48,7 @@ class ExecOutput(Output):
 
         try:
             os.environ.update(env_vars)
-            log.info("executing cmd=%s" % cmd)
+            log.info("executing cmd=%s" % Util.safe_string_value(cmd))
             subprocess.call(cmd, shell=True)
             log.info("execute done")
         finally:


=====================================
stetl/outputs/ogroutput.py
=====================================
@@ -201,7 +201,7 @@ class OgrOutput(Output):
         if self.dest_fd is None:
             self.dest_fd = self.dest_driver.CreateDataSource(self.dest_data_source, options=self.dest_create_options)
             if self.dest_fd is None:
-                log.error("%s driver failed to create %s" % (self.dest_format, self.dest_data_source))
+                log.error("%s driver failed to create %s" % (self.dest_format, Util.safe_string_value(self.dest_data_source)))
                 raise Exception()
 
         # /* -------------------------------------------------------------------- */
@@ -218,7 +218,7 @@ class OgrOutput(Output):
                                               self.layer_create_options)
         self.feature_def = None
 
-        log.info("Opened OGR dest ok: %s " % self.dest_data_source)
+        log.info("Opened OGR dest ok: %s " % Util.safe_string_value(self.dest_data_source))
 
     def write(self, packet):
 
@@ -228,7 +228,7 @@ class OgrOutput(Output):
             return packet
 
         if self.layer is None:
-            log.info("No Layer, end writing to: %s" % self.dest_data_source)
+            log.info("No Layer, end writing to: %s" % Util.safe_string_value(self.dest_data_source))
             return packet
 
         # Assume ogr_feature_array input, otherwise convert ogr_feature to list
@@ -268,7 +268,7 @@ class OgrOutput(Output):
     def write_end(self, packet):
         # Destroy not required anymore: http://trac.osgeo.org/gdal/wiki/PythonGotchas
         # self.dest_fd.Destroy()
-        log.info("End writing to: %s" % self.dest_data_source)
+        log.info("End writing to: %s" % Util.safe_string_value(self.dest_data_source))
         self.dest_fd = None
         self.layer = None
         return packet


=====================================
stetl/util.py
=====================================
@@ -4,9 +4,10 @@
 #
 # Author:Just van den Broecke
 
+import glob
 import logging
 import os
-import glob
+import re
 import types
 from time import time
 from ConfigParser import ConfigParser
@@ -14,6 +15,15 @@ from ConfigParser import ConfigParser
 logging.basicConfig(level=logging.INFO,
                     format='%(asctime)s %(name)s %(levelname)s %(message)s')
 
+# Constants for precompiled regular expressions
+RE_PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)
+RE_PG_PWD = re.compile(r'\bpassword=[^\'"]\S*', flags=re.IGNORECASE)
+RE_PG_PWD_DBL = re.compile(r'\bpassword="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
+RE_PG_PWD_SNG = re.compile(r'\bpassword=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)
+RE_PG_USER = re.compile(r'\buser=[^\'"]\S*', flags=re.IGNORECASE)
+RE_PG_USER_DBL = re.compile(r'\buser="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
+RE_PG_USER_SNG = re.compile(r'\buser=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)
+
 
 # Static utility methods
 class Util:
@@ -348,6 +358,24 @@ class Util:
 
         return elem
 
+    # Hide user names and passwords in string values, like the Postgres connection string as used by GDAL/OGR
+    # See https://stackoverflow.com/questions/249791/regex-for-quoted-string-with-escaping-quotes for the escaped quotes expressions
+    @staticmethod
+    def safe_string_value(value, hide_value='***'):
+        # PostgreSQL connection strings as used by GDAL/OGR
+        if RE_PG_START.search(value) is not None:
+            value = RE_PG_PWD.sub('password=%s' % hide_value, value)
+            value = RE_PG_PWD_DBL.sub('password="%s"' % hide_value, value)
+            value = RE_PG_PWD_SNG.sub('password=\'%s\'' % hide_value, value)
+
+            value = RE_PG_USER.sub('user=%s' % hide_value, value)
+            value = RE_PG_USER_DBL.sub('user="%s"' % hide_value, value)
+            value = RE_PG_USER_SNG.sub('user=\'%s\'' % hide_value, value)
+
+        # Add more cases as needed ...
+
+        return value
+
 
 log = Util.get_log("util")
 
@@ -488,9 +516,14 @@ class ConfigSection():
         # Need to hide some sensitive values, usually used for logging
         safe_copy = self.config_dict.copy()
         hides = ['passw', 'pasw', 'token', 'user']
+        hide_value = '<hidden>'
+
         for key in safe_copy:
             for hide_key in hides:
                 if hide_key in key.lower():
-                    safe_copy[key] = '<hidden>'
+                    safe_copy[key] = hide_value
+
+            # Also hide usernames/passwords  in string values, like Postgres connection strings used by GDAL/OGR
+            safe_copy[key] = Util.safe_string_value(safe_copy[key], hide_value)
 
         return repr(safe_copy)
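
Note on the masking added above: `safe_string_value` only rewrites values that contain a `PG:` marker, and it handles unquoted, double-quoted and single-quoted `user=`/`password=` values with separate expressions. The following is a condensed, standalone sketch of that idea, not the code from the diff. The function name `mask_pg_credentials` and the loop over keys are assumptions made for brevity, and the sketch assumes `value` is a string.

```python
import re

PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)
KEYS = ('password', 'user')


def mask_pg_credentials(value, hide_value='***'):
    """Mask user= and password= values in a GDAL/OGR 'PG:' connection string."""
    if PG_START.search(value) is None:
        return value  # not a PG connection string: leave untouched
    for key in KEYS:
        # Unquoted value: runs until the next whitespace.
        value = re.sub(r'\b%s=[^\'"]\S*' % key,
                       '%s=%s' % (key, hide_value),
                       value, flags=re.IGNORECASE)
        # Double-quoted value, allowing backslash-escaped characters.
        value = re.sub(r'\b%s="(?:[^"\\]|\\.)*"' % key,
                       '%s="%s"' % (key, hide_value),
                       value, flags=re.IGNORECASE)
        # Single-quoted value, allowing backslash-escaped characters.
        value = re.sub(r"\b%s='(?:[^'\\]|\\.)*'" % key,
                       "%s='%s'" % (key, hide_value),
                       value, flags=re.IGNORECASE)
    return value


print(mask_pg_credentials(
    "PG:dbname=mydb host=myhost user=myuser password=mypassword"))
```

The quoted variants are kept separate from the unquoted one so that a password containing spaces or escaped quotes is masked as a whole rather than cut at the first whitespace.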


=====================================
stetl/utils/apachelog.py
=====================================
@@ -162,7 +162,7 @@ class parser:
 
             self._names.append(self.alias(element))
 
-            subpattern = '(\S*)'
+            subpattern = r'(\S*)'
 
             if hasquotes:
                 if element == '%r' or findreferreragent.search(element):


=====================================
stetl/version.py
=====================================
@@ -1 +1 @@
-__version__ = "1.2"
+__version__ = "1.3"


=====================================
tests/test_util.py
=====================================
@@ -0,0 +1,47 @@
+# testing: to be called by nosetests
+
+import os
+from ast import literal_eval
+
+from stetl.etl import ETL
+from stetl.util import ConfigSection
+from tests.stetl_test_case import StetlTestCase
+
+
+class UtilTest(StetlTestCase):
+    """Basic util tests"""
+
+    def setUp(self):
+        super(UtilTest, self).setUp()
+
+    def test_configsection_to_string(self):
+        cfg = {
+            'name': 'Stetl',
+            'password': 'something',
+            'paswoord': 'iets',
+            'token': 'abc123',
+            'user': 'John',
+            'username': 'Jane',
+            'gebruiker': 'Jan',
+            'ogrconn': 'PG:dbname=mydb host=myhost port=myport user=myuser password=mypassword active_schema=myschema',
+            'ogrconn_singlequotes': 'PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'myuser\' password=\'mypassword\' active_schema=\'myschema\'',
+            'ogrconn_doublequotes': 'PG:dbname="mydb" host="myhost" port="myport" user="myuser" password="mypassword" active_schema="myschema"',
+            'ogrconn_crazypwd1': 'PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'myuser\' password=\'my\\\'crazy\\"password\' active_schema=\'myschema\'',
+            'ogrconn_crazypwd2': 'PG:dbname="mydb" host="myhost" port="myport" user="myuser" password="my\\\'crazy\\"password" active_schema="myschema"',
+            'ogrconn_dkk': '"PG:dbname=mydb host=myhost port=myport user=myuser password=mypassword active_schema=myschema"',
+        }
+        obj = literal_eval(ConfigSection(cfg).to_string())
+        
+        self.assertEqual('Stetl', obj['name'])
+        self.assertEqual('<hidden>', obj['password'])
+        self.assertEqual('<hidden>', obj['paswoord'])
+        self.assertEqual('<hidden>', obj['token'])
+        self.assertEqual('<hidden>', obj['user'])
+        self.assertEqual('<hidden>', obj['username'])
+        self.assertEqual('Jan', obj['gebruiker'])
+        self.assertEqual('PG:dbname=mydb host=myhost port=myport user=<hidden> password=<hidden> active_schema=myschema', obj['ogrconn'])
+        self.assertEqual('PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'<hidden>\' password=\'<hidden>\' active_schema=\'myschema\'', obj['ogrconn_singlequotes'])
+        self.assertEqual('PG:dbname="mydb" host="myhost" port="myport" user="<hidden>" password="<hidden>" active_schema="myschema"', obj['ogrconn_doublequotes'])
+        self.assertEqual('PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'<hidden>\' password=\'<hidden>\' active_schema=\'myschema\'', obj['ogrconn_crazypwd1'])
+        self.assertEqual('PG:dbname="mydb" host="myhost" port="myport" user="<hidden>" password="<hidden>" active_schema="myschema"', obj['ogrconn_crazypwd2'])
+        self.assertEqual('"PG:dbname=mydb host=myhost port=myport user=<hidden> password=<hidden> active_schema=myschema"', obj['ogrconn_dkk'])



View it on GitLab: https://salsa.debian.org/debian-gis-team/python-stetl/compare/0ff8cec0b117a61fd79bf33e1fbfd6f0a0857971...e3468483f05300ced3a8e598e838a07746cb1754
