[Git][debian-gis-team/python-stetl][upstream] New upstream version 1.3+ds
Bas Couwenberg
gitlab at salsa.debian.org
Wed Mar 20 14:32:33 GMT 2019
Bas Couwenberg pushed to branch upstream at Debian GIS Project / python-stetl
Commits:
cd33cc5e by Bas Couwenberg at 2019-03-20T14:24:11Z
New upstream version 1.3+ds
- - - - -
20 changed files:
- CHANGES.txt
- CONTRIBUTING.md
- CREDITS.txt
- PKG-INFO
- VERSION.txt
- docs/install.rst
- examples/basics/10_jinja2_templating/output/cities-gjson.gml
- examples/basics/12_gdal_ogr/output/cities.dbf
- examples/basics/3_shape/output/gmlcities.dbf
- examples/basics/runall.log
- setup.py
- stetl/etl.py
- stetl/filters/xsltfilter.py
- stetl/inputs/ogrinput.py
- stetl/outputs/execoutput.py
- stetl/outputs/ogroutput.py
- stetl/util.py
- stetl/utils/apachelog.py
- stetl/version.py
- + tests/test_util.py
Changes:
=====================================
CHANGES.txt
=====================================
@@ -1,6 +1,25 @@
Changes
=======
+v2.0 - TO BE RELEASED
+---------------------
+
+FIRST VERSION SUPPORTING PYTHON3-ONLY!
+
+See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
+These are all related to the Py2 to Py3 migration. Other issues are moved to later Milestones/releases.
+
+Main is the PR worked on for the Py2 to Py3 migration:
+https://github.com/geopython/stetl/pull/81
+
+v1.3 - march 20, 2019
+---------------------
+
+LAST VERSION SUPPORTING PYTHON2!
+See closed issues in Milestone 1.3: https://github.com/geopython/stetl/milestone/9?closed=1
+
+Very few changes, this release is mainly to make a baseline for v2.0 (Python3).
+
v1.2 - july 7, 2018
-------------------
=====================================
CONTRIBUTING.md
=====================================
@@ -87,6 +87,8 @@ project's developers might not want to merge into the project.
Please adhere to the coding conventions used throughout a project (indentation,
accurate comments, etc.) and any other requirements (such as test coverage).
+You can run the `nose` and `flake8` tools to check your code with respect to
+unit tests and coding style.
Follow this process if you'd like your work considered for inclusion in the
project:
@@ -144,4 +146,4 @@ license your work under the same license as that used by the project.
## Thanks
This doc copied and adapted from original at:
-https://github.com/necolas/issue-guidelines/blob/master/CONTRIBUTING.md
\ No newline at end of file
+https://github.com/necolas/issue-guidelines/blob/master/CONTRIBUTING.md
=====================================
CREDITS.txt
=====================================
@@ -10,6 +10,8 @@ Stetl is developed by:
Bas Couwenberg is providing Debian/Ubuntu packaging.
+Rob van Loon preparing Python3 migration and other.
+
This project would not be possible without the great work of Frank Warmerdam and other
GDAL/OGR developers (http://gdal.org).
=====================================
PKG-INFO
=====================================
@@ -1,6 +1,6 @@
-Metadata-Version: 1.2
+Metadata-Version: 2.1
Name: Stetl
-Version: 1.2
+Version: 1.3
Summary: Transformation and conversion framework (ETL) mainly for geospatial data
Home-page: http://github.com/geopython/stetl
Author: Just van den Broecke
@@ -98,6 +98,25 @@ Description: # Stetl - Streaming ETL
Changes
=======
+ v2.0 - TO BE RELEASED
+ ---------------------
+
+ FIRST VERSION SUPPORTING PYTHON3-ONLY!
+
+ See closed issues in Milestone 2.0: https://github.com/geopython/stetl/milestone/10?closed=1
+ These are all related to the Py2 to Py3 migration. Other issues are moved to later Milestones/releases.
+
+ Main is the PR worked on for the Py2 to Py3 migration:
+ https://github.com/geopython/stetl/pull/81
+
+ v1.3 - march 20, 2019
+ ---------------------
+
+ LAST VERSION SUPPORTING PYTHON2!
+ See closed issues in Milestone 1.3: https://github.com/geopython/stetl/milestone/9?closed=1
+
+ Very few changes, this release is mainly to make a baseline for v2.0 (Python3).
+
v1.2 - july 7, 2018
-------------------
@@ -211,6 +230,8 @@ Description: # Stetl - Streaming ETL
Bas Couwenberg is providing Debian/Ubuntu packaging.
+ Rob van Loon preparing Python3 migration and other.
+
This project would not be possible without the great work of Frank Warmerdam and other
GDAL/OGR developers (http://gdal.org).
@@ -231,3 +252,4 @@ Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2
Classifier: Topic :: Scientific/Engineering :: GIS
+Description-Content-Type: text/markdown
=====================================
VERSION.txt
=====================================
@@ -1 +1 @@
-1.2
\ No newline at end of file
+1.3
\ No newline at end of file
=====================================
docs/install.rst
=====================================
@@ -3,10 +3,11 @@
Installation
============
-Stetl currently only runs with Python 2 (2.7+). `Work is underway <https://github.com/geopython/stetl/pull/27>`_ for Python3 support.
+Stetl up to and including version 1.3 only runs with Python 2 (2.7+).
+Starting with Stetl v2.0 only Python 3 (3.4.2+) will be supported.
Easiest is to first install the Stetl-dependencies (see below) and then
-install and maintain Stetl on your system as a Python package (pip is preferred). ::
+install and maintain Stetl on your system as a Python package (`pip` is preferred). ::
(sudo) pip install stetl
or
@@ -106,12 +107,16 @@ choose to install the same packages via `pip` to have more recent versions like
apt-get install python-jinja2
-
Mac OSX
~~~~~~~
Dependencies can best be installed via `Homebrew <http://brew.sh/>`_.
+Tip: sometimes installing GDAL Python bindings can be tricky as the
+installed GDAL binaries must be compatible. To install the right version you may use: ::
+
+ pip install GDAL==`gdalinfo --version | cut -d' ' -f2 | cut -d',' -f1`
+
Windows
~~~~~~~
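
Note on the `pip install GDAL==...` tip added to docs/install.rst above: the two `cut` calls take the second space-separated field of the `gdalinfo --version` output and then drop everything from the first comma. A minimal Python sketch of the same parsing follows. The banner string is an assumed sample of that output, and the helper name `gdal_version_from_banner` is hypothetical, not part of Stetl or GDAL.

```python
def gdal_version_from_banner(banner):
    """Extract the version number from `gdalinfo --version` style output."""
    # Equivalent of: cut -d' ' -f2  (second space-separated field)
    second_field = banner.split(' ')[1]
    # Equivalent of: cut -d',' -f1  (text before the first comma)
    return second_field.split(',')[0]


# Assumed sample output; the real banner depends on the installed GDAL.
banner = "GDAL 2.4.0, released 2018/12/14"
requirement = "GDAL==" + gdal_version_from_banner(banner)
print(requirement)  # prints GDAL==2.4.0
```

Pinning the bindings to the version reported by the installed binaries is the point of the tip: the Python package is built against the local GDAL library, so the two versions need to match.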
=====================================
examples/basics/10_jinja2_templating/output/cities-gjson.gml
=====================================
@@ -42,7 +42,7 @@
<cities:name>Amsterdam</cities:name>
<cities:population>779808</cities:population>
<cities:geometry>
- <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-1"><gml:pos>52.3730454545455 4.89483636363636</gml:pos></gml:Point>
+ <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-1"><gml:pos>52.3730454554572 4.89483636363636</gml:pos></gml:Point>
</cities:geometry>
</cities:City>
</gml:featureMember>
@@ -51,7 +51,7 @@
<cities:name>Bonn</cities:name>
<cities:population>327913</cities:population>
<cities:geometry>
- <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-2"><gml:pos>50.7345545454545 7.09981818181818</gml:pos></gml:Point>
+ <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-2"><gml:pos>50.7345545463786 7.09981818181818</gml:pos></gml:Point>
</cities:geometry>
</cities:City>
</gml:featureMember>
@@ -60,7 +60,7 @@
<cities:name>Rome</cities:name>
<cities:population>2753000</cities:population>
<cities:geometry>
- <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-3"><gml:pos>41.88 12.52</gml:pos></gml:Point>
+ <gml:Point srsName="urn:ogc:def:crs:EPSG::4258" gml:id="point-3"><gml:pos>41.8800000009378 12.52</gml:pos></gml:Point>
</cities:geometry>
</cities:City>
</gml:featureMember>
=====================================
examples/basics/12_gdal_ogr/output/cities.dbf
=====================================
Binary files a/examples/basics/12_gdal_ogr/output/cities.dbf and b/examples/basics/12_gdal_ogr/output/cities.dbf differ
=====================================
examples/basics/3_shape/output/gmlcities.dbf
=====================================
Binary files a/examples/basics/3_shape/output/gmlcities.dbf and b/examples/basics/3_shape/output/gmlcities.dbf differ
=====================================
examples/basics/runall.log
=====================================
The diff for this file was not included because it is too large.
=====================================
setup.py
=====================================
@@ -47,6 +47,7 @@ setup(
maintainer_email='justb4 at gmail.com',
url='http://github.com/geopython/stetl',
long_description=readme + "\n" + changes + "\n" + credits,
+ long_description_content_type="text/markdown",
packages=find_packages(exclude=['tests']),
namespace_packages=['stetl'],
include_package_data=True,
=====================================
stetl/etl.py
=====================================
@@ -69,7 +69,7 @@ class ETL:
# Parse unique list of argument names from config file string.
# https://www.machinelearningplus.com/python/python-regex-tutorial-examples/
- args_names = list(set(re.findall('{[A-Z|a-z]\w+}', config_str)))
+ args_names = list(set(re.findall(r'{[A-Z|a-z]\w+}', config_str)))
args_names = [name.split('{')[1].split('}')[0] for name in args_names]
# Optional: expand from equivalent env vars
=====================================
stetl/filters/xsltfilter.py
=====================================
@@ -5,6 +5,7 @@
#
# Author:Just van den Broecke
+from stetl.component import Config
from stetl.util import Util, etree
from stetl.filter import Filter
from stetl.packet import FORMAT
@@ -19,12 +20,19 @@ class XsltFilter(Filter):
consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc
"""
+    @Config(ptype=str, required=True)
+    def script(self):
+        """
+        Path to XSLT script file.
+        """
+        pass
+
# Constructor
def __init__(self, configdict, section):
Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
- self.xslt_file_path = self.cfg.get('script')
- self.xslt_file = open(self.xslt_file_path, 'r')
+ self.xslt_file = open(self.script, 'r')
+
# Parse XSLT file only once
self.xslt_doc = etree.parse(self.xslt_file)
self.xslt_obj = etree.XSLT(self.xslt_doc)
=====================================
stetl/inputs/ogrinput.py
=====================================
@@ -109,7 +109,7 @@ class OgrInput(Input):
# Report failure if failed
if self.data_source_p is None:
- log.error("Cannot open OGR datasource: %s with the following drivers." % self.data_source)
+ log.error("Cannot open OGR datasource: %s with the following drivers." % Util.safe_string_value(self.data_source))
for iDriver in range(self.ogr.GetDriverCount()):
log.info(" -> " + self.ogr.GetDriver(iDriver).GetName())
@@ -126,11 +126,11 @@ class OgrInput(Input):
self.layer_count = self.data_source_p.GetLayerCount()
self.layer_idx = 0
- log.info("Opened OGR source ok: %s layer count=%d" % (self.data_source, self.layer_count))
+ log.info("Opened OGR source ok: %s layer count=%d" % (Util.safe_string_value(self.data_source), self.layer_count))
def read(self, packet):
if not self.data_source_p:
- log.info("End reading from: %s" % self.data_source)
+ log.info("End reading from: %s" % Util.safe_string_value(self.data_source))
return packet
if self.layer is None:
@@ -145,11 +145,11 @@ class OgrInput(Input):
if self.layer is None:
log.error("Could not fetch layer %d" % 0)
raise Exception()
- log.info("Start reading from OGR Source: %s, Layer: %s" % (self.data_source, self.layer.GetName()))
+ log.info("Start reading from OGR Source: %s, Layer: %s" % (Util.safe_string_value(self.data_source), self.layer.GetName()))
else:
# No more Layers left: cleanup
packet.set_end_of_stream()
- log.info("Closing OGR source: %s" % self.data_source)
+ log.info("Closing OGR source: %s" % Util.safe_string_value(self.data_source))
# Destroy not required anymore: http://trac.osgeo.org/gdal/wiki/PythonGotchas
# self.data_source_p.Destroy()
self.data_source_p = None
@@ -314,7 +314,7 @@ class OgrPostgisInput(Input):
self.cmd = self.cmd.split('|')
def exec_cmd(self):
- log.info("start ogr2ogr cmd = %s" % repr(self.cmd))
+ log.info("start ogr2ogr cmd = %s" % Util.safe_string_value(repr(self.cmd)))
self.ogr_process = subprocess.Popen(self.cmd,
shell=False,
stdout=subprocess.PIPE,
=====================================
stetl/outputs/execoutput.py
=====================================
@@ -48,7 +48,7 @@ class ExecOutput(Output):
try:
os.environ.update(env_vars)
- log.info("executing cmd=%s" % cmd)
+ log.info("executing cmd=%s" % Util.safe_string_value(cmd))
subprocess.call(cmd, shell=True)
log.info("execute done")
finally:
=====================================
stetl/outputs/ogroutput.py
=====================================
@@ -201,7 +201,7 @@ class OgrOutput(Output):
if self.dest_fd is None:
self.dest_fd = self.dest_driver.CreateDataSource(self.dest_data_source, options=self.dest_create_options)
if self.dest_fd is None:
- log.error("%s driver failed to create %s" % (self.dest_format, self.dest_data_source))
+ log.error("%s driver failed to create %s" % (self.dest_format, Util.safe_string_value(self.dest_data_source)))
raise Exception()
# /* -------------------------------------------------------------------- */
@@ -218,7 +218,7 @@ class OgrOutput(Output):
self.layer_create_options)
self.feature_def = None
- log.info("Opened OGR dest ok: %s " % self.dest_data_source)
+ log.info("Opened OGR dest ok: %s " % Util.safe_string_value(self.dest_data_source))
def write(self, packet):
@@ -228,7 +228,7 @@ class OgrOutput(Output):
return packet
if self.layer is None:
- log.info("No Layer, end writing to: %s" % self.dest_data_source)
+ log.info("No Layer, end writing to: %s" % Util.safe_string_value(self.dest_data_source))
return packet
# Assume ogr_feature_array input, otherwise convert ogr_feature to list
@@ -268,7 +268,7 @@ class OgrOutput(Output):
def write_end(self, packet):
# Destroy not required anymore: http://trac.osgeo.org/gdal/wiki/PythonGotchas
# self.dest_fd.Destroy()
- log.info("End writing to: %s" % self.dest_data_source)
+ log.info("End writing to: %s" % Util.safe_string_value(self.dest_data_source))
self.dest_fd = None
self.layer = None
return packet
=====================================
stetl/util.py
=====================================
@@ -4,9 +4,10 @@
#
# Author:Just van den Broecke
+import glob
import logging
import os
-import glob
+import re
import types
from time import time
from ConfigParser import ConfigParser
@@ -14,6 +15,15 @@ from ConfigParser import ConfigParser
logging.basicConfig(level=logging.INFO,
format='%(asctime)s %(name)s %(levelname)s %(message)s')
+# Constants for precompiled regular expressions
+RE_PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)
+RE_PG_PWD = re.compile(r'\bpassword=[^\'"]\S*', flags=re.IGNORECASE)
+RE_PG_PWD_DBL = re.compile(r'\bpassword="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
+RE_PG_PWD_SNG = re.compile(r'\bpassword=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)
+RE_PG_USER = re.compile(r'\buser=[^\'"]\S*', flags=re.IGNORECASE)
+RE_PG_USER_DBL = re.compile(r'\buser="(?:[^"\\]|\\.)*"', flags=re.IGNORECASE)
+RE_PG_USER_SNG = re.compile(r'\buser=\'(?:[^\'\\]|\\.)*\'', flags=re.IGNORECASE)
+
# Static utility methods
class Util:
@@ -348,6 +358,24 @@ class Util:
return elem
+    # Hide user names and passwords in string values, like the Postgres connection string as used by GDAL/OGR
+    # See https://stackoverflow.com/questions/249791/regex-for-quoted-string-with-escaping-quotes for the escaped quotes expressions
+    @staticmethod
+    def safe_string_value(value, hide_value='***'):
+        # PostgreSQL connection strings as used by GDAL/OGR
+        if RE_PG_START.search(value) is not None:
+            value = RE_PG_PWD.sub('password=%s' % hide_value, value)
+            value = RE_PG_PWD_DBL.sub('password="%s"' % hide_value, value)
+            value = RE_PG_PWD_SNG.sub('password=\'%s\'' % hide_value, value)
+
+            value = RE_PG_USER.sub('user=%s' % hide_value, value)
+            value = RE_PG_USER_DBL.sub('user="%s"' % hide_value, value)
+            value = RE_PG_USER_SNG.sub('user=\'%s\'' % hide_value, value)
+
+        # Add more cases as needed ...
+
+        return value
+
log = Util.get_log("util")
@@ -488,9 +516,14 @@ class ConfigSection():
# Need to hide some sensitive values, usually used for logging
safe_copy = self.config_dict.copy()
hides = ['passw', 'pasw', 'token', 'user']
+ hide_value = '<hidden>'
+
for key in safe_copy:
for hide_key in hides:
if hide_key in key.lower():
- safe_copy[key] = '<hidden>'
+ safe_copy[key] = hide_value
+
+ # Also hide usernames/passwords in string values, like Postgres connection strings used by GDAL/OGR
+ safe_copy[key] = Util.safe_string_value(safe_copy[key], hide_value)
return repr(safe_copy)
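
Note on the stetl/util.py hunks above: credentials are masked only in values that contain a `PG:` marker, with one expression per quoting style (unquoted, double-quoted, single-quoted). The following is a standalone sketch of that idea, not the Stetl code itself. It reuses the expressions from the hunk, but the names `mask_pg_credentials` and `_patterns` are hypothetical, it targets Python 3, and the guard for non-string values is an addition of this sketch.

```python
import re

RE_PG_START = re.compile(r'\bPG:', flags=re.IGNORECASE)


def _patterns(key):
    """Unquoted, double-quoted and single-quoted forms of key=value."""
    return [
        (re.compile(r'\b%s=[^\'"]\S*' % key, re.IGNORECASE), '%s=%s'),
        (re.compile(r'\b%s="(?:[^"\\]|\\.)*"' % key, re.IGNORECASE), '%s="%s"'),
        (re.compile(r"\b%s='(?:[^'\\]|\\.)*'" % key, re.IGNORECASE), "%s='%s'"),
    ]


def mask_pg_credentials(value, hide_value='***'):
    # Assumption of this sketch: non-string values are returned untouched.
    if not isinstance(value, str) or RE_PG_START.search(value) is None:
        return value
    for key in ('password', 'user'):
        for regex, template in _patterns(key):
            masked = template % (key, hide_value)
            # A function replacement avoids escape processing of hide_value.
            value = regex.sub(lambda match, text=masked: text, value)
    return value


conn = 'PG:dbname=mydb host=myhost user=myuser password=mypassword'
print(mask_pg_credentials(conn))
# prints PG:dbname=mydb host=myhost user=*** password=***
```

Strings without the `PG:` marker pass through unchanged, so ordinary log messages are not altered.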
=====================================
stetl/utils/apachelog.py
=====================================
@@ -162,7 +162,7 @@ class parser:
self._names.append(self.alias(element))
- subpattern = '(\S*)'
+ subpattern = r'(\S*)'
if hasquotes:
if element == '%r' or findreferreragent.search(element):
=====================================
stetl/version.py
=====================================
@@ -1 +1 @@
-__version__ = "1.2"
+__version__ = "1.3"
=====================================
tests/test_util.py
=====================================
@@ -0,0 +1,47 @@
+# testing: to be called by nosetests
+
+import os
+from ast import literal_eval
+
+from stetl.etl import ETL
+from stetl.util import ConfigSection
+from tests.stetl_test_case import StetlTestCase
+
+
+class UtilTest(StetlTestCase):
+    """Basic util tests"""
+
+    def setUp(self):
+        super(UtilTest, self).setUp()
+
+    def test_configsection_to_string(self):
+        cfg = {
+            'name': 'Stetl',
+            'password': 'something',
+            'paswoord': 'iets',
+            'token': 'abc123',
+            'user': 'John',
+            'username': 'Jane',
+            'gebruiker': 'Jan',
+            'ogrconn': 'PG:dbname=mydb host=myhost port=myport user=myuser password=mypassword active_schema=myschema',
+            'ogrconn_singlequotes': 'PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'myuser\' password=\'mypassword\' active_schema=\'myschema\'',
+            'ogrconn_doublequotes': 'PG:dbname="mydb" host="myhost" port="myport" user="myuser" password="mypassword" active_schema="myschema"',
+            'ogrconn_crazypwd1': 'PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'myuser\' password=\'my\\\'crazy\\"password\' active_schema=\'myschema\'',
+            'ogrconn_crazypwd2': 'PG:dbname="mydb" host="myhost" port="myport" user="myuser" password="my\\\'crazy\\"password" active_schema="myschema"',
+            'ogrconn_dkk': '"PG:dbname=mydb host=myhost port=myport user=myuser password=mypassword active_schema=myschema"',
+        }
+        obj = literal_eval(ConfigSection(cfg).to_string())
+
+        self.assertEqual('Stetl', obj['name'])
+        self.assertEqual('<hidden>', obj['password'])
+        self.assertEqual('<hidden>', obj['paswoord'])
+        self.assertEqual('<hidden>', obj['token'])
+        self.assertEqual('<hidden>', obj['user'])
+        self.assertEqual('<hidden>', obj['username'])
+        self.assertEqual('Jan', obj['gebruiker'])
+        self.assertEqual('PG:dbname=mydb host=myhost port=myport user=<hidden> password=<hidden> active_schema=myschema', obj['ogrconn'])
+        self.assertEqual('PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'<hidden>\' password=\'<hidden>\' active_schema=\'myschema\'', obj['ogrconn_singlequotes'])
+        self.assertEqual('PG:dbname="mydb" host="myhost" port="myport" user="<hidden>" password="<hidden>" active_schema="myschema"', obj['ogrconn_doublequotes'])
+        self.assertEqual('PG:dbname=\'mydb\' host=\'myhost\' port=\'myport\' user=\'<hidden>\' password=\'<hidden>\' active_schema=\'myschema\'', obj['ogrconn_crazypwd1'])
+        self.assertEqual('PG:dbname="mydb" host="myhost" port="myport" user="<hidden>" password="<hidden>" active_schema="myschema"', obj['ogrconn_crazypwd2'])
+        self.assertEqual('"PG:dbname=mydb host=myhost port=myport user=<hidden> password=<hidden> active_schema=myschema"', obj['ogrconn_dkk'])
View it on GitLab: https://salsa.debian.org/debian-gis-team/python-stetl/commit/cd33cc5e7c0192e625a0c67ff4c72175ba7053d9