[Git][debian-gis-team/pyosmium][master] 5 commits: New upstream version 2.15.4

Bas Couwenberg gitlab at salsa.debian.org
Sat Feb 29 18:53:36 GMT 2020



Bas Couwenberg pushed to branch master at Debian GIS Project / pyosmium


Commits:
d4ca3739 by Bas Couwenberg at 2020-02-29T19:36:49+01:00
New upstream version 2.15.4
- - - - -
0e776abb by Bas Couwenberg at 2020-02-29T19:36:53+01:00
Update upstream source from tag 'upstream/2.15.4'

Update to upstream version '2.15.4'
with Debian dir 81a41c215e0fe74d2d03e9e0511a066778b8e5b5
- - - - -
0cd37b2f by Bas Couwenberg at 2020-02-29T19:38:18+01:00
New upstream release.

- - - - -
918c0236 by Bas Couwenberg at 2020-02-29T19:39:20+01:00
Bump minimum required libosmium2-dev to 2.15.4.

- - - - -
6bf35319 by Bas Couwenberg at 2020-02-29T19:39:30+01:00
Set distribution to unstable.

- - - - -


21 changed files:

- .travis.yml
- CHANGELOG.md
- CMakeLists.txt
- README.md
- debian/changelog
- debian/control
- doc/conf.py
- doc/index.rst
- doc/intro.rst
- doc/tools.rst
- + doc/troubleshooting.rst
- + doc/updating_osm_data.rst
- lib/osm.cc
- lib/simple_handler.h
- lib/write_handler.cc
- src/osmium/replication/server.py
- src/osmium/version.py
- + test/test_dangling_references.py
- test/test_pyosmium_get_changes.py
- test/test_replication.py
- tools/pyosmium-get-changes


Changes:

=====================================
.travis.yml
=====================================
@@ -57,7 +57,7 @@ install:
     - git clone --quiet --depth 1 https://github.com/mapbox/protozero.git contrib/protozero
     - git clone --quiet --depth 1 https://github.com/pybind/pybind11.git contrib/pybind11
     - if [ "$TRAVIS_OS_NAME" = 'osx' ]; then
-          pip${USE_PYTHON_VERSION} install -q nose mock shapely;
+          pip${USE_PYTHON_VERSION} install --user -q nose mock shapely;
       fi
 
 script:


=====================================
CHANGELOG.md
=====================================
@@ -4,6 +4,23 @@
 All notable changes to this project will be documented in this file.
 This project adheres to [Semantic Versioning](http://semver.org/).
 
+## [2.15.4] - 2020-02-29
+
+### Added
+
+- pyosmium-get-changes: allow piping updates to stdout
+- doc: add more information about file updates
+
+### Changed
+
+- check for dangling references in callbacks
+- use a custom HTTP user agent when requesting diffs
+- use current libosmium
+
+### Fixed
+
+- replication: retry downloading truncated state files
+
 ## [2.15.3] - 2019-08-16
 
 ### Added


=====================================
CMakeLists.txt
=====================================
@@ -19,7 +19,7 @@ else()
     find_package(pybind11 2.2 REQUIRED)
 endif()
 
-find_package(Boost 1.55 REQUIRED COMPONENTS)
+find_package(Boost 1.41 REQUIRED COMPONENTS)
 include_directories(SYSTEM ${Boost_INCLUDE_DIRS})
 
 function(set_module_output module outdir)


=====================================
README.md
=====================================
@@ -12,6 +12,7 @@ manner.
 ## Dependencies
 
 Python >= 2.7 is supported but a version >= 3.3 is strongly recommended.
+PyPy is known not to work.
 
 Other requirements are:
 


=====================================
debian/changelog
=====================================
@@ -1,17 +1,19 @@
-pyosmium (2.15.3-2) UNRELEASED; urgency=medium
+pyosmium (2.15.4-1) unstable; urgency=medium
 
   [ Bas Couwenberg ]
+  * New upstream release.
   * Bump Standards-Version to 4.5.0, no changes.
   * Drop Provides: ${python3:Provides}.
   * Mark python3-shapely with <!nocheck> as well.
   * Drop Name field from upstream metadata.
+  * Bump minimum required libosmium2-dev to 2.15.4.
 
   [ Helmut Grohne ]
   * Reduce build dependencies for cross building. (Closes: #946012)
     + Move sphinx dependencies to Build-Depends-Indep.
     + Annotate test dependencies with <!nocheck>.
 
- -- Bas Couwenberg <sebastic at debian.org>  Mon, 30 Sep 2019 19:47:49 +0200
+ -- Bas Couwenberg <sebastic at debian.org>  Sat, 29 Feb 2020 19:39:21 +0100
 
 pyosmium (2.15.3-1) unstable; urgency=medium
 


=====================================
debian/control
=====================================
@@ -11,7 +11,7 @@ Build-Depends: cmake (>= 2.8.12),
                libexpat1-dev,
                libgdal-dev,
                libgeos++-dev,
-               libosmium2-dev (>= 2.15.2),
+               libosmium2-dev (>= 2.15.4),
                libsparsehash-dev,
                pybind11-dev,
                python3-all-dev,


=====================================
doc/conf.py
=====================================
@@ -67,7 +67,7 @@ master_doc = 'index'
 
 # General information about the project.
 project = 'Pyosmium'
-copyright = '2015-2017, Sarah Hoffmann'
+copyright = '2015-2020, Sarah Hoffmann'
 
 # The version info for the project you're documenting, acts as replacement for
 # |version| and |release|, also used in various other places throughout the


=====================================
doc/index.rst
=====================================
@@ -8,7 +8,7 @@ Welcome to Pyosmium's documentation!
 
 Pyosmium is a library to process OSM files in different formats. It is
 a wrapper of the C++ library `osmium <http://osmcode.org/libosmium/>`_
-and profits from its fast implementation.
+and allows fast and efficient sequential processing of OpenStreetMap data.
 
 .. toctree::
    :maxdepth: 2
@@ -16,6 +16,7 @@ and profits from its fast implementation.
    intro
    reference
    tools
+   troubleshooting
     
 
 * :ref:`genindex`


=====================================
doc/intro.rst
=====================================
@@ -68,11 +68,11 @@ or relations, so handler functions for all three types need to be implemented::
 
     class HotelCounterHandler(osmium.SimpleHandler):
         def __init__(self):
-            osmium.SimpleHandler.__init__(self)
+            super(HotelCounterHandler, self).__init__()
             self.num_nodes = 0
 
         def count_hotel(self, tags):
-            if tags['tourism'] == 'hotel':
+            if tags.get('tourism') == 'hotel':
                 self.num_nodes += 1
 
         def node(self, n):
@@ -90,11 +90,72 @@ listed in :py:class:`osmium.osm.OSMObject`, and some that are specific to
 each type. As all objects have tags, it is possible to reuse the same
 implementation for all types. The main function remains the same.
 
-It is important to remember that the object
-references that are handed to the handler are only temporary. They will
-become invalid as soon as the function returns. Handler functions *must*
-copy any data that should be kept for later use into their own data
-structures. This also includes attributes like tag lists.
+.. _intro-copying-data-from-object:
+
+Collecting data from an OSM file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Let's say that we do not only want to count the hotels in the file but
+also print their names in alphabetical order. For simplicity, let's
+restrict ourselves to nodes tagged as hotels. A naive implementation
+might simply collect all hotels and then print their names::
+
+
+    class HotelHandler(osmium.SimpleHandler):
+        def __init__(self):
+            super(HotelHandler, self).__init__()
+            self.hotels = []
+
+        def node(self, o):
+            if o.tags.get('tourism') == 'hotel':
+                self.hotels.append(o)       # THIS IS WRONG!
+
+
+    h = HotelHandler()
+    h.apply_file(some_file)
+
+    hotel_names = []
+    for o in h.hotels:
+        if 'name' in o.tags:
+            hotel_names.append(o.tags['name'])
+
+    print(sorted(hotel_names))
+
+If you try to execute this, Python will immediately abort with a
+runtime error::
+
+    RuntimeError: Node callback keeps reference to OSM object. This is not allowed.
+
+The object references that are handed to the handler are only temporary.
+Osmium reads the objects from the file, passes them to the handler function
+and then discards them to free the memory. If you keep a reference
+after the handler function returns, it points to invalid memory. Pyosmium
+does not allow that and throws the runtime error above. If you want to keep
+data for later use, *the data must be copied out*.
+
+For the example with the list of hotels, we only need to keep the name
+of each hotel. So a correct implementation is::
+
+    class HotelHandler(osmium.SimpleHandler):
+        def __init__(self):
+            super(HotelHandler, self).__init__()
+            self.hotels = []
+
+        def node(self, o):
+            if o.tags.get('tourism') == 'hotel' and 'name' in o.tags:
+                self.hotels.append(o.tags['name'])
+
+
+    h = HotelHandler()
+    h.apply_file(some_file)
+
+    print(sorted(h.hotels))
+
+The object itself is not the only temporary reference. The tags, node and
+member lists must also be copied when they need to be stored. As a general rule,
+it is good practice to store as little information as possible. In the example
+above, we could have stored the tags of all objects and then done the filtering
+later, but that would need much more memory.
 
 Handling Geometries
 ^^^^^^^^^^^^^^^^^^^


=====================================
doc/tools.rst
=====================================
@@ -6,5 +6,6 @@ Pyosmium comes with a couple of scripts for handling change files:
 .. toctree::
     :maxdepth: 1
 
+    updating_osm_data
     tools_get_changes
     tools_uptodate


=====================================
doc/troubleshooting.rst
=====================================
@@ -0,0 +1,24 @@
+Troubleshooting
+===============
+
+``RuntimeError: callback keeps reference to OSM object``
+--------------------------------------------------------
+
+One of your callbacks tries to store the OSM object outside the scope of
+the function. This is not allowed because for performance reasons, Osmium
+gives you only a temporary view of the data. You must make a (deep) copy of all
+data that you want to use later outside of the callback. See also
+:ref:`intro-copying-data-from-object`.
+
+Segfault when importing another library
+---------------------------------------
+
+There have been cases reported where pyosmium does not play well with other
+python libraries that are compiled. If you see a segmentation fault when
+importing pyosmium together with other libraries, try installing the
+source code version of pyosmium. This can be done with pip::
+
+    pip install --no-binary :all: osmium
+
+You first need to install the dependencies listed in the README.
+


=====================================
doc/updating_osm_data.rst
=====================================
@@ -0,0 +1,195 @@
+Updating OpenStreetMap data from change files
+=============================================
+
+OpenStreetMap is a database that is constantly extended and updated. When you
+download the planet or an extract of it, you only get a snapshot of the
+database at a given point in time. To keep up-to-date with the development
+of OSM, you either need to download a new snapshot or you can update your
+existing data from change files published along with the planet file.
+Pyosmium ships with two tools that help you to process change files:
+`pyosmium-get-changes` and `pyosmium-up-to-date`.
+
+This section explains the basics of OSM change files and how to use Pyosmium's
+tools to keep your data up to date.
+
+About change files
+------------------
+
+Regular `change files <https://wiki.openstreetmap.org/wiki/Planet.osm/diffs>`_
+are published for the planet and also by some extract services. These 
+change files are special OSM data files containing all changes to the database
+in a regular interval. Change files are not referentially complete. That means
+that they only contain OSM objects that have changed but not necessarily
+all the objects that are referenced by the changed objects. The result is
+that change files are rarely useful on their own. But they can be used
+to update an existing snapshot of OSM data.
+
+Getting change files
+--------------------
+
+There are multiple sources for OSM change files available:
+
+ * https://planet.openstreetmap.org/replication is the official source
+   for planet-wide updates. There are change files for
+   minutely, hourly and daily intervals available.
+
+ * `Geofabrik <http://download.geofabrik.de>`_ offers daily change files
+   for all its extracts. See the extract page for a link to the replication URL.
+   Note that change files go back only about 3 months. Older files are deleted.
+
+ * `openstreetmap.fr <http://download.openstreetmap.fr>`_ offers minutely change
+   files for all its extracts.
+
+For other services also check out the list of services on the
+`OSM wiki <https://wiki.openstreetmap.org/wiki/Planet.osm>`_.
+
+Updating a planet or extract
+----------------------------
+
+If you have downloaded the planet or an extract with a replication service,
+then updating your OSM file can be as easy as::
+
+  pyosmium-up-to-date <osmfile.osm.pbf>
+
+This finds the right replication source and file to start with, downloads
+changes and updates the given file with the data. You can repeat this command
+whenever you want to have newer data. The command automatically picks up at
+the same point where it left off after the previous update.
+
+Choosing the replication source
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+OSM files in PBF format are able to save the replication source and the
+current status on their own. If you want to switch the replication source
+or have a file that does not have the information, you need to bootstrap
+the update process and manually point `pyosmium-up-to-date` to the right
+service::
+
+  pyosmium-up-to-date --ignore-osmosis-headers --server <replication URL> <osmfile.osm.pbf>
+
+`pyosmium-up-to-date` automatically finds the right sequence ID to use
+by looking at the age of the data in your OSM file. It updates the file
+and stores the new replication source in the file. The additional parameters
+are then not necessary anymore for subsequent updates.
+
+.. ATTENTION::
+   Always use the PBF format to store your data. Other formats do not support
+   saving the replication information. pyosmium-up-to-date is still able to
+   update these kinds of files if you manually point to the replication server,
+   but the process is always more costly because it needs to find the right
+   starting point for updates first.
+
+Updating larger amounts of data
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When used without any parameters, `pyosmium-up-to-date` downloads at most about
+1GB of changes. That corresponds to about 3 days of planet-wide changes.
+You can increase the amount using the additional `--size` parameter::
+
+  pyosmium-up-to-date --size=10000 planet.osm.pbf
+
+This would download about 10GB or 30 days of change data. If your OSM data file is
+older than that, downloading the full file anew is likely going to be faster.
+
+`pyosmium-up-to-date` uses return codes to signal if it has downloaded all
+available updates. A return code of 0 means that it has downloaded and
+applied all available data. A return code of 1 indicates that it has applied
+some updates but more are available.
+
+A minimal script that updates a file until it is really up-to-date with the
+replication source would look like this::
+
+  status=1  # we want more data
+  while [ $status -eq 1 ]; do
+    pyosmium-up-to-date planet.osm.pbf
+    # save the return code
+    status=$?
+  done
+
+Creating change files for updating databases
+--------------------------------------------
+
+There are quite a few tools that can import OSM data into databases, for
+example osm2pgsql, imposm or Nominatim. These tools often can use change files
+to keep their database up-to-date. pyosmium can be used to create the appropriate
+change files. This is slightly more involved than updating a file.
+
+Preparing the state file
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Before downloading the updates, you need to find out which sequence
+number to start with. The easiest way to remember your current status is to save
+the number in a file. Pyosmium can then read and update the file for you.
+
+Method 1: Starting from the import file
+"""""""""""""""""""""""""""""""""""""""
+
+If you still have the OSM file you used to set up your database, then
+create a state file as follows::
+
+  pyosmium-get-changes -O <osmfile.osm.pbf> -f sequence.state -v
+
+Note that there is no output file yet. This creates a new file `sequence.state`
+with the sequence ID where updates should start and prints the URL of the
+replication service to use.
+
+Method 2: Starting from a date
+""""""""""""""""""""""""""""""
+
+If you do not have the original OSM file anymore, then a good strategy is to
+look for the date of the newest node in the database to find the snapshot date
+of your database. Find the highest node ID, then look up the date for version 1
+on the OSM website. For example, the date for node 23672341 can be found at
+https://www.openstreetmap.org/api/0.6/node/23672341/1. Find and copy the
+`timestamp` field. Then create a state file using this date::
+
+  pyosmium-get-changes -D 2007-01-01T14:16:21Z -f sequence.state -v
+
+As before, this creates a new file `sequence.state` with the sequence ID where
+updates should start and prints the URL of the replication service to use.
+
+Creating a change file
+^^^^^^^^^^^^^^^^^^^^^^
+
+Now you can create change files using the state::
+
+  pyosmium-get-changes --server <replication server> -f sequence.state -o newchange.osm.gz
+
+This downloads the latest changes from the server, saves them in the file
+`newchange.osm.gz` and updates your state file. `<replication server>` is the
+URL that was printed when you set up the state file. The parameter can be
+omitted when you use minutely change files from openstreetmap.org.
+
+`pyosmium-get-changes` loads only about 100MB worth of updates at once (about
+8 hours of planet updates). If you want more, then add a `--size` parameter.
+
+Continuously updating a database
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`pyosmium-get-changes` emits special return codes that can be used to set
+up a script that continuously fetches updates and applies them to a
+database. The important return codes are:
+
+ * 0 - changes successfully downloaded and new change file created
+ * 3 - no new changes are available from the server
+
+All other return codes indicate fatal errors.
+
+A simple shell script can look like this::
+
+  while true; do
+    # get the next batch of changes
+    pyosmium-get-changes -f sequence.state -o newchange.osm.gz
+    # save the return code
+    status=$?
+
+    if [ $status -eq 0 ]; then
+      # apply newchange.osm.gz here
+      ....
+    elif [ $status -eq 3 ]; then
+      # No new data, so sleep for a bit
+      sleep 60
+    else
+      echo "Fatal error, stopping updates."; exit $status
+    fi
+  done
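
The polling loop documented in doc/updating_osm_data.rst can also be driven from Python. The following is only a sketch under stated assumptions: `update_loop` and its `apply_change` callback are hypothetical names that are not part of pyosmium, and the command line simply repeats the one shown in the documentation above. Return code 0 triggers the callback, 3 waits for a minute, and any other code is handed back to the caller as fatal.

```python
import subprocess
import time

# Command line as given in the documentation; the file names are examples.
CHANGE_FILE = "newchange.osm.gz"
COMMAND = ["pyosmium-get-changes", "-f", "sequence.state", "-o", CHANGE_FILE]


def update_loop(apply_change, run=subprocess.call, sleep=time.sleep):
    """Fetch change files until pyosmium-get-changes reports a fatal error.

    apply_change is a hypothetical callback that imports the change file
    into the database. run and sleep are injectable so that the loop can
    be exercised without the real tool. Returns the fatal return code.
    """
    while True:
        status = run(COMMAND)
        if status == 0:
            # New changes were downloaded and written to CHANGE_FILE.
            apply_change(CHANGE_FILE)
        elif status == 3:
            # No new data on the server, so wait a bit before retrying.
            sleep(60)
        else:
            # Every other return code is documented as fatal.
            return status
```

Passing `run` and `sleep` as parameters is a design choice made for this sketch only: it keeps the control flow testable with canned return codes.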


=====================================
lib/osm.cc
=====================================
@@ -363,7 +363,9 @@ PYBIND11_MODULE(_osm, m) {
                                                        a.cend<osmium::OuterRing>()); },
                             py::keep_alive<0, 1>(),
              "Return an iterator over all outer rings of the multipolygon.")
-        .def("inner_rings", &osmium::Area::inner_rings, py::arg("outer_ring"),
+        .def("inner_rings", &osmium::Area::inner_rings,
+                            py::keep_alive<0, 1>(),
+             py::arg("outer_ring"),
              "Return an iterator over all inner rings of the multipolygon.")
     ;
 


=====================================
lib/simple_handler.h
=====================================
@@ -71,15 +71,15 @@ public:
     osmium::osm_entity_bits::type enabled_callbacks() override
     {
         auto callbacks = osmium::osm_entity_bits::nothing;
-        if (hasfunc("node"))
+        if (callback("node"))
             callbacks |= osmium::osm_entity_bits::node;
-        if (hasfunc("way"))
+        if (callback("way"))
             callbacks |= osmium::osm_entity_bits::way;
-        if (hasfunc("relation"))
+        if (callback("relation"))
             callbacks |= osmium::osm_entity_bits::relation;
-        if (hasfunc("area"))
+        if (callback("area"))
             callbacks |= osmium::osm_entity_bits::area;
-        if (hasfunc("changeset"))
+        if (callback("changeset"))
             callbacks |= osmium::osm_entity_bits::changeset;
 
         return callbacks;
@@ -89,36 +89,76 @@ public:
     void node(osmium::Node const *n) override
     {
         pybind11::gil_scoped_acquire acquire;
-        PYBIND11_OVERLOAD(void, SimpleHandler, node, n);
+        auto func = callback("node");
+        if (func) {
+            auto obj = pybind11::cast(n, pybind11::return_value_policy::reference);
+
+            func(obj);
+
+            if (obj.ref_count() != 1)
+                throw std::runtime_error("Node callback keeps reference to OSM object. This is not allowed.");
+        }
     }
 
     void way(osmium::Way const *w) override
     {
         pybind11::gil_scoped_acquire acquire;
-        PYBIND11_OVERLOAD(void, SimpleHandler, way, w);
+        auto func = callback("way");
+        if (func) {
+            auto obj = pybind11::cast(w, pybind11::return_value_policy::reference);
+
+            func(obj);
+
+            if (obj.ref_count() != 1)
+                throw std::runtime_error("Way callback keeps reference to OSM object. This is not allowed.");
+        }
     }
 
     void relation(osmium::Relation const *r) override
     {
         pybind11::gil_scoped_acquire acquire;
-        PYBIND11_OVERLOAD(void, SimpleHandler, relation, r);
+        auto func = callback("relation");
+        if (func) {
+            auto obj = pybind11::cast(r, pybind11::return_value_policy::reference);
+
+            func(obj);
+
+            if (obj.ref_count() != 1)
+                throw std::runtime_error("Relation callback keeps reference to OSM object. This is not allowed.");
+        }
     }
 
     void changeset(osmium::Changeset const *c) override
     {
         pybind11::gil_scoped_acquire acquire;
-        PYBIND11_OVERLOAD(void, SimpleHandler, changeset, c);
+        auto func = callback("changeset");
+        if (func) {
+            auto obj = pybind11::cast(c, pybind11::return_value_policy::reference);
+
+            func(obj);
+
+            if (obj.ref_count() != 1)
+                throw std::runtime_error("Changeset callback keeps reference to OSM object. This is not allowed.");
+        }
     }
 
     void area(osmium::Area const *a) override
     {
         pybind11::gil_scoped_acquire acquire;
-        PYBIND11_OVERLOAD(void, SimpleHandler, area, a);
+        auto func = callback("area");
+        if (func) {
+            auto obj = pybind11::cast(a, pybind11::return_value_policy::reference);
+
+            func(obj);
+
+            if (obj.ref_count() != 1)
+                throw std::runtime_error("Area callback keeps reference to OSM object. This is not allowed.");
+        }
     }
 
 private:
-    bool hasfunc(char const *name)
-    { return (bool)pybind11::get_overload(static_cast<SimpleHandler const *>(this), name); }
+    pybind11::function callback(char const *name)
+    { return pybind11::get_overload(static_cast<SimpleHandler const *>(this), name); }
 };
 
 #endif // PYOSMIUM_SIMPLE_HANDLER_HPP
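
The reference-count check added to lib/simple_handler.h can be illustrated in pure Python. This is a rough analogue, not the pyosmium implementation: `FakeNode` and `call_checked` are hypothetical names, and the sketch relies on CPython's `sys.getrefcount`, so it assumes a reference-counting interpreter. The idea is the same as in the C++ code: if the number of references to the object differs after the callback returns, the callback has stored the object somewhere.

```python
import sys


class FakeNode(object):
    """Hypothetical stand-in for an OSM object handed to a callback."""


def call_checked(callback, obj):
    """Call callback(obj) and fail if the callback kept a reference."""
    before = sys.getrefcount(obj)
    callback(obj)
    if sys.getrefcount(obj) != before:
        raise RuntimeError("callback keeps reference to OSM object")


kept = []

# Fine: only a plain value derived from the object is stored.
call_checked(lambda node: kept.append(id(node)), FakeNode())

# Not fine: the object itself ends up in the list.
try:
    call_checked(kept.append, FakeNode())
except RuntimeError as err:
    print(err)
```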


=====================================
lib/write_handler.cc
=====================================
@@ -19,6 +19,13 @@ public:
              osmium::memory::Buffer::auto_grow::yes)
     {}
 
+    WriteHandler(const char* filename, size_t bufsz,
+                 const std::string& filetype)
+    : writer(osmium::io::File(filename, filetype)),
+      buffer(bufsz < 2 * BUFFER_WRAP ? 2 * BUFFER_WRAP : bufsz,
+             osmium::memory::Buffer::auto_grow::yes)
+    {}
+
     virtual ~WriteHandler()
     { close(); }
 
@@ -82,13 +89,17 @@ void init_write_handler(pybind11::module &m)
     py::class_<WriteHandler, BaseHandler>(m, "WriteHandler",
         "Handler function that writes all data directly to a file."
         "The handler takes a file name as its mandatory parameter. The file "
-        "must not yet exist. The file type to output is determined from the "
-        "file extension. "
+        "must not yet exist. If '-' is given, then stdout is used. "
         "The second (optional) parameter is the buffer size. osmium caches the "
         "output data in an internal memory buffer before writing it on disk. This "
         "parameter allows changing the default buffer size of 4MB. Larger buffers "
         "are normally better but you should be aware that there are normally multiple "
-        "buffers in use during the write process.")
+        "buffers in use during the write process. "
+        "The third (optional) parameter defines the file type. Normally this "
+        "can be omitted because osmium determines the file type directly from "
+        "the filename. The parameter is mandatory only when stdout is "
+        "used.")
+        .def(py::init<const char*, unsigned long, const char*>())
         .def(py::init<const char*, unsigned long>())
         .def(py::init<const char*>())
         .def("close", &WriteHandler::close,


=====================================
src/osmium/replication/server.py
=====================================
@@ -15,6 +15,7 @@ from collections import namedtuple
 from math import ceil
 from osmium import MergeInputReader
 from osmium import io as oio
+from osmium import version
 
 import logging
 
@@ -34,6 +35,10 @@ class ReplicationServer(object):
         self.baseurl = url
         self.diff_type = diff_type
 
+    def make_request(self, url):
+        headers = {"User-Agent" : "pyosmium/{}".format(version.pyosmium_release)}
+        return urlrequest.Request(url, headers=headers)
+
     def open_url(self, url):
         """ Download a resource from the given URL and return a byte sequence
             of the content.
@@ -266,39 +271,48 @@ class ReplicationServer(object):
                 return lower.sequence
 
 
-    def get_state_info(self, seq=None):
+    def get_state_info(self, seq=None, retries=2):
         """ Downloads and returns the state information for the given
             sequence. If the download is successful, a namedtuple with
             `sequence` and `timestamp` is returned, otherwise the function
-            returns `None`.
+            returns `None`. `retries` sets the number of times the download
+            is retried when pyosmium detects a truncated state file.
         """
-        try:
-            response = self.open_url(self.get_state_url(seq))
-        except Exception as err:
-            return None
+        for retry in range(retries + 1):
+            try:
+                response = self.open_url(self.make_request(self.get_state_url(seq)))
+            except Exception as err:
+                log.debug("Loading state info {} failed with: {}".format(seq, str(err)))
+                return None
 
-        ts = None
-        seq = None
-        line = response.readline()
-        while line:
-            line = line.decode('utf-8')
-            if '#' in line:
-                line = line[0:line.index('#')]
-            else:
-                line = line.strip()
-            if line:
-                kv = line.split('=', 2)
-                if len(kv) != 2:
-                    return None
-                if kv[0] == 'sequenceNumber':
-                    seq = int(kv[1])
-                elif kv[0] == 'timestamp':
-                    ts = dt.datetime.strptime(kv[1], "%Y-%m-%dT%H\\:%M\\:%SZ")
-                    if sys.version_info >= (3,0):
-                        ts = ts.replace(tzinfo=dt.timezone.utc)
+            ts = None
+            seq = None
             line = response.readline()
-
-        return OsmosisState(sequence=seq, timestamp=ts)
+            while line:
+                line = line.decode('utf-8')
+                if '#' in line:
+                    line = line[0:line.index('#')]
+                else:
+                    line = line.strip()
+                if line:
+                    kv = line.split('=', 2)
+                    if len(kv) != 2:
+                        return None
+                    if kv[0] == 'sequenceNumber':
+                        seq = int(kv[1])
+                    elif kv[0] == 'timestamp':
+                        try:
+                            ts = dt.datetime.strptime(kv[1], "%Y-%m-%dT%H\\:%M\\:%SZ")
+                        except ValueError:
+                            break
+                        if sys.version_info >= (3,0):
+                            ts = ts.replace(tzinfo=dt.timezone.utc)
+                line = response.readline()
+
+            if ts is not None and seq is not None:
+                return OsmosisState(sequence=seq, timestamp=ts)
+
+        return None
 
     def get_diff_block(self, seq):
         """ Downloads the diff with the given sequence number and returns
@@ -306,7 +320,7 @@ class ReplicationServer(object):
             (or :code:`urllib2.HTTPError` in python2)
             if the file cannot be downloaded.
         """
-        return self.open_url(self.get_diff_url(seq)).read()
+        return self.open_url(self.make_request(self.get_diff_url(seq))).read()
 
 
     def get_state_url(self, seq):
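
The retry logic added to `get_state_info` can be tried out without a network connection. The sketch below is a simplified, self-contained variant and not the code from server.py: `parse_state`, `get_state` and the `fetch` callable are hypothetical names, and unlike the original it treats every incomplete or malformed state file as retryable. The timestamp format string is the one used in the diff above.

```python
import datetime as dt
from collections import namedtuple

OsmosisState = namedtuple('OsmosisState', ['sequence', 'timestamp'])


def parse_state(text):
    """Parse the text of a state file; return None if it is incomplete."""
    seq = None
    ts = None
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition('=')
        if not sep:
            return None  # line was cut off before the '='
        try:
            if key == 'sequenceNumber':
                seq = int(value)
            elif key == 'timestamp':
                # State files escape the colons in the timestamp.
                ts = dt.datetime.strptime(value, "%Y-%m-%dT%H\\:%M\\:%SZ")
                ts = ts.replace(tzinfo=dt.timezone.utc)
        except ValueError:
            return None  # value was cut off in the middle
    if seq is None or ts is None:
        return None
    return OsmosisState(sequence=seq, timestamp=ts)


def get_state(fetch, retries=2):
    """Call fetch() up to retries + 1 times until a complete state is read."""
    for _ in range(retries + 1):
        state = parse_state(fetch())
        if state is not None:
            return state
    return None
```

Here `fetch` stands for anything that returns the state file as a string, for example a wrapper around an HTTP download.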


=====================================
src/osmium/version.py
=====================================
@@ -5,11 +5,11 @@ Version information.
 # the major version
 pyosmium_major = '2.15'
 # current release (Pip version)
-pyosmium_release = '2.15.3'
+pyosmium_release = '2.15.4'
 
 # libosmium version shipped with the Pip release
-libosmium_version = '2.15.2'
+libosmium_version = '2.15.4'
 # protozero version shipped with the Pip release
 protozero_version = '1.6.8'
 # pybind11 version shipped with the Pip release
-pybind11_version = '2.3.0'
+pybind11_version = '2.4.3'


=====================================
test/test_dangling_references.py
=====================================
@@ -0,0 +1,188 @@
+# vim: set fileencoding=utf-8 :
+from nose.tools import *
+import unittest
+from sys import version_info as python_version
+
+from helpers import create_osm_file
+
+import osmium as o
+
+class DanglingReferenceBase(object):
+    """ Base class for tests that try to keep a reference to the object
+        that was handed into the callback. We expect that the handler
+        bails out with a runtime error in such a case.
+    """
+
+
+    node = None
+    way = None
+    relation = None
+    area = None
+    refkeeper = []
+
+    def keep(self, obj):
+        self.refkeeper.append(obj)
+
+    def test_keep_reference(self):
+        h = o.make_simple_handler(node=self.node, way=self.way,
+                                  relation=self.relation, area=self.area)
+        if python_version < (3,0):
+            with self.assertRaisesRegexp(RuntimeError, "callback keeps reference"):
+                h.apply_file('example-test.pbf')
+        else:
+            with self.assertRaisesRegex(RuntimeError, "callback keeps reference"):
+                h.apply_file('example-test.pbf')
+        assert_greater(len(self.refkeeper), 0)
+        while len(self.refkeeper) > 0:
+            self.refkeeper.pop()
+#        self.refkeeper.clear()
+
+
+class TestKeepNodeRef(DanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n)
+
+class TestKeepWayRef(DanglingReferenceBase, unittest.TestCase):
+
+    def way(self, w):
+        self.keep(w)
+
+class TestKeepRelationRef(DanglingReferenceBase, unittest.TestCase):
+
+    def relation(self, r):
+        self.keep(r)
+
+class TestKeepAreaRef(DanglingReferenceBase, unittest.TestCase):
+
+    def area(self, a):
+        self.keep(a)
+
+class TestKeepNodeTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n.tags)
+
+class TestKeepWayTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+    def way(self, w):
+        self.keep(w.tags)
+
+class TestKeepRelationTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+    def relation(self, r):
+        self.keep(r.tags)
+
+class TestKeepAreaTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+    def area(self, a):
+        self.keep(a.tags)
+
+class TestKeepTagListIterator(DanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n.tags.__iter__())
+
+class TestKeepSingleTag(DanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        for t in n.tags:
+            self.keep(t)
+
+class TestKeepOuterRingIterator(DanglingReferenceBase, unittest.TestCase):
+
+    def area(self, r):
+        self.keep(r.outer_rings())
+
+class TestKeepOuterRing(DanglingReferenceBase, unittest.TestCase):
+
+    def area(self, r):
+        for ring in r.outer_rings():
+            self.keep(ring)
+
+class TestKeepInnerRingIterator(DanglingReferenceBase, unittest.TestCase):
+
+    def area(self, r):
+        for ring in r.outer_rings():
+            self.keep(r.inner_rings(ring))
+
+class TestKeepInnerRing(DanglingReferenceBase, unittest.TestCase):
+
+    def area(self, r):
+        for outer in r.outer_rings():
+            for inner in r.inner_rings(outer):
+                self.keep(inner)
+
+class TestKeepRelationMemberIterator(DanglingReferenceBase, unittest.TestCase):
+
+    def relation(self, r):
+        self.keep(r.members)
+
+class TestKeepRelationMember(DanglingReferenceBase, unittest.TestCase):
+
+    def relation(self, r):
+        for m in r.members:
+            self.keep(m)
+
+
+class NotADanglingReferenceBase(object):
+    """ Base class for tests that ensure that the callback does not
+        bail out because of dangling references when POD types are
+        kept.
+    """
+
+    node = None
+    way = None
+    relation = None
+    area = None
+    refkeeper = []
+
+    def keep(self, obj):
+        self.refkeeper.append(obj)
+
+    def test_keep_reference(self):
+        h = o.make_simple_handler(node=self.node, way=self.way,
+                                  relation=self.relation, area=self.area)
+        # Does not raise a dangling reference exception
+        h.apply_file('example-test.pbf')
+        assert_greater(len(self.refkeeper), 0)
+        #self.refkeeper.clear()
+        while len(self.refkeeper) > 0:
+            self.refkeeper.pop()
+
+class TestKeepId(NotADanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n.id)
+
+class TestKeepChangeset(NotADanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n.changeset)
+
+class TestKeepUid(NotADanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n.uid)
+
+class TestKeepUser(NotADanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n.user)
+
+class TestKeepLocation(NotADanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        self.keep(n.location)
+
+class TestKeepKey(NotADanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        for t in n.tags:
+            self.keep(t.k)
+
+class TestKeepValue(NotADanglingReferenceBase, unittest.TestCase):
+
+    def node(self, n):
+        for t in n.tags:
+            self.keep(t.v)
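
Both base classes above declare `refkeeper = []` at class level, so every test subclass shares one list object, and each test empties it in place after its assertion. The `clear()` call is commented out in the diff, presumably because `list.clear()` does not exist on Python 2, which the tests still check for. A minimal sketch of that sharing behaviour, using hypothetical class names (`KeeperBase`, `KeepA`, `KeepB`) that are not part of pyosmium:

```python
class KeeperBase(object):
    # Class attribute: one list object shared by every subclass and instance.
    refkeeper = []

    def keep(self, obj):
        # append() mutates the shared list; no per-instance list is created.
        self.refkeeper.append(obj)


class KeepA(KeeperBase):
    pass


class KeepB(KeeperBase):
    pass


a, b = KeepA(), KeepB()
a.keep("from A")
shared = b.refkeeper is a.refkeeper  # both names refer to the same list
leaked = list(b.refkeeper)           # KeepB sees what KeepA stored

# Empty the list in place; this works on Python 2 and 3 alike.
while len(a.refkeeper) > 0:
    a.refkeeper.pop()
```

Without the in-place cleanup, objects kept by one test case would still be in the list when the next one checks its length.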


=====================================
test/test_pyosmium_get_changes.py
=====================================
@@ -44,7 +44,7 @@ class TestPyosmiumGetChanges(unittest.TestCase):
                                            "../../tools/pyosmium-get-changes"))
         self.url_mock = MagicMock()
         self.urls = dict()
-        self.url_mock.side_effect = lambda url : self.urls[url]
+        self.url_mock.side_effect = lambda url : self.urls[url.get_full_url()]
         self.script['rserv'].urlrequest.urlopen = self.url_mock
 
     def url(self, url, result):


=====================================
test/test_replication.py
=====================================
@@ -77,6 +77,99 @@ def test_get_state_valid(mock):
 
     assert_equal(mock.call_count, 1)
 
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_sequence_cut(mock):
+    mock.set_script(("""\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=259""",
+        """\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=2017-08-26T11\:04\:02Z"""))
+
+    res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+    assert_is_not_none(res)
+    assert_equals(res.timestamp, mkdate(2017, 8, 26, 11, 4, 2))
+    assert_equals(res.sequence, 2594669)
+
+    assert_equal(mock.call_count, 2)
+
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_date_cut(mock):
+    mock.set_script(("""\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=2017-08-2""",
+        """\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=2017-08-26T11\:04\:02Z"""))
+
+    res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+    assert_is_not_none(res)
+    assert_equals(res.timestamp, mkdate(2017, 8, 26, 11, 4, 2))
+    assert_equals(res.sequence, 2594669)
+
+    assert_equal(mock.call_count, 2)
+
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_timestamp_cut(mock):
+    mock.set_script(("""\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=""",
+        """\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=2017-08-26T11\:04\:02Z"""))
+
+    res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+    assert_is_not_none(res)
+    assert_equals(res.timestamp, mkdate(2017, 8, 26, 11, 4, 2))
+    assert_equals(res.sequence, 2594669)
+
+    assert_equal(mock.call_count, 2)
+
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_too_many_retries(mock):
+    mock.set_script(("""\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=""",
+        """\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=""",
+        """\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=""",
+        """\
+        #Sat Aug 26 11:04:04 UTC 2017
+        txnMaxQueried=1219304113
+        sequenceNumber=2594669
+        timestamp=2017-08-26T11\:04\:02Z"""))
+
+    res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+    assert_is_none(res)
+
+    assert_equal(mock.call_count, 3)
+
+
+
 @patch('osmium.replication.server.urlrequest.urlopen')
 def test_get_state_server_timeout(mock):
     mock.side_effect = URLError(reason='Mock')
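
The four new tests describe the expected behaviour for a truncated state file: a response with a cut-off sequence number, a cut-off date, or an empty timestamp is fetched again, and after three incomplete responses the result is `None`. The sketch below reproduces that behaviour with the standard library only. It is not the code in `src/osmium/replication/server.py`. The names `parse_state`, `get_state_with_retry`, `fetch` and `MAX_ATTEMPTS` are hypothetical, and returning a UTC-aware datetime is an assumption.

```python
import datetime as dt

MAX_ATTEMPTS = 3  # assumption: matches the call count the tests expect


def parse_state(text):
    """Return (sequence, timestamp), or None if the text looks truncated."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        fields[key] = value.replace('\\:', ':')  # undo the escaped colons
    try:
        seq = int(fields['sequenceNumber'])
        ts = dt.datetime.strptime(fields['timestamp'], '%Y-%m-%dT%H:%M:%SZ')
    except (KeyError, ValueError):
        # Missing key or unparsable value: treat the file as incomplete.
        return None
    return seq, ts.replace(tzinfo=dt.timezone.utc)


def get_state_with_retry(fetch, attempts=MAX_ATTEMPTS):
    """Call fetch() until a complete state file arrives or attempts run out."""
    for _ in range(attempts):
        state = parse_state(fetch())
        if state is not None:
            return state
    return None


good = "sequenceNumber=2594669\ntimestamp=2017-08-26T11\\:04\\:02Z"
cut = "sequenceNumber=2594669\ntimestamp=2017-08-2"

# Second response is complete, so the retry succeeds.
replies = iter([cut, good])
result = get_state_with_retry(lambda: next(replies))

# Three incomplete responses in a row: give up before the good one is read.
calls = []
replies = iter([cut, cut, cut, good])
def counting_fetch():
    calls.append(1)
    return next(replies)
gave_up = get_state_with_retry(counting_fetch)
```

Requiring both fields to parse also covers the cut-off sequence number case, because a state file truncated at `sequenceNumber=259` has no `timestamp` line at all.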


=====================================
tools/pyosmium-get-changes
=====================================
@@ -15,6 +15,10 @@ On success, the program will print a single number to stdout, the sequence
 number where to continue updates in the next run. This output can also be
 written to (and later read from) a file.
 
+*Note:* you may also pipe the diff to standard output using '-o -'. In that
+case the sequence number is not printed, so you must have it written to a
+file instead.
+
 Some OSM data sources require a cookie to be sent with the HTTP requests.
 pyosmium-get-changes does not fetch the cookie from these services for you.
 However, it can read cookies from a Netscape-style cookie jar file, send these
@@ -132,6 +136,8 @@ def get_arg_parser(from_main=False):
     parser.add_argument('-o', '--outfile', dest='outfile',
                         help=h("""Name of diff output file. If omitted, only the
                               sequence ID will be printed where updates would start."""))
+    parser.add_argument('--format', dest='outformat', metavar='FORMAT',
+                        help="Format the data should be saved in.")
     parser.add_argument('--server', action='store', dest='server_url',
                         help='Base URL of the replication server')
     parser.add_argument('--cookie', dest='cookie',
@@ -172,6 +178,10 @@ def main(args):
 
     log.setLevel(max(3 - options.loglevel, 0) * 10)
 
+    if options.outfile == '-' and options.outformat is None:
+        log.error("You must define a format when using stdout. See --format.")
+        return 1
+
     if options.start_file is not None:
         options.start = ReplicationStart.from_osm_file(options.start_file,
                                                        options.ignore_headers)
@@ -221,7 +231,10 @@ def main(args):
         return 0
 
     log.debug("Starting download at ID %d (max %d MB)" % (startseq, options.outsize))
-    outhandler = WriteHandler(options.outfile)
+    if options.outformat is not None:
+        outhandler = WriteHandler(options.outfile, 4096*1024, options.outformat)
+    else:
+        outhandler = WriteHandler(options.outfile)
 
     endseq = svr.apply_diffs(outhandler, startseq, max_size=options.outsize*1024,
                              simplify=options.simplify)
@@ -234,7 +247,8 @@ def main(args):
     if endseq is None:
         return 3
 
-    write_end_sequence(options.seq_file, endseq)
+    if options.outfile != '-' or options.seq_file is not None:
+        write_end_sequence(options.seq_file, endseq)
 
     return 0
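
The two new conditions in `main()` can be tried out in isolation. The sketch below rebuilds only the `-o` and `--format` options from the hunk above and wraps the checks in two hypothetical helpers, `check_stdout_format` and `should_write_sequence`. It is a simplified stand-in, not the script itself, and `osc` is used only as a sample format value.

```python
import argparse


def build_parser():
    # Only the two options relevant here; the real script defines many more.
    parser = argparse.ArgumentParser()
    parser.add_argument('-o', '--outfile', dest='outfile')
    parser.add_argument('--format', dest='outformat', metavar='FORMAT')
    return parser


def check_stdout_format(options):
    """Return 1 if the diff goes to stdout without a format, else 0."""
    if options.outfile == '-' and options.outformat is None:
        return 1
    return 0


def should_write_sequence(outfile, seq_file):
    """Mirror the condition guarding write_end_sequence() in the diff."""
    return outfile != '-' or seq_file is not None


parser = build_parser()
missing_format = check_stdout_format(parser.parse_args(['-o', '-']))
with_format = check_stdout_format(
    parser.parse_args(['-o', '-', '--format', 'osc']))
```

A file name carries an extension from which the output format can be guessed, whereas `-` does not, which is presumably why the format becomes mandatory for standard output.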
 



View it on GitLab: https://salsa.debian.org/debian-gis-team/pyosmium/-/compare/7979fa83ed57f38ae1dc2aad9f8d9b07035db92e...6bf35319c978456f244f8856c7e0bec2aecd031f


More information about the Pkg-grass-devel mailing list