[Git][debian-gis-team/pyosmium][buster-backports] 13 commits: Bump Standards-Version to 4.4.1, no changes.
Bas Couwenberg
gitlab at salsa.debian.org
Fri Mar 6 05:43:14 GMT 2020
Bas Couwenberg pushed to branch buster-backports at Debian GIS Project / pyosmium
Commits:
f544c343 by Bas Couwenberg at 2019-09-30T19:47:51+02:00
Bump Standards-Version to 4.4.1, no changes.
- - - - -
111120d3 by Bas Couwenberg at 2019-11-07T18:40:54+01:00
Drop Provides: ${python3:Provides}.
- - - - -
4acf2953 by Bas Couwenberg at 2019-12-03T05:57:15+01:00
Reduce build dependencies for cross building. (Closes: #946012)
+ Move sphinx dependencies to Build-Depends-Indep.
+ Annotate test dependencies with <!nocheck>.
- - - - -
c128d537 by Bas Couwenberg at 2019-12-03T06:17:48+01:00
Mark python3-shapely with <!nocheck> as well.
- - - - -
410ff97a by Bas Couwenberg at 2019-12-09T09:35:07+01:00
Drop Name field from upstream metadata.
- - - - -
7979fa83 by Bas Couwenberg at 2020-01-25T11:03:19+01:00
Bump Standards-Version to 4.5.0, no changes.
- - - - -
d4ca3739 by Bas Couwenberg at 2020-02-29T19:36:49+01:00
New upstream version 2.15.4
- - - - -
0e776abb by Bas Couwenberg at 2020-02-29T19:36:53+01:00
Update upstream source from tag 'upstream/2.15.4'
Update to upstream version '2.15.4'
with Debian dir 81a41c215e0fe74d2d03e9e0511a066778b8e5b5
- - - - -
0cd37b2f by Bas Couwenberg at 2020-02-29T19:38:18+01:00
New upstream release.
- - - - -
918c0236 by Bas Couwenberg at 2020-02-29T19:39:20+01:00
Bump minimum required libosmium2-dev to 2.15.4.
- - - - -
6bf35319 by Bas Couwenberg at 2020-02-29T19:39:30+01:00
Set distribution to unstable.
- - - - -
9df66390 by Bas Couwenberg at 2020-03-05T17:24:02+01:00
Merge tag 'debian/2.15.4-1' into buster-backports
releasing package pyosmium version 2.15.4-1
- - - - -
badf7981 by Bas Couwenberg at 2020-03-05T17:24:10+01:00
Rebuild for buster-backports.
- - - - -
22 changed files:
- .travis.yml
- CHANGELOG.md
- CMakeLists.txt
- README.md
- debian/changelog
- debian/control
- debian/upstream/metadata
- doc/conf.py
- doc/index.rst
- doc/intro.rst
- doc/tools.rst
- + doc/troubleshooting.rst
- + doc/updating_osm_data.rst
- lib/osm.cc
- lib/simple_handler.h
- lib/write_handler.cc
- src/osmium/replication/server.py
- src/osmium/version.py
- + test/test_dangling_references.py
- test/test_pyosmium_get_changes.py
- test/test_replication.py
- tools/pyosmium-get-changes
Changes:
=====================================
.travis.yml
=====================================
@@ -57,7 +57,7 @@ install:
- git clone --quiet --depth 1 https://github.com/mapbox/protozero.git contrib/protozero
- git clone --quiet --depth 1 https://github.com/pybind/pybind11.git contrib/pybind11
- if [ "$TRAVIS_OS_NAME" = 'osx' ]; then
- pip${USE_PYTHON_VERSION} install -q nose mock shapely;
+ pip${USE_PYTHON_VERSION} install --user -q nose mock shapely;
fi
script:
=====================================
CHANGELOG.md
=====================================
@@ -4,6 +4,23 @@
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](http://semver.org/).
+## [2.15.4] - 2020-02-29
+
+### Added
+
+- pyosmium-get-changes: allow to pipe updates to stdout
+- doc: add more information about file updates
+
+### Changed
+
+- check for dangling references in callbacks
+- use a custom HTTP user agent when requesting diffs
+- use current libosmium
+
+### Fixed
+
+- replication: retry downloading truncated state files
+
## [2.15.3] - 2019-08-16
### Added
=====================================
CMakeLists.txt
=====================================
@@ -19,7 +19,7 @@ else()
find_package(pybind11 2.2 REQUIRED)
endif()
-find_package(Boost 1.55 REQUIRED COMPONENTS)
+find_package(Boost 1.41 REQUIRED COMPONENTS)
include_directories(SYSTEM ${Boost_INCLUDE_DIRS})
function(set_module_output module outdir)
=====================================
README.md
=====================================
@@ -12,6 +12,7 @@ manner.
## Dependencies
Python >= 2.7 is supported but a version >= 3.3 is strongly recommended.
+Pypy is known to not work.
Other requirements are:
=====================================
debian/changelog
=====================================
@@ -1,3 +1,26 @@
+pyosmium (2.15.4-1~bpo10+1) buster-backports; urgency=medium
+
+ * Rebuild for buster-backports.
+
+ -- Bas Couwenberg <sebastic at debian.org> Thu, 05 Mar 2020 17:24:05 +0100
+
+pyosmium (2.15.4-1) unstable; urgency=medium
+
+ [ Bas Couwenberg ]
+ * New upstream release.
+ * Bump Standards-Version to 4.5.0, no changes.
+ * Drop Provides: ${python3:Provides}.
+ * Mark python3-shapely with <!nocheck> as well.
+ * Drop Name field from upstream metadata.
+ * Bump minimum required libosmium2-dev to 2.15.4.
+
+ [ Helmut Grohne ]
+ * Reduce build dependencies for cross building. (Closes: #946012)
+ + Move sphinx dependencies to Build-Depends-Indep.
+ + Annotate test dependencies with <!nocheck>.
+
+ -- Bas Couwenberg <sebastic at debian.org> Sat, 29 Feb 2020 19:39:21 +0100
+
pyosmium (2.15.3-1~bpo10+1) buster-backports; urgency=medium
* Rebuild for buster-backports.
=====================================
debian/control
=====================================
@@ -11,17 +11,18 @@ Build-Depends: cmake (>= 2.8.12),
libexpat1-dev,
libgdal-dev,
libgeos++-dev,
- libosmium2-dev (>= 2.15.2),
+ libosmium2-dev (>= 2.15.4),
libsparsehash-dev,
pybind11-dev,
python3-all-dev,
python3-setuptools,
- python3-mock,
- python3-nose,
- python3-shapely,
+ python3-mock <!nocheck>,
+ python3-nose <!nocheck>,
+ python3-shapely <!nocheck>
+Build-Depends-Indep:
python3-sphinx,
python3-sphinxcontrib.autoprogram
-Standards-Version: 4.4.0
+Standards-Version: 4.5.0
Vcs-Browser: https://salsa.debian.org/debian-gis-team/pyosmium/
Vcs-Git: https://salsa.debian.org/debian-gis-team/pyosmium.git -b buster-backports
Homepage: https://osmcode.org/pyosmium/
@@ -49,7 +50,6 @@ Section: python
Depends: ${python3:Depends},
${shlibs:Depends},
${misc:Depends}
-Provides: ${python3:Provides}
Suggests: python3-shapely,
pyosmium-doc
Description: Osmium library bindings for Python 3
=====================================
debian/upstream/metadata
=====================================
@@ -2,6 +2,5 @@
Bug-Database: https://github.com/osmcode/pyosmium/issues
Bug-Submit: https://github.com/osmcode/pyosmium/issues/new
Contact: Osmium Developers (https://osmcode.org/contact)
-Name: PyOsmium
Repository: https://github.com/osmcode/pyosmium.git
Repository-Browse: https://github.com/osmcode/pyosmium
=====================================
doc/conf.py
=====================================
@@ -67,7 +67,7 @@ master_doc = 'index'
# General information about the project.
project = 'Pyosmium'
-copyright = '2015-2017, Sarah Hoffmann'
+copyright = '2015-2020, Sarah Hoffmann'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
=====================================
doc/index.rst
=====================================
@@ -8,7 +8,7 @@ Welcome to Pyosmium's documentation!
Pyosmium is a library to process OSM files in different formats. It is
a wrapper of the C++ library `osmium <http://osmcode.org/libosmium/>`_
-and profits from its fast implementation.
+and allows fast and efficient sequential processing of OpenStreetMap data.
.. toctree::
:maxdepth: 2
@@ -16,6 +16,7 @@ and profits from its fast implementation.
intro
reference
tools
+ troubleshooting
* :ref:`genindex`
=====================================
doc/intro.rst
=====================================
@@ -68,11 +68,11 @@ or relations, so handler functions for all three types need to be implemented::
class HotelCounterHandler(osmium.SimpleHandler):
def __init__(self):
- osmium.SimpleHandler.__init__(self)
+ super(HotelCounterHandler, self).__init__()
self.num_nodes = 0
def count_hotel(self, tags):
- if tags['tourism'] == 'hotel':
+ if tags.get('tourism') == 'hotel':
self.num_nodes += 1
def node(self, n):
@@ -90,11 +90,72 @@ listed in :py:class:`osmium.osm.OSMObject`, and some that are specific to
each type. As all objects have tags, it is possible to reuse the same
implementation for all types. The main function remains the same.
-It is important to remember that the object
-references that are handed to the handler are only temporary. They will
-become invalid as soon as the function returns. Handler functions *must*
-copy any data that should be kept for later use into their own data
-structures. This also includes attributes like tag lists.
+.. _intro-copying-data-from-object:
+
+Collecting data from an OSM file
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Let's say that we do not only want to count the hotels in the file but
+also want to print their names in alphabetical order. For simplicity, let's
+restrict ourselves to nodes tagged as hotels. A naive implementation
+might simply collect all hotels and then print their names::
+
+
+    class HotelHandler(osmium.SimpleHandler):
+        def __init__(self):
+            super(HotelHandler, self).__init__()
+            self.hotels = []
+
+        def node(self, o):
+            if o.tags.get('tourism') == 'hotel':
+                self.hotels.append(o) # THIS IS WRONG!
+
+
+    h = HotelHandler()
+    h.apply_file(some_file)
+
+    hotel_names = []
+    for o in h.hotels:
+        if 'name' in o.tags:
+            hotel_names.append(o.tags['name'])
+
+    print(sorted(hotel_names))
+
+If you try to execute this, Python will immediately abort with a
+runtime error::
+
+ RuntimeError: Node callback keeps reference to OSM object. This is not allowed.
+
+The object references that are handed to the handler are only temporary.
+Osmium reads the objects from the file, gives them to the handler function
+and then discards them to free the memory. If you keep a reference
+after the handler function returns, it points to invalid memory. Pyosmium
+does not allow that and throws the runtime error above. If you want to keep
+data for later use, *the data must be copied out*.
+
+For the example, with the list of hotels, we only need to keep the name
+of each hotel. So a correct implementation is::
+
+    class HotelHandler(osmium.SimpleHandler):
+        def __init__(self):
+            super(HotelHandler, self).__init__()
+            self.hotels = []
+
+        def node(self, o):
+            if o.tags.get('tourism') == 'hotel' and 'name' in o.tags:
+                self.hotels.append(o.tags['name'])
+
+
+    h = HotelHandler()
+    h.apply_file(some_file)
+
+    print(sorted(h.hotels))
+
+Not only the object itself is a temporary reference. The tags, node and
+member lists must also be copied when they need to be stored. As a general rule,
+it is good practice to store as little information as possible. In the example
+above, we could have stored the tags of all objects and then done the filtering
+later, but that would need much more memory.
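As a small illustration of "copying out", the sketch below turns a tag list into a plain dictionary of strings. It assumes only that each tag exposes `k` and `v` attributes, as the tests in this commit do. `FakeTag` and `copy_tags` are hypothetical names, used here so that the example runs without pyosmium or an OSM file.

```python
from collections import namedtuple

# Hypothetical stand-in for a tag object; it only mimics the `k` and `v`
# attributes that the tests in this commit read from real tags.
FakeTag = namedtuple("FakeTag", ["k", "v"])


def copy_tags(tags):
    """Return a plain dict holding copies of all tag keys and values."""
    return {str(tag.k): str(tag.v) for tag in tags}


# Inside a handler callback this would be called as copy_tags(o.tags).
temporary_tags = [FakeTag("tourism", "hotel"), FakeTag("name", "Grand Hotel")]
kept = copy_tags(temporary_tags)
print(kept["name"])
```

The resulting dictionary holds only ordinary Python strings, so it can be kept after the callback returns.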
Handling Geometries
^^^^^^^^^^^^^^^^^^^
=====================================
doc/tools.rst
=====================================
@@ -6,5 +6,6 @@ Pyosmium comes with a couple of scripts for handling change files:
.. toctree::
:maxdepth: 1
+ updating_osm_data
tools_get_changes
tools_uptodate
=====================================
doc/troubleshooting.rst
=====================================
@@ -0,0 +1,24 @@
+Troubleshooting
+===============
+
+``RuntimeError: callback keeps reference to OSM object``
+--------------------------------------------------------
+
+One of your callbacks tries to store the OSM object outside the scope of
+the function. This is not allowed because for performance reasons, Osmium
+gives you only a temporary view of the data. You must make a (deep) copy of all
+data that you want to use later outside of the callback. See also
+:ref:`intro-copying-data-from-object`.
+
+Segfault when importing another library
+---------------------------------------
+
+There have been cases reported where pyosmium does not play well with other
+python libraries that are compiled. If you see a segmentation fault when
+importing pyosmium together with other libraries, try installing the
+source code version of pyosmium. This can be done with pip::
+
+ pip install --no-binary :all: osmium
+
+You first need to install the dependencies listed in the README.
+
=====================================
doc/updating_osm_data.rst
=====================================
@@ -0,0 +1,195 @@
+Updating OpenStreetMap data from change files
+=============================================
+
+OpenStreetMap is a database that is constantly extended and updated. When you
+download the planet or an extract of it, you only get a snapshot of the
+database at a given point in time. To keep up-to-date with the development
+of OSM, you either need to download a new snapshot or you can update your
+existing data from change files published along with the planet file.
+Pyosmium ships with two tools that help you to process change files:
+`pyosmium-get-changes` and `pyosmium-up-to-date`.
+
+This section explains the basics of OSM change files and how to use Pyosmium's
+tools to keep your data up to date.
+
+About change files
+------------------
+
+Regular `change files <https://wiki.openstreetmap.org/wiki/Planet.osm/diffs>`_
+are published for the planet and also by some extract services. These
+change files are special OSM data files containing all changes to the database
+in a regular interval. Change files are not referentially complete. That means
+that they only contain OSM objects that have changed but not necessarily
+all the objects that are referenced by the changed objects. The result is
+that change files are rarely useful on their own. But they can be used
+to update an existing snapshot of OSM data.
+
+Getting change files
+--------------------
+
+There are multiple sources for OSM change files available:
+
+ * https://planet.openstreetmap.org/replication is the official source
+ for planet-wide updates. There are change files for
+ minutely, hourly and daily intervals available.
+
+ * `Geofabrik <http://download.geofabrik.de>`_ offers daily change files
+   for all its extracts. See the extract page for a link to the replication URL.
+   Note that change files go only about 3 months back. Older files are deleted.
+
+ * `openstreetmap.fr <http://download.openstreetmap.fr>`_ offers minutely change
+   files for all its extracts.
+
+For other services also check out the list of services on the
+`OSM wiki <https://wiki.openstreetmap.org/wiki/Planet.osm>`_.
+
+Updating a planet or extract
+----------------------------
+
+If you have downloaded the planet or an extract with a replication service,
+then updating your OSM file can be as easy as::
+
+ pyosmium-up-to-date <osmfile.osm.pbf>
+
+This finds the right replication source and file to start with, downloads
+changes and updates the given file with the data. You can repeat this command
+whenever you want to have newer data. The command automatically picks up at
+the same point where it left off after the previous update.
+
+Choosing the replication source
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+OSM files in PBF format are able to save the replication source and the
+current status on their own. If you want to switch the replication source
+or have a file that does not have the information, you need to bootstrap
+the update process and manually point `pyosmium-up-to-date` to the right
+service::
+
+ pyosmium-up-to-date --ignore-osmosis-headers --server <replication URL> <osmfile.osm.pbf>
+
+`pyosmium-up-to-date` automatically finds the right sequence ID to use
+by looking at the age of the data in your OSM file. It updates the file
+and stores the new replication source in the file. The additional parameters
+are then not necessary anymore for subsequent updates.
+
+.. ATTENTION::
+   Always use the PBF format to store your data. Other formats do not support
+   saving the replication information. pyosmium-up-to-date is still able to
+   update these kinds of files if you manually point to the replication server,
+   but the process is always more costly because it needs to find the right
+   starting point for updates first.
+
+Updating larger amounts of data
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When used without any parameters, pyosmium downloads at a maximum about
+1GB of changes. That corresponds to about 3 days of planet-wide changes.
+You can increase the amount using the additional `--size` parameter::
+
+ pyosmium-up-to-date --size=10000 planet.osm.pbf
+
+This would download about 10GB or 30 days of change data. If your OSM data file is
+older than that, downloading the full file anew is likely going to be faster.
+
+`pyosmium-up-to-date` uses return codes to signal if it has downloaded all
+available updates. A return code of 0 means that it has downloaded and
+applied all available data. A return code of 1 indicates that it has applied
+some updates but more are available.
+
+A minimal script that updates a file until it is really up-to-date with the
+replication source would look like this::
+
+    status=1 # we want more data
+    while [ $status -eq 1 ]; do
+        pyosmium-up-to-date planet.osm.pbf
+        # save the return code
+        status=$?
+    done
+
+Creating change files for updating databases
+--------------------------------------------
+
+There are quite a few tools that can import OSM data into databases, for
+example osm2pgsql, imposm or Nominatim. These tools often can use change files
+to keep their database up-to-date. pyosmium can be used to create the appropriate
+change files. This is slightly more involved than updating a file.
+
+Preparing the state file
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Before downloading the updates, you need to find out which sequence
+number to start with. The easiest way to remember your current status is to save
+the number in a file. Pyosmium can then read and update the file for you.
+
+Method 1: Starting from the import file
+"""""""""""""""""""""""""""""""""""""""
+
+If you still have the OSM file you used to set up your database, then
+create a state file as follows::
+
+    pyosmium-get-changes -O <osmfile.osm.pbf> -f sequence.state -v
+
+Note that there is no output file yet. This creates a new file `sequence.state`
+with the sequence ID where updates should start and prints the URL of the
+replication service to use.
+
+Method 2: Starting from a date
+""""""""""""""""""""""""""""""
+
+If you do not have the original OSM file anymore, then a good strategy is to
+look for the date of the newest node in the database to find the snapshot date
+of your database. Find the highest node ID, then look up the date for version 1
+on the OSM website. For example the date for node 23672341 can be found at
+https://www.openstreetmap.org/api/0.6/node/23672341/1 Find and copy the
+`timestamp` field. Then create a state file using this date::
+
+ pyosmium-get-changes -D 2007-01-01T14:16:21Z -f sequence.state -v
+
+Also here, this creates a new file `sequence.state` with the sequence ID where
+updates should start and prints the URL of the replication service to use.
+
+Creating a change file
+^^^^^^^^^^^^^^^^^^^^^^
+
+Now you can create change files using the state::
+
+ pyosmium-get-changes --server <replication server> -f sequence.state -o newchange.osm.gz
+
+This downloads the latest changes from the server, saves them in the file
+`newchange.osm.gz` and updates your state file. `<replication server>` is the
+URL that was printed, when you set up the state file. The parameter can be
+omitted when you use minutely change files from openstreetmap.org.
+
+`pyosmium-get-changes` loads only about 100MB worth of updates at once (about
+8 hours of planet updates). If you want more, then add a `--size` parameter.
+
+Continuously updating a database
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`pyosmium-get-changes` emits special return codes that can be used to set
+up a script that continuously fetches updates and applies them to a
+database. The important error codes are:
+
+ * 0 - changes successfully downloaded and new change file created
+ * 3 - no new changes are available from the server
+
+All other error codes indicate fatal errors.
+
+A simple shell script can look like this::
+
+    while true; do
+        # get the next batch of changes
+        pyosmium-get-changes -f sequence.state -o newchange.osm.gz
+        # save the return code
+        status=$?
+
+        if [ $status -eq 0 ]; then
+            # apply newchange.osm.gz here
+            ....
+        elif [ $status -eq 3 ]; then
+            # No new data, so sleep for a bit
+            sleep 60
+        else
+            echo "Fatal error, stopping updates."
+            exit $status
+        fi
+    done
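The same control flow can be sketched in Python. This is only an illustration of the return-code handling described above (0: apply the change file, 3: wait, anything else: stop). The names `fetch_updates`, `run_get_changes` and `apply_change` are hypothetical, and the command is passed in as a callable so that the sketch runs without pyosmium installed.

```python
import time


def fetch_updates(run_get_changes, apply_change, sleep=time.sleep, wait_seconds=60):
    """Fetch and apply changes until a fatal return code is seen.

    run_get_changes: callable returning the exit status of one download run
    apply_change:    callable that imports the freshly written change file
    Returns the first status that is neither 0 nor 3.
    """
    while True:
        status = run_get_changes()
        if status == 0:
            # New change file was created, hand it to the database importer.
            apply_change()
        elif status == 3:
            # No new data available yet, wait before asking again.
            sleep(wait_seconds)
        else:
            # Any other code is treated as fatal.
            return status


# Demonstration with made-up return codes instead of a real download.
codes = iter([0, 3, 0, 2])
applied = []
result = fetch_updates(lambda: next(codes),
                       lambda: applied.append("newchange.osm.gz"),
                       sleep=lambda seconds: None)
print(result, len(applied))
```

In a real setup, `run_get_changes` could wrap something like `subprocess.call(["pyosmium-get-changes", "-f", "sequence.state", "-o", "newchange.osm.gz"])`, mirroring the shell script.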
=====================================
lib/osm.cc
=====================================
@@ -363,7 +363,9 @@ PYBIND11_MODULE(_osm, m) {
a.cend<osmium::OuterRing>()); },
py::keep_alive<0, 1>(),
"Return an iterator over all outer rings of the multipolygon.")
- .def("inner_rings", &osmium::Area::inner_rings, py::arg("outer_ring"),
+ .def("inner_rings", &osmium::Area::inner_rings,
+ py::keep_alive<0, 1>(),
+ py::arg("outer_ring"),
"Return an iterator over all inner rings of the multipolygon.")
;
=====================================
lib/simple_handler.h
=====================================
@@ -71,15 +71,15 @@ public:
osmium::osm_entity_bits::type enabled_callbacks() override
{
auto callbacks = osmium::osm_entity_bits::nothing;
- if (hasfunc("node"))
+ if (callback("node"))
callbacks |= osmium::osm_entity_bits::node;
- if (hasfunc("way"))
+ if (callback("way"))
callbacks |= osmium::osm_entity_bits::way;
- if (hasfunc("relation"))
+ if (callback("relation"))
callbacks |= osmium::osm_entity_bits::relation;
- if (hasfunc("area"))
+ if (callback("area"))
callbacks |= osmium::osm_entity_bits::area;
- if (hasfunc("changeset"))
+ if (callback("changeset"))
callbacks |= osmium::osm_entity_bits::changeset;
return callbacks;
@@ -89,36 +89,76 @@ public:
void node(osmium::Node const *n) override
{
pybind11::gil_scoped_acquire acquire;
- PYBIND11_OVERLOAD(void, SimpleHandler, node, n);
+ auto func = callback("node");
+ if (func) {
+ auto obj = pybind11::cast(n, pybind11::return_value_policy::reference);
+
+ func(obj);
+
+ if (obj.ref_count() != 1)
+ throw std::runtime_error("Node callback keeps reference to OSM object. This is not allowed.");
+ }
}
void way(osmium::Way const *w) override
{
pybind11::gil_scoped_acquire acquire;
- PYBIND11_OVERLOAD(void, SimpleHandler, way, w);
+ auto func = callback("way");
+ if (func) {
+ auto obj = pybind11::cast(w, pybind11::return_value_policy::reference);
+
+ func(obj);
+
+ if (obj.ref_count() != 1)
+ throw std::runtime_error("Way callback keeps reference to OSM object. This is not allowed.");
+ }
}
void relation(osmium::Relation const *r) override
{
pybind11::gil_scoped_acquire acquire;
- PYBIND11_OVERLOAD(void, SimpleHandler, relation, r);
+ auto func = callback("relation");
+ if (func) {
+ auto obj = pybind11::cast(r, pybind11::return_value_policy::reference);
+
+ func(obj);
+
+ if (obj.ref_count() != 1)
+ throw std::runtime_error("Relation callback keeps reference to OSM object. This is not allowed.");
+ }
}
void changeset(osmium::Changeset const *c) override
{
pybind11::gil_scoped_acquire acquire;
- PYBIND11_OVERLOAD(void, SimpleHandler, changeset, c);
+ auto func = callback("changeset");
+ if (func) {
+ auto obj = pybind11::cast(c, pybind11::return_value_policy::reference);
+
+ func(obj);
+
+ if (obj.ref_count() != 1)
+ throw std::runtime_error("Changeset callback keeps reference to OSM object. This is not allowed.");
+ }
}
void area(osmium::Area const *a) override
{
pybind11::gil_scoped_acquire acquire;
- PYBIND11_OVERLOAD(void, SimpleHandler, area, a);
+ auto func = callback("area");
+ if (func) {
+ auto obj = pybind11::cast(a, pybind11::return_value_policy::reference);
+
+ func(obj);
+
+ if (obj.ref_count() != 1)
+ throw std::runtime_error("Area callback keeps reference to OSM object. This is not allowed.");
+ }
}
private:
- bool hasfunc(char const *name)
- { return (bool)pybind11::get_overload(static_cast<SimpleHandler const *>(this), name); }
+ pybind11::function callback(char const *name)
+ { return pybind11::get_overload(static_cast<SimpleHandler const *>(this), name); }
};
#endif // PYOSMIUM_SIMPLE_HANDLER_HPP
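The idea behind the new check can be approximated in pure Python. The sketch below is an analogy, not the binding code: the C++ handler tests that the wrapper's reference count is exactly 1 after the callback, while this version compares the count before and after the call. It relies on CPython's `sys.getrefcount`, and `OsmObjectView` and `call_checked` are hypothetical names.

```python
import sys


class OsmObjectView(object):
    """Hypothetical stand-in for the temporary object handed to a callback."""


def call_checked(func, obj):
    """Call func(obj) and fail if func stored a reference to obj."""
    before = sys.getrefcount(obj)
    func(obj)
    if sys.getrefcount(obj) != before:
        raise RuntimeError("callback keeps reference to OSM object")


kept = []

# A callback that only inspects the object passes the check.
call_checked(lambda o: o is not None, OsmObjectView())

# A callback that stores the object is detected.
try:
    call_checked(kept.append, OsmObjectView())
    leaked = False
except RuntimeError:
    leaked = True

print(leaked)
```

Reference counting of this kind is specific to CPython.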
=====================================
lib/write_handler.cc
=====================================
@@ -19,6 +19,13 @@ public:
osmium::memory::Buffer::auto_grow::yes)
{}
+ WriteHandler(const char* filename, size_t bufsz,
+ const std::string& filetype)
+ : writer(osmium::io::File(filename, filetype)),
+ buffer(bufsz < 2 * BUFFER_WRAP ? 2 * BUFFER_WRAP : bufsz,
+ osmium::memory::Buffer::auto_grow::yes)
+ {}
+
virtual ~WriteHandler()
{ close(); }
@@ -82,13 +89,17 @@ void init_write_handler(pybind11::module &m)
py::class_<WriteHandler, BaseHandler>(m, "WriteHandler",
"Handler function that writes all data directly to a file."
"The handler takes a file name as its mandatory parameter. The file "
- "must not yet exist. The file type to output is determined from the "
- "file extension. "
+ "must not yet exist. If '-' is given, then stdout is used. "
"The second (optional) parameter is the buffer size. osmium caches the "
"output data in an internal memory buffer before writing it on disk. This "
"parameter allows changing the default buffer size of 4MB. Larger buffers "
"are normally better but you should be aware that there are normally multiple "
- "buffers in use during the write process.")
+ "buffers in use during the write process."
+ "The third (optional) parameter defines the file type. Normally this "
+ "can be omitted because osmium determines the file type directly from "
+ "the filename. Only when stdout is used, then the parameter is "
+ "mandatory.")
+ .def(py::init<const char*, unsigned long, const char*>())
.def(py::init<const char*, unsigned long>())
.def(py::init<const char*>())
.def("close", &WriteHandler::close,
=====================================
src/osmium/replication/server.py
=====================================
@@ -15,6 +15,7 @@ from collections import namedtuple
from math import ceil
from osmium import MergeInputReader
from osmium import io as oio
+from osmium import version
import logging
@@ -34,6 +35,10 @@ class ReplicationServer(object):
self.baseurl = url
self.diff_type = diff_type
+ def make_request(self, url):
+ headers = {"User-Agent" : "pyosmium/{}".format(version.pyosmium_release)}
+ return urlrequest.Request(url, headers=headers)
+
def open_url(self, url):
""" Download a resource from the given URL and return a byte sequence
of the content.
@@ -266,39 +271,48 @@ class ReplicationServer(object):
return lower.sequence
- def get_state_info(self, seq=None):
+ def get_state_info(self, seq=None, retries=2):
""" Downloads and returns the state information for the given
sequence. If the download is successful, a namedtuple with
`sequence` and `timestamp` is returned, otherwise the function
- returns `None`.
+ returns `None`. `retries` sets the number of times the download
+ is retried when pyosmium detects a truncated state file.
"""
- try:
- response = self.open_url(self.get_state_url(seq))
- except Exception as err:
- return None
+ for retry in range(retries + 1):
+ try:
+ response = self.open_url(self.make_request(self.get_state_url(seq)))
+ except Exception as err:
+ log.debug("Loading state info {} failed with: {}".format(seq, str(err)))
+ return None
- ts = None
- seq = None
- line = response.readline()
- while line:
- line = line.decode('utf-8')
- if '#' in line:
- line = line[0:line.index('#')]
- else:
- line = line.strip()
- if line:
- kv = line.split('=', 2)
- if len(kv) != 2:
- return None
- if kv[0] == 'sequenceNumber':
- seq = int(kv[1])
- elif kv[0] == 'timestamp':
- ts = dt.datetime.strptime(kv[1], "%Y-%m-%dT%H\\:%M\\:%SZ")
- if sys.version_info >= (3,0):
- ts = ts.replace(tzinfo=dt.timezone.utc)
+ ts = None
+ seq = None
line = response.readline()
-
- return OsmosisState(sequence=seq, timestamp=ts)
+ while line:
+ line = line.decode('utf-8')
+ if '#' in line:
+ line = line[0:line.index('#')]
+ else:
+ line = line.strip()
+ if line:
+ kv = line.split('=', 2)
+ if len(kv) != 2:
+ return None
+ if kv[0] == 'sequenceNumber':
+ seq = int(kv[1])
+ elif kv[0] == 'timestamp':
+ try:
+ ts = dt.datetime.strptime(kv[1], "%Y-%m-%dT%H\\:%M\\:%SZ")
+ except ValueError:
+ break
+ if sys.version_info >= (3,0):
+ ts = ts.replace(tzinfo=dt.timezone.utc)
+ line = response.readline()
+
+ if ts is not None and seq is not None:
+ return OsmosisState(sequence=seq, timestamp=ts)
+
+ return None
def get_diff_block(self, seq):
""" Downloads the diff with the given sequence number and returns
@@ -306,7 +320,7 @@ class ReplicationServer(object):
(or :code:`urllib2.HTTPError` in python2)
if the file cannot be downloaded.
"""
- return self.open_url(self.get_diff_url(seq)).read()
+ return self.open_url(self.make_request(self.get_diff_url(seq))).read()
def get_state_url(self, seq):
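The retry logic above can be illustrated with a small standalone sketch. It is not the code from `server.py`: `parse_state` and `fetch_state` are hypothetical helpers, the download is replaced by a callable, and the sample state text uses made-up values. It shows how a state file cut off in the middle of the timestamp is rejected and fetched again.

```python
import datetime as dt


def parse_state(text):
    """Return (sequence, timestamp) or None if the state text is incomplete."""
    seq = None
    ts = None
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split('#', 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition('=')
        if not sep:
            return None
        try:
            if key == 'sequenceNumber':
                seq = int(value)
            elif key == 'timestamp':
                # Colons are escaped with a backslash in the state file.
                ts = dt.datetime.strptime(value, "%Y-%m-%dT%H\\:%M\\:%SZ")
        except ValueError:
            return None
    if seq is None or ts is None:
        return None
    return seq, ts


def fetch_state(download, retries=2):
    """Call download() up to retries + 1 times until a complete state is read."""
    for _ in range(retries + 1):
        state = parse_state(download())
        if state is not None:
            return state
    return None


# Made-up sample data: the first response is cut off inside the timestamp.
complete = ("#Sat Feb 29 18:37:03 UTC 2020\n"
            "sequenceNumber=3907215\n"
            "timestamp=2020-02-29T18\\:36\\:02Z\n")
truncated = complete[:-12]

responses = iter([truncated, complete])
state = fetch_state(lambda: next(responses))
print(state)
```

Unlike the committed code, this sketch treats every parse failure as a reason to retry. Whether that is desirable for a real server is a design choice.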
=====================================
src/osmium/version.py
=====================================
@@ -5,11 +5,11 @@ Version information.
# the major version
pyosmium_major = '2.15'
# current release (Pip version)
-pyosmium_release = '2.15.3'
+pyosmium_release = '2.15.4'
# libosmium version shipped with the Pip release
-libosmium_version = '2.15.2'
+libosmium_version = '2.15.4'
# protozero version shipped with the Pip release
protozero_version = '1.6.8'
# pybind11 version shipped with the Pip release
-pybind11_version = '2.3.0'
+pybind11_version = '2.4.3'
=====================================
test/test_dangling_references.py
=====================================
@@ -0,0 +1,188 @@
+# vim: set fileencoding=utf-8 :
+from nose.tools import *
+import unittest
+from sys import version_info as python_version
+
+from helpers import create_osm_file
+
+import osmium as o
+
+class DanglingReferenceBase(object):
+ """ Base class for tests that try to keep a reference to the object
+ that was handed into the callback. We expect that the handler
+ bails out with a runtime error in such a case.
+ """
+
+
+ node = None
+ way = None
+ relation = None
+ area = None
+ refkeeper = []
+
+ def keep(self, obj):
+ self.refkeeper.append(obj)
+
+ def test_keep_reference(self):
+ h = o.make_simple_handler(node=self.node, way=self.way,
+ relation=self.relation, area=self.area)
+ if python_version < (3,0):
+ with self.assertRaisesRegexp(RuntimeError, "callback keeps reference"):
+ h.apply_file('example-test.pbf')
+ else:
+ with self.assertRaisesRegex(RuntimeError, "callback keeps reference"):
+ h.apply_file('example-test.pbf')
+ assert_greater(len(self.refkeeper), 0)
+ while len(self.refkeeper) > 0:
+ self.refkeeper.pop()
+# self.refkeeper.clear()
+
+
+class TestKeepNodeRef(DanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n)
+
+class TestKeepWayRef(DanglingReferenceBase, unittest.TestCase):
+
+ def way(self, w):
+ self.keep(w)
+
+class TestKeepRelationRef(DanglingReferenceBase, unittest.TestCase):
+
+ def relation(self, r):
+ self.keep(r)
+
+class TestKeepAreaRef(DanglingReferenceBase, unittest.TestCase):
+
+ def area(self, a):
+ self.keep(a)
+
+class TestKeepNodeTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n.tags)
+
+class TestKeepWayTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+ def way(self, w):
+ self.keep(w.tags)
+
+class TestKeepRelationTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+ def relation(self, r):
+ self.keep(r.tags)
+
+class TestKeepAreaTagsRef(DanglingReferenceBase, unittest.TestCase):
+
+ def area(self, a):
+ self.keep(a.tags)
+
+class TestKeepTagListIterator(DanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n.tags.__iter__())
+
+class TestKeepSingleTag(DanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ for t in n.tags:
+ self.keep(t)
+
+class TestKeepOuterRingIterator(DanglingReferenceBase, unittest.TestCase):
+
+ def area(self, r):
+ self.keep(r.outer_rings())
+
+class TestKeepOuterRing(DanglingReferenceBase, unittest.TestCase):
+
+ def area(self, r):
+ for ring in r.outer_rings():
+ self.keep(ring)
+
+class TestKeepInnerRingIterator(DanglingReferenceBase, unittest.TestCase):
+
+ def area(self, r):
+ for ring in r.outer_rings():
+ self.keep(r.inner_rings(ring))
+
+class TestKeepInnerRing(DanglingReferenceBase, unittest.TestCase):
+
+ def area(self, r):
+ for outer in r.outer_rings():
+ for inner in r.inner_rings(outer):
+ self.keep(inner)
+
+class TestKeepRelationMemberIterator(DanglingReferenceBase, unittest.TestCase):
+
+ def relation(self, r):
+ self.keep(r.members)
+
+class TestKeepRelationMember(DanglingReferenceBase, unittest.TestCase):
+
+ def relation(self, r):
+ for m in r.members:
+ self.keep(m)
+
+
+class NotADanglingReferenceBase(object):
+ """ Base class for tests that ensure that the callback does not
+ bail out because of dangling references when POD types are
+ kept.
+ """
+
+ node = None
+ way = None
+ relation = None
+ area = None
+ refkeeper = []
+
+ def keep(self, obj):
+ self.refkeeper.append(obj)
+
+ def test_keep_reference(self):
+ h = o.make_simple_handler(node=self.node, way=self.way,
+ relation=self.relation, area=self.area)
+ # Does not raise a dangling reference exception
+ h.apply_file('example-test.pbf')
+ assert_greater(len(self.refkeeper), 0)
+ #self.refkeeper.clear()
+ while len(self.refkeeper) > 0:
+ self.refkeeper.pop()
+
+class TestKeepId(NotADanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n.id)
+
+class TestKeepChangeset(NotADanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n.changeset)
+
+class TestKeepUid(NotADanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n.uid)
+
+class TestKeepUser(NotADanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n.user)
+
+class TestKeepLocation(NotADanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ self.keep(n.location)
+
+class TestKeepKey(NotADanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ for t in n.tags:
+ self.keep(t.k)
+
+class TestKeepValue(NotADanglingReferenceBase, unittest.TestCase):
+
+ def node(self, n):
+ for t in n.tags:
+ self.keep(t.v)
=====================================
test/test_pyosmium_get_changes.py
=====================================
@@ -44,7 +44,7 @@ class TestPyosmiumGetChanges(unittest.TestCase):
"../../tools/pyosmium-get-changes"))
self.url_mock = MagicMock()
self.urls = dict()
- self.url_mock.side_effect = lambda url : self.urls[url]
+ self.url_mock.side_effect = lambda url : self.urls[url.get_full_url()]
self.script['rserv'].urlrequest.urlopen = self.url_mock
def url(self, url, result):
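The one-line change above suggests that the tool now hands `urlopen` a request object rather than a plain URL string, so the mock has to look up its canned responses by `get_full_url()`. A minimal standalone sketch of that mocking pattern follows; the URL and response bytes are made up for illustration, and nothing here is taken from the pyosmium test suite beyond the lambda itself.

```python
from unittest.mock import MagicMock
from urllib.request import Request

# Hypothetical canned responses, keyed by the full URL string.
urls = {"http://test.io/state.txt": b"sequenceNumber=1"}

url_mock = MagicMock()
# The caller passes a urllib Request, so resolve it to its URL first.
url_mock.side_effect = lambda req: urls[req.get_full_url()]

# Simulates what code under test would do: urlopen(Request(url)).
result = url_mock(Request("http://test.io/state.txt"))
```

With the old lambda (`self.urls[url]`), the lookup would be attempted with the `Request` object itself as the key and would fail with a `KeyError`.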
=====================================
test/test_replication.py
=====================================
@@ -77,6 +77,99 @@ def test_get_state_valid(mock):
assert_equal(mock.call_count, 1)
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_sequence_cut(mock):
+ mock.set_script(("""\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=259""",
+ """\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=2017-08-26T11\:04\:02Z"""))
+
+ res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+ assert_is_not_none(res)
+ assert_equals(res.timestamp, mkdate(2017, 8, 26, 11, 4, 2))
+ assert_equals(res.sequence, 2594669)
+
+ assert_equal(mock.call_count, 2)
+
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_date_cut(mock):
+ mock.set_script(("""\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=2017-08-2""",
+ """\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=2017-08-26T11\:04\:02Z"""))
+
+ res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+ assert_is_not_none(res)
+ assert_equals(res.timestamp, mkdate(2017, 8, 26, 11, 4, 2))
+ assert_equals(res.sequence, 2594669)
+
+ assert_equal(mock.call_count, 2)
+
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_timestamp_cut(mock):
+ mock.set_script(("""\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=""",
+ """\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=2017-08-26T11\:04\:02Z"""))
+
+ res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+ assert_is_not_none(res)
+ assert_equals(res.timestamp, mkdate(2017, 8, 26, 11, 4, 2))
+ assert_equals(res.sequence, 2594669)
+
+ assert_equal(mock.call_count, 2)
+
+@patch('osmium.replication.server.urlrequest.urlopen', new_callable=UrllibMock)
+def test_get_state_too_many_retries(mock):
+ mock.set_script(("""\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=""",
+ """\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=""",
+ """\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=""",
+ """\
+ #Sat Aug 26 11:04:04 UTC 2017
+ txnMaxQueried=1219304113
+ sequenceNumber=2594669
+ timestamp=2017-08-26T11\:04\:02Z"""))
+
+ res = rserv.ReplicationServer("http://test.io").get_state_info()
+
+ assert_is_none(res)
+
+ assert_equal(mock.call_count, 3)
+
+
+
@patch('osmium.replication.server.urlrequest.urlopen')
def test_get_state_server_timeout(mock):
mock.side_effect = URLError(reason='Mock')
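The new tests above describe the expected behaviour when a state file arrives truncated: the request is repeated, and after three unsuccessful attempts the result is None. The sketch below reproduces that behaviour in isolation. It is not the code in osmium.replication.server; `parse_state`, `fetch_state` and the `fetch` callable are hypothetical names introduced here, and the completeness check (both fields present and parseable) is an assumption inferred from the test data.

```python
from datetime import datetime, timezone


def parse_state(text):
    """Return (sequence, timestamp), or None if the state text is incomplete."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the leading '#...' comment, and malformed lines.
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        fields[key] = value
    try:
        seq = int(fields['sequenceNumber'])
        # State files escape colons as '\:'; drop the backslashes first.
        raw_ts = fields['timestamp'].replace('\\', '')
        ts = datetime.strptime(raw_ts, '%Y-%m-%dT%H:%M:%SZ')
    except (KeyError, ValueError):
        # Missing field or cut-off value: treat as a truncated download.
        return None
    return seq, ts.replace(tzinfo=timezone.utc)


def fetch_state(fetch, retries=3):
    """Call fetch() up to `retries` times until a complete state is parsed."""
    for _ in range(retries):
        state = parse_state(fetch())
        if state is not None:
            return state
    return None
```

Note that a sequence number cut off in the middle is only detected here because the timestamp line after it is then missing as well, which matches the `test_get_state_sequence_cut` input.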
=====================================
tools/pyosmium-get-changes
=====================================
@@ -15,6 +15,10 @@ On success, the program will print a single number to stdout, the sequence
number where to continue updates in the next run. This output can also be
written to (and later read from) a file.
+*Note:* you may pipe the diff also to standard output using '-o -'. Then
+the sequence number will not be printed. You must write it to a file in that
+case.
+
Some OSM data sources require a cookie to be sent with the HTTP requests.
pyosmium-get-changes does not fetch the cookie from these services for you.
However, it can read cookies from a Netscape-style cookie jar file, send these
@@ -132,6 +136,8 @@ def get_arg_parser(from_main=False):
parser.add_argument('-o', '--outfile', dest='outfile',
help=h("""Name of diff output file. If omitted, only the
sequence ID will be printed where updates would start."""))
+ parser.add_argument('--format', dest='outformat', metavar='FORMAT',
+ help="Format the data should be saved in.")
parser.add_argument('--server', action='store', dest='server_url',
help='Base URL of the replication server')
parser.add_argument('--cookie', dest='cookie',
@@ -172,6 +178,10 @@ def main(args):
log.setLevel(max(3 - options.loglevel, 0) * 10)
+ if options.outfile == '-' and options.outformat is None:
+ log.error("You must define a format when using stdout. See --format.")
+ return 1
+
if options.start_file is not None:
options.start = ReplicationStart.from_osm_file(options.start_file,
options.ignore_headers)
@@ -221,7 +231,10 @@ def main(args):
return 0
log.debug("Starting download at ID %d (max %d MB)" % (startseq, options.outsize))
- outhandler = WriteHandler(options.outfile)
+ if options.outformat is not None:
+ outhandler = WriteHandler(options.outfile, 4096*1024, options.outformat)
+ else:
+ outhandler = WriteHandler(options.outfile)
endseq = svr.apply_diffs(outhandler, startseq, max_size=options.outsize*1024,
simplify=options.simplify)
@@ -234,7 +247,8 @@ def main(args):
if endseq is None:
return 3
- write_end_sequence(options.seq_file, endseq)
+ if options.outfile != '-' or options.seq_file is not None:
+ write_end_sequence(options.seq_file, endseq)
return 0
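The option handling added in this file can be summarised as two rules: writing the diff to stdout ('-o -') requires an explicit --format, and the end sequence is then only written when a sequence file was given. The sketch below isolates those two checks so they can be tried without a replication server. The helper names are hypothetical, the '--sequence-file' flag name is an assumption (only the `seq_file` destination appears in the diff), and 'osc' is used purely as an example format value.

```python
import argparse


def build_parser():
    """Parser with only the options relevant to the stdout handling."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-o', '--outfile', dest='outfile')
    parser.add_argument('--format', dest='outformat', metavar='FORMAT')
    # Flag name assumed; the diff only shows dest='seq_file' being read.
    parser.add_argument('--sequence-file', dest='seq_file')
    return parser


def check_output_options(options):
    """Return an error message if stdout is requested without a format."""
    if options.outfile == '-' and options.outformat is None:
        return "You must define a format when using stdout. See --format."
    return None


def should_write_sequence(options):
    """Mirror the condition guarding write_end_sequence() in the diff."""
    return options.outfile != '-' or options.seq_file is not None


if __name__ == '__main__':
    opts = build_parser().parse_args(['-o', '-', '--format', 'osc'])
    print(check_output_options(opts), should_write_sequence(opts))
```

The second rule keeps the sequence number out of stdout when the diff itself is piped there, which is why the help text tells users to write it to a file in that case.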
View it on GitLab: https://salsa.debian.org/debian-gis-team/pyosmium/-/compare/1520bca199d127268a6c15d94f56bb3251097172...badf7981f6659bba7fcc36efb9c91f3064a5d188