[Git][debian-gis-team/pyosmium][upstream] New upstream version 4.2.0
Bas Couwenberg (@sebastic)
gitlab at salsa.debian.org
Tue Oct 21 16:59:46 BST 2025
Bas Couwenberg pushed to branch upstream at Debian GIS Project / pyosmium
Commits:
53bfe627 by Bas Couwenberg at 2025-10-21T17:40:04+02:00
New upstream version 4.2.0
- - - - -
16 changed files:
- .github/workflows/ci.yml
- .gitignore
- CHANGELOG.md
- README.md
- README.rst
- docs/Makefile
- + docs/man/pyosmium-get-changes.1
- + docs/man/pyosmium-up-to-date.1
- pyproject.toml
- src/osmium/replication/server.py
- + src/osmium/tools/common.py
- src/osmium/tools/pyosmium_get_changes.py
- src/osmium/tools/pyosmium_up_to_date.py
- test/test_pyosmium_get_changes.py
- + test/test_pyosmium_up-to-date.py
- test/test_replication.py
Changes:
=====================================
.github/workflows/ci.yml
=====================================
@@ -301,7 +301,7 @@ jobs:
deps: develop
flavour: linux
- compiler: macos-intel
- platform: macos-13
+ platform: macos-15-intel
python: "3.10"
deps: develop
flavour: macos
=====================================
.gitignore
=====================================
@@ -6,4 +6,3 @@ osmium.egg-info
.ycm_extra_conf.py
.ipynb_checkpoints
docs/data/out
-docs/man
=====================================
CHANGELOG.md
=====================================
@@ -4,6 +4,26 @@
All notable changes to this project will be documented in this file.
This project adheres to [Semantic Versioning](http://semver.org/).
+## [4.2.0] - 2025-10-21
+
+### Added
+
+- new 'end_id' parameter for diff processing functions
+- new end ID/date parameters for pyosmium tools
+
+### Fixed
+
+- restore packaging of README
+- use replace() instead of rename() to make overwriting planet files work on Windows
+
+### Changed
+
+- pre-generate man pages for easier packaging
+- diff processing functions and tools now throw an Error when diffs are
+  requested that are older than the oldest available diff on the server
+- tools now error out when the first diff download encounters a client
+ error (HTTP 4xx)
+
## [4.1.1] - 2025-08-31
### Fixed
=====================================
README.md
=====================================
@@ -122,13 +122,12 @@ or to view it locally, you can use:
mkdocs serve
-For building the man pages for the tools run:
+Pregenerated man pages for the tools are available in `docs/man`. To rebuild
+the man pages, run:
cd docs
make man
-The man pages can be found in docs/man.
-
## Bugs and Questions
If you find bugs or have feature requests, please report those in the
=====================================
README.rst
=====================================
@@ -24,7 +24,7 @@ development packages for these libraries. On Debian/Ubuntu do::
libexpat1-dev zlib1g-dev libbz2-dev
-Python >= 3.7 is supported. Pypy is known not to work.
+Python >= 3.8 is supported. Pypy is known not to work.
Documentation
=============
@@ -47,6 +47,10 @@ The package contains an `example` directory with small examples on how to use
the library. They are mostly ports of the examples in Libosmium and
osmium-contrib.
+Also check out the `Cookbook section`_ in the documentation.
+
+.. _Cookbook section: https://docs.osmcode.org/pyosmium/latest/cookbooks/
+
Fineprint
=========
=====================================
docs/Makefile
=====================================
@@ -2,7 +2,7 @@ ARGPARSE_BASEARGS=--author 'Sarah Hoffmann' --author-email 'lonvia@denofr.de' --
man:
mkdir -p man
- argparse-manpage --pyfile ../src/osmium/tools/pyosmium_get_changes.py --function get_arg_parser ${ARGPARSE_BASEARGS} --output man/pyosmium-get-changes.1
- argparse-manpage --pyfile ../src/osmium/tools/pyosmium_up_to_date.py --function get_arg_parser ${ARGPARSE_BASEARGS} --output man/pyosmium-up-to-date.1
+ argparse-manpage --module osmium.tools.pyosmium_get_changes --function get_arg_parser ${ARGPARSE_BASEARGS} --output man/pyosmium-get-changes.1
+ argparse-manpage --module osmium.tools.pyosmium_up_to_date --function get_arg_parser ${ARGPARSE_BASEARGS} --output man/pyosmium-up-to-date.1
.PHONY: man
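The Makefile change above switches argparse-manpage from --pyfile to --module: the tool now imports the installed module and calls the named function to obtain the parser it renders to troff. A minimal sketch of a module compatible with that invocation (tool name and options below are illustrative stand-ins, not the actual pyosmium definitions):

```python
# Hypothetical module layout that argparse-manpage can consume via
# --module <name> --function get_arg_parser. Names here are illustrative.
from argparse import ArgumentParser, RawDescriptionHelpFormatter


def get_arg_parser(from_main: bool = False) -> ArgumentParser:
    """Return a fully configured parser; argparse-manpage renders it to a man page."""
    parser = ArgumentParser(prog='my-tool',
                            description='Example tool description.',
                            usage=None if from_main else 'my-tool [options]',
                            formatter_class=RawDescriptionHelpFormatter)
    parser.add_argument('-v', action='count', default=0,
                        help='Increase verbosity (can be used multiple times)')
    parser.add_argument('--server', dest='server_url',
                        help='Base URL of the replication server')
    return parser
```

The advantage of --module over --pyfile is that the module is imported through the normal Python path, so relative imports inside the tool modules keep working.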
=====================================
docs/man/pyosmium-get-changes.1
=====================================
@@ -0,0 +1,112 @@
+.TH PYOSMIUM\-GET\-CHANGES "1" "2025\-10\-05" "pyosmium" "Generated Python Manual"
+.SH NAME
+pyosmium\-get\-changes
+.SH SYNOPSIS
+.B pyosmium\-get\-changes
+[options]
+.SH DESCRIPTION
+Fetch diffs from an OSM planet server.
+
+The starting point of the diff must be given either as a sequence ID or a date
+or can be computed from an OSM file. If no output file is given, the program
+will just print the initial sequence ID it would use (or save it in a file, if
+requested) and exit. This can be used to bootstrap the update process.
+
+The program tries to download until the latest change on the server is found
+or the maximum requested diff size is reached. Note that diffs are kept in
+memory during download.
+
+On success, the program will print a single number to stdout, the sequence
+number where to continue updates in the next run. This output can also be
+written to (and later read from) a file.
+
+*Note:* you may also pipe the diff to standard output using '\-o \-'. Then
+the sequence number will not be printed. You must write it to a file in that
+case.
+
+Some OSM data sources require a cookie to be sent with the HTTP requests.
+pyosmium\-get\-changes does not fetch the cookie from these services for you.
+However, it can read cookies from a Netscape\-style cookie jar file, send these
+cookies to the server and will save received cookies to the jar file.
+
+.SH OPTIONS
+.TP
+\fB\-v\fR
+Increase verbosity (can be used multiple times)
+
+.TP
+\fB\-o\fR \fI\,OUTFILE\/\fR, \fB\-\-outfile\fR \fI\,OUTFILE\/\fR
+Name of diff output file. If omitted, only the sequence ID will be printed where updates would start.
+
+.TP
+\fB\-\-format\fR \fI\,FORMAT\/\fR
+Format the data should be saved in.
+
+.TP
+\fB\-\-server\fR \fI\,SERVER_URL\/\fR
+Base URL of the replication server
+
+.TP
+\fB\-\-diff\-type\fR \fI\,SERVER_DIFF_TYPE\/\fR
+File format used by the replication server (default: osc.gz)
+
+.TP
+\fB\-\-cookie\fR \fI\,COOKIE\/\fR
+Netscape\-style cookie jar file to read cookies from and where received cookies will be written to.
+
+.TP
+\fB\-s\fR \fI\,OUTSIZE\/\fR, \fB\-\-size\fR \fI\,OUTSIZE\/\fR
+Maximum data to load in MB (Defaults to 100MB when no end date/ID has been set).
+
+.TP
+\fB\-I\fR \fI\,ID\/\fR, \fB\-\-start\-id\fR \fI\,ID\/\fR
+Sequence ID to start with
+
+.TP
+\fB\-D\fR \fI\,DATE\/\fR, \fB\-\-start\-date\fR \fI\,DATE\/\fR
+Date when to start updates
+
+.TP
+\fB\-O\fR \fI\,OSMFILE\/\fR, \fB\-\-start\-osm\-data\fR \fI\,OSMFILE\/\fR
+start at the date of the newest OSM object in the file
+
+.TP
+\fB\-\-end\-id\fR \fI\,ID\/\fR
+Last sequence ID to download.
+
+.TP
+\fB\-E\fR \fI\,DATE\/\fR, \fB\-\-end\-date\fR \fI\,DATE\/\fR
+Do not download diffs later than the given date.
+
+.TP
+\fB\-f\fR \fI\,SEQ_FILE\/\fR, \fB\-\-sequence\-file\fR \fI\,SEQ_FILE\/\fR
+Sequence file. If the file exists, then updates will start after the id given in the file. At the end of the process, the last sequence ID contained in the diff is written.
+
+.TP
+\fB\-\-ignore\-osmosis\-headers\fR
+When determining the start from an OSM file, ignore potential replication information in the header and search for the newest OSM object.
+
+.TP
+\fB\-d\fR, \fB\-\-no\-deduplicate\fR
+Do not deduplicate diffs.
+
+.TP
+\fB\-\-socket\-timeout\fR \fI\,SOCKET_TIMEOUT\/\fR
+Set timeout for file downloads.
+
+.TP
+\fB\-\-version\fR
+show program's version number and exit
+
+.SH AUTHOR
+.nf
+Sarah Hoffmann
+.fi
+.nf
+lonvia@denofr.de
+.fi
+
+.SH DISTRIBUTION
+The latest version of pyosmium may be downloaded from
+.UR https://github.com/osmcode/pyosmium/
+.UE
=====================================
docs/man/pyosmium-up-to-date.1
=====================================
@@ -0,0 +1,114 @@
+.TH PYOSMIUM\-UP\-TO\-DATE "1" "2025\-10\-05" "pyosmium" "Generated Python Manual"
+.SH NAME
+pyosmium\-up\-to\-date
+.SH SYNOPSIS
+.B pyosmium\-up\-to\-date
+[options] <osm file>
+.SH DESCRIPTION
+Update an OSM file with changes from an OSM replication server.
+
+Diffs are downloaded and kept in memory. To avoid running out of memory,
+the maximum size of diffs that can be downloaded at once is limited
+to 1 GB by default. This corresponds to approximately 3 days of updates.
+The limit can be changed with the \-\-size parameter. However, you should
+take into account that processing the files requires additional memory
+(about 1GB more).
+
+The starting time is automatically determined from the data in the file.
+For PBF files, it is also possible to read and write the replication
+information from the osmosis headers. That means that after the first update,
+subsequent calls to pyosmium\-up\-to\-date will continue the updates from the same
+server exactly where they left off.
+
+This program can update normal OSM data files as well as OSM history files.
+It automatically detects what type of file it is called on.
+
+The program returns 0 if updates have been successfully applied up to
+the newest data or no new data was available. It returns 1, if some updates
+have been applied but there is still data available on the server (either
+because the size limit has been reached or there was a network error which
+could not be resolved). Any other error results in a return code larger than 1.
+The output file is guaranteed to be unmodified in that case.
+
+Some OSM data sources require a cookie to be sent with the HTTP requests.
+pyosmium\-up\-to\-date does not fetch the cookie from these services for you.
+However, it can read cookies from a Netscape\-style cookie jar file, send these
+cookies to the server and will save received cookies to the jar file.
+
+.TP
+\fB<osm file>\fR
+OSM file to update
+
+.SH OPTIONS
+.TP
+\fB\-v\fR
+Increase verbosity (can be used multiple times).
+
+.TP
+\fB\-o\fR \fI\,OUTFILE\/\fR, \fB\-\-outfile\fR \fI\,OUTFILE\/\fR
+Name of output file. If missing, the input file will be overwritten.
+
+.TP
+\fB\-\-format\fR \fI\,FORMAT\/\fR
+Format the data should be saved in. Usually determined from file name.
+
+.TP
+\fB\-\-server\fR \fI\,SERVER_URL\/\fR
+Base URL of the replication server. Default: https://planet.osm.org/replication/hour/ (hourly diffs from osm.org)
+
+.TP
+\fB\-\-diff\-type\fR \fI\,SERVER_DIFF_TYPE\/\fR
+File format used by the replication server (default: osc.gz)
+
+.TP
+\fB\-s\fR \fI\,SIZE\/\fR, \fB\-\-size\fR \fI\,SIZE\/\fR
+Maximum size of change to apply at once in MB. Defaults to 1GB when no end ID or date was given.
+
+.TP
+\fB\-\-end\-id\fR \fI\,ID\/\fR
+Last sequence ID to download.
+
+.TP
+\fB\-E\fR \fI\,DATE\/\fR, \fB\-\-end\-date\fR \fI\,DATE\/\fR
+Do not download diffs later than the given date.
+
+.TP
+\fB\-\-tmpdir\fR \fI\,TMPDIR\/\fR
+Directory to use for temporary files. Usually the directory of the input file is used.
+
+.TP
+\fB\-\-ignore\-osmosis\-headers\fR
+Ignore potential replication information in the header of the input file and search for the newest OSM object in the file instead.
+
+.TP
+\fB\-b\fR \fI\,WIND_BACK\/\fR, \fB\-\-wind\-back\fR \fI\,WIND_BACK\/\fR
+Number of minutes to start downloading before the newest addition to input data. (Ignored when the file contains a sequence ID.) Default: 60
+
+.TP
+\fB\-\-force\-update\-of\-old\-planet\fR
+Apply update even if the input data is really old.
+
+.TP
+\fB\-\-cookie\fR \fI\,COOKIE\/\fR
+Netscape\-style cookie jar file to read cookies from and where received cookies will be written to.
+
+.TP
+\fB\-\-socket\-timeout\fR \fI\,SOCKET_TIMEOUT\/\fR
+Set timeout for file downloads.
+
+.TP
+\fB\-\-version\fR
+show program's version number and exit
+
+.SH AUTHOR
+.nf
+Sarah Hoffmann
+.fi
+.nf
+lonvia@denofr.de
+.fi
+
+.SH DISTRIBUTION
+The latest version of pyosmium may be downloaded from
+.UR https://github.com/osmcode/pyosmium/
+.UE
=====================================
pyproject.toml
=====================================
@@ -4,10 +4,10 @@ build-backend = "scikit_build_core.build"
[project]
name = "osmium"
-version = "4.1.1"
+version = "4.2.0"
description = "Python bindings for libosmium, the data processing library for OSM data"
requires-python = ">=3.8"
-
+readme = "README.rst"
license = {text = 'BSD-2-Clause'}
authors = [
{name = "Sarah Hoffmann", email = "lonvia at denofr.de"}
@@ -37,6 +37,12 @@ dependencies = [
"requests"
]
+[project.urls]
+Homepage = "https://osmcode.org/pyosmium"
+Documentation = "https://docs.osmcode.org/pyosmium/latest/"
+Repository = "https://github.com/osmcode/pyosmium"
+Issues = "https://github.com/osmcode/pyosmium/issues"
+
[project.optional-dependencies]
tests = [
'pytest',
@@ -95,3 +101,6 @@ include = ['/src/**/*.py',
'/contrib/protozero/LICENSE',
'/contrib/protozero/README.md',
]
+
+[tool.pytest.ini_options]
+log_cli = false
=====================================
src/osmium/replication/server.py
=====================================
@@ -6,7 +6,7 @@
# For a full list of authors see the git log.
""" Helper functions to communicate with replication servers.
"""
-from typing import NamedTuple, Optional, Any, Iterator, cast, Mapping, Tuple
+from typing import NamedTuple, Optional, Any, Iterator, cast, Mapping, Tuple, Dict
import urllib.request as urlrequest
from urllib.error import URLError
import datetime as dt
@@ -67,7 +67,7 @@ class ReplicationServer:
self.baseurl = url
self.diff_type = diff_type
- self.extra_request_params: dict[str, Any] = dict(timeout=60, stream=True)
+ self.extra_request_params: Dict[str, Any] = dict(timeout=60, stream=True)
self.session: Optional[requests.Session] = None
self.retry = Retry(total=3, backoff_factor=0.5, allowed_methods={'GET'},
status_forcelist=[408, 429, 500, 502, 503, 504])
@@ -125,59 +125,100 @@ class ReplicationServer:
return _get_url_with_session()
- def collect_diffs(self, start_id: int, max_size: int = 1024) -> Optional[DownloadResult]:
+ def collect_diffs(self, start_id: int, max_size: Optional[int] = None,
+ end_id: Optional[int] = None) -> Optional[DownloadResult]:
""" Create a MergeInputReader and download diffs starting with sequence
- id `start_id` into it. `max_size`
- restricts the number of diffs that are downloaded. The download
- stops as soon as either a diff cannot be downloaded or the
- unpacked data in memory exceeds `max_size` kB.
+ id `start_id` into it. `end_id` optionally gives the highest
+ sequence number to download. `max_size` restricts the number of
+ diffs that are downloaded by size. If neither `end_id` nor
+`max_size` are given, then the download defaults to stopping after 1MB.
+
+ The download stops as soon as
+ 1. a diff cannot be downloaded or
+ 2. the end_id (inclusive) is reached or
+ 3. the unpacked data in memory exceeds `max_size` kB or,
+          when neither `end_id` nor `max_size` is given, 1024 kB.
If some data was downloaded, returns a namedtuple with three fields:
`id` contains the sequence id of the last downloaded diff, `reader`
contains the MergeInputReader with the data and `newest` is a
sequence id of the most recent diff available.
- Returns None if there was an error during download or no new
- data was available.
- """
- left_size = max_size * 1024
- current_id = start_id
+        Returns None if no new data was available.
+ If there is an error during the download, then the function will
+ simply return the already downloaded data. If the reported
+ error is a client error (HTTP 4xx) and happens during the download
+        of the first diff, then a ::requests.HTTPError:: is raised: this
+ condition is likely to be permanent and the caller should not
+ simply retry without investigating the cause.
+ """
# must not read data newer than the published sequence id
# or we might end up reading partial data
newest = self.get_state_info()
- if newest is None or current_id > newest.sequence:
+ if newest is None or start_id > newest.sequence:
return None
+ current_id = start_id
+ left_size: Optional[int] = None
+ if max_size is not None:
+ left_size = max_size * 1024
+ elif end_id is None:
+ left_size = 1024 * 1024
+
rd = MergeInputReader()
- while left_size > 0 and current_id <= newest.sequence:
+ while (left_size is None or left_size > 0) \
+ and (end_id is None or current_id <= end_id) \
+ and current_id <= newest.sequence:
try:
diffdata = self.get_diff_block(current_id)
- except: # noqa: E722
- LOG.error("Error during diff download. Bailing out.")
+ except requests.RequestException as ex:
+ if start_id == current_id \
+ and ex.response is not None \
+                    and (ex.response.status_code // 100 == 4):
+ # If server directly responds with a client error,
+ # reraise the exception to signal a potentially permanent
+ # error.
+ LOG.error("Permanent server error: %s", ex.response)
+ raise ex
+ # In all other cases, process whatever diffs we have and
+ # encourage a retry.
+ LOG.error("Error during diff download: %s", ex)
+ LOG.error("Bailing out.")
diffdata = ''
if len(diffdata) == 0:
if start_id == current_id:
return None
break
- left_size -= rd.add_buffer(diffdata, self.diff_type)
- LOG.debug("Downloaded change %d. (%d kB available in download buffer)",
- current_id, left_size / 1024)
+ diff_size = rd.add_buffer(diffdata, self.diff_type)
+ if left_size is None:
+ LOG.debug("Downloaded change %d.", current_id)
+ else:
+ left_size -= diff_size
+ LOG.debug("Downloaded change %d. (%d kB available in download buffer)",
+ current_id, left_size / 1024)
current_id += 1
return DownloadResult(current_id - 1, rd, newest.sequence)
def apply_diffs(self, handler: BaseHandler, start_id: int,
- max_size: int = 1024, idx: str = "",
- simplify: bool = True) -> Optional[int]:
+ max_size: Optional[int] = None,
+ idx: str = "", simplify: bool = True,
+ end_id: Optional[int] = None) -> Optional[int]:
""" Download diffs starting with sequence id `start_id`, merge them
- together and then apply them to handler `handler`. `max_size`
- restricts the number of diffs that are downloaded. The download
- stops as soon as either a diff cannot be downloaded or the
- unpacked data in memory exceeds `max_size` kB.
+ together and then apply them to handler `handler`. `end_id`
+ optionally gives the highest sequence id to download. `max_size`
+        allows restricting the number of diffs that are downloaded.
+ Downloaded diffs are temporarily saved in memory and this parameter
+ ensures that pyosmium doesn't run out of memory. `max_size`
+ is the maximum size in kB this internal buffer may have.
+
+ If neither `end_id` nor `max_size` are given, the download is
+ restricted to a maximum size of 1MB. The download also
+ stops when the most recent diff has been processed.
If `idx` is set, a location cache will be created and applied to
the way nodes. You should be aware that diff files usually do not
@@ -197,7 +238,7 @@ class ReplicationServer:
The function returns the sequence id of the last diff that was
downloaded or None if the download failed completely.
"""
- diffs = self.collect_diffs(start_id, max_size)
+ diffs = self.collect_diffs(start_id, end_id=end_id, max_size=max_size)
if diffs is None:
return None
@@ -206,19 +247,26 @@ class ReplicationServer:
return diffs.id
- def apply_diffs_to_file(self, infile: str, outfile: str,
- start_id: int, max_size: int = 1024,
+ def apply_diffs_to_file(self, infile: str, outfile: str, start_id: int,
+ max_size: Optional[int] = None,
set_replication_header: bool = True,
extra_headers: Optional[Mapping[str, str]] = None,
- outformat: Optional[str] = None) -> Optional[Tuple[int, int]]:
+ outformat: Optional[str] = None,
+ end_id: Optional[int] = None) -> Optional[Tuple[int, int]]:
""" Download diffs starting with sequence id `start_id`, merge them
with the data from the OSM file named `infile` and write the result
into a file with the name `outfile`. The output file must not yet
exist.
- `max_size` restricts the number of diffs that are downloaded. The
- download stops as soon as either a diff cannot be downloaded or the
- unpacked data in memory exceeds `max_size` kB.
+ `end_id` optionally gives the highest sequence id to download.
+        `max_size` allows restricting the number of diffs that are
+ downloaded. Downloaded diffs are saved in memory and this parameter
+ ensures that pyosmium doesn't run out of memory. `max_size`
+ is the maximum size in kB this internal buffer may have.
+
+ If neither `end_id` nor `max_size` are given, the
+ download is restricted to a maximum size of 1MB. The download also
+ stops when the most recent diff has been processed.
If `set_replication_header` is true then the URL of the replication
server and the sequence id and timestamp of the last diff applied
@@ -235,7 +283,7 @@ class ReplicationServer:
newest available sequence id if new data has been written or None
if no data was available or the download failed completely.
"""
- diffs = self.collect_diffs(start_id, max_size)
+ diffs = self.collect_diffs(start_id, end_id=end_id, max_size=max_size)
if diffs is None:
return None
@@ -274,7 +322,8 @@ class ReplicationServer:
return (diffs.id, diffs.newest)
def timestamp_to_sequence(self, timestamp: dt.datetime,
- balanced_search: bool = False) -> Optional[int]:
+ balanced_search: bool = False,
+ limit_by_oldest_available: bool = False) -> Optional[int]:
""" Get the sequence number of the replication file that contains the
given timestamp. The search algorithm is optimised for replication
servers that publish updates in regular intervals. For servers
@@ -282,8 +331,15 @@ class ReplicationServer:
should be set to true so that a standard binary search for the
sequence will be used. The default is good for all known
OSM replication services.
- """
+ When `limit_by_oldest_available` is set, then the function will
+ return None when the server replication does not start at 0 and
+ the given timestamp is older than the oldest available timestamp
+ on the server. Some replication servers do not keep the full
+ history and this flag avoids accidentally trying to download older
+ data. The downside is that the function will never return the
+ oldest available sequence ID when the flag is set.
+ """
# get the current timestamp from the server
upper = self.get_state_info()
@@ -300,8 +356,10 @@ class ReplicationServer:
lower = self.get_state_info(lowerid)
if lower is not None and lower.timestamp >= timestamp:
- if lower.sequence == 0 or lower.sequence + 1 >= upper.sequence:
- return lower.sequence
+ if lower.sequence == 0:
+ return 0
+ if lower.sequence + 1 >= upper.sequence:
+ return None if limit_by_oldest_available else lower.sequence
upper = lower
lower = None
lowerid = 0
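The new download-loop semantics in collect_diffs() (stop at end_id inclusive, stop when the size budget is exhausted, default to a 1 MB budget only when fully open-ended) can be sketched standalone. Everything below is an illustrative model of the loop conditions, not the pyosmium API: the real code streams data via get_diff_block()/add_buffer(), while this sketch assumes a fixed per-diff size.

```python
from typing import List, Optional


def diff_ids_to_download(start_id: int, newest: int, diff_size: int,
                         max_size: Optional[int] = None,
                         end_id: Optional[int] = None) -> List[int]:
    """Return the sequence IDs the loop would fetch, assuming every diff
    unpacks to diff_size bytes (a stand-in for add_buffer()'s return)."""
    left: Optional[int] = None
    if max_size is not None:
        left = max_size * 1024      # explicit budget; max_size is in kB
    elif end_id is None:
        left = 1024 * 1024          # default 1 MB when fully open-ended
    ids: List[int] = []
    current = start_id
    while ((left is None or left > 0)
           and (end_id is None or current <= end_id)
           and current <= newest):  # never read past the published state
        ids.append(current)
        if left is not None:
            left -= diff_size
        current += 1
    return ids
```

Note that with an end_id but no max_size, the budget check is disabled entirely, which matches the docstring: the caller has taken responsibility for bounding the download.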
=====================================
src/osmium/tools/common.py
=====================================
@@ -0,0 +1,100 @@
+# SPDX-License-Identifier: BSD-2-Clause
+#
+# This file is part of pyosmium. (https://osmcode.org/pyosmium/)
+#
+# Copyright (C) 2025 Sarah Hoffmann <lonvia@denofr.de> and others.
+# For a full list of authors see the git log.
+from typing import Optional
+import logging
+from dataclasses import dataclass
+import datetime as dt
+from argparse import ArgumentTypeError
+
+from ..replication import newest_change_from_file
+from ..replication.server import ReplicationServer
+from ..replication.utils import get_replication_header
+
+
+log = logging.getLogger()
+
+
+@dataclass
+class ReplicationStart:
+ """ Represents the point where changeset download should begin.
+ """
+ date: Optional[dt.datetime] = None
+ seq_id: Optional[int] = None
+ source: Optional[str] = None
+
+ def get_sequence(self, svr: ReplicationServer) -> Optional[int]:
+ if self.seq_id is not None:
+ log.debug("Using given sequence ID %d" % self.seq_id)
+ if self.seq_id > 0:
+ start_state = svr.get_state_info(seq=self.seq_id)
+ if start_state is None:
+ log.error(
+ f"Cannot download state information for ID {self.seq_id}."
+ " Server may not have this diff anymore.")
+ return None
+ self.date = start_state.timestamp
+ return self.seq_id + 1
+
+ assert self.date is not None
+ log.debug("Looking up sequence ID for timestamp %s" % self.date)
+ return svr.timestamp_to_sequence(self.date, limit_by_oldest_available=True)
+
+ def get_end_sequence(self, svr: ReplicationServer) -> Optional[int]:
+ if self.seq_id is not None:
+ log.debug("Using end sequence ID %d" % self.seq_id)
+ return self.seq_id
+
+ assert self.date is not None
+ log.debug("Looking up end sequence ID for timestamp %s" % self.date)
+ return svr.timestamp_to_sequence(self.date)
+
+ @staticmethod
+ def from_id(idstr: str) -> 'ReplicationStart':
+ try:
+ seq_id = int(idstr)
+ except ValueError:
+ raise ArgumentTypeError("Sequence id '%s' is not a number" % idstr)
+
+ if seq_id < -1:
+ raise ArgumentTypeError("Sequence id '%s' is negative" % idstr)
+
+ return ReplicationStart(seq_id=seq_id)
+
+ @staticmethod
+ def from_date(datestr: str) -> 'ReplicationStart':
+ try:
+ date = dt.datetime.strptime(datestr, "%Y-%m-%dT%H:%M:%SZ")
+ date = date.replace(tzinfo=dt.timezone.utc)
+ except ValueError:
+ raise ArgumentTypeError(
+ "Date needs to be in ISO8601 format (e.g. 2015-12-24T08:08:08Z).")
+
+ return ReplicationStart(date=date)
+
+ @staticmethod
+ def from_osm_file(fname: str, ignore_headers: bool) -> 'ReplicationStart':
+ if ignore_headers:
+ ts = None
+ seq = None
+ url = None
+ else:
+ try:
+ (url, seq, ts) = get_replication_header(fname)
+ except RuntimeError as e:
+ raise ArgumentTypeError(e)
+
+ if ts is None and seq is None:
+ log.debug("OSM file has no replication headers. Looking for newest OSM object.")
+ try:
+ ts = newest_change_from_file(fname)
+ except RuntimeError as e:
+ raise ArgumentTypeError(e)
+
+ if ts is None:
+ raise ArgumentTypeError("OSM file does not seem to contain valid data.")
+
+ return ReplicationStart(seq_id=seq, date=ts, source=url)
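The argparse-facing validation that ReplicationStart.from_id/from_date perform above can be sketched as standalone helpers. This is a mirror for illustration only (the real code returns ReplicationStart instances, not bare values):

```python
import datetime as dt
from argparse import ArgumentTypeError


def parse_start_date(datestr: str) -> dt.datetime:
    """Parse a strict ISO8601 'Z' timestamp, as the -D/-E options expect."""
    try:
        date = dt.datetime.strptime(datestr, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        raise ArgumentTypeError(
            "Date needs to be in ISO8601 format (e.g. 2015-12-24T08:08:08Z).")
    return date.replace(tzinfo=dt.timezone.utc)


def parse_start_id(idstr: str) -> int:
    """Parse a sequence ID; -1 is allowed as a sentinel, anything lower is not."""
    try:
        seq_id = int(idstr)
    except ValueError:
        raise ArgumentTypeError("Sequence id '%s' is not a number" % idstr)
    if seq_id < -1:
        raise ArgumentTypeError("Sequence id '%s' is negative" % idstr)
    return seq_id
```

Raising ArgumentTypeError (rather than ValueError) lets argparse turn bad input into a clean usage error instead of a traceback.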
=====================================
src/osmium/tools/pyosmium_get_changes.py
=====================================
@@ -1,3 +1,9 @@
+# SPDX-License-Identifier: BSD-2-Clause
+#
+# This file is part of pyosmium. (https://osmcode.org/pyosmium/)
+#
+# Copyright (C) 2025 Sarah Hoffmann <lonvia@denofr.de> and others.
+# For a full list of authors see the git log.
"""
Fetch diffs from an OSM planet server.
@@ -23,89 +29,23 @@ pyosmium-get-changes does not fetch the cookie from these services for you.
However, it can read cookies from a Netscape-style cookie jar file, send these
cookies to the server and will save received cookies to the jar file.
"""
+from typing import List
import sys
import logging
from textwrap import dedent as msgfmt
-
-from argparse import ArgumentParser, RawDescriptionHelpFormatter, ArgumentTypeError
-import datetime as dt
+from argparse import ArgumentParser, RawDescriptionHelpFormatter
import http.cookiejar
-from osmium.replication import server as rserv
-from osmium.replication import newest_change_from_file
-from osmium.replication.utils import get_replication_header
-from osmium.version import pyosmium_release
-from osmium import SimpleWriter
-
-log = logging.getLogger()
-
-
-class ReplicationStart(object):
- """ Represents the point where changeset download should begin.
- """
-
- def __init__(self, date=None, seq_id=None, src=None):
- self.date = date
- self.seq_id = seq_id
- self.source = src
-
- def get_sequence(self, svr):
- if self.seq_id is not None:
- log.debug("Using given sequence ID %d" % self.seq_id)
- return self.seq_id + 1
-
- log.debug("Looking up sequence ID for timestamp %s" % self.date)
- return svr.timestamp_to_sequence(self.date)
-
- @staticmethod
- def from_id(idstr):
- try:
- seq_id = int(idstr)
- except ValueError:
- raise ArgumentTypeError("Sequence id '%s' is not a number" % idstr)
-
- if seq_id < -1:
- raise ArgumentTypeError("Sequence id '%s' is negative" % idstr)
-
- return ReplicationStart(seq_id=seq_id)
-
- @staticmethod
- def from_date(datestr):
- try:
- date = dt.datetime.strptime(datestr, "%Y-%m-%dT%H:%M:%SZ")
- date = date.replace(tzinfo=dt.timezone.utc)
- except ValueError:
- raise ArgumentTypeError(
- "Date needs to be in ISO8601 format (e.g. 2015-12-24T08:08:08Z).")
-
- return ReplicationStart(date=date)
-
- @staticmethod
- def from_osm_file(fname, ignore_headers):
- if ignore_headers:
- ts = None
- seq = None
- url = None
- else:
- try:
- (url, seq, ts) = get_replication_header(fname)
- except RuntimeError as e:
- raise ArgumentTypeError(e)
-
- if ts is None and seq is None:
- log.debug("OSM file has no replication headers. Looking for newest OSM object.")
- try:
- ts = newest_change_from_file(fname)
- except RuntimeError as e:
- raise ArgumentTypeError(e)
+from ..replication import server as rserv
+from ..version import pyosmium_release
+from .. import SimpleWriter
+from .common import ReplicationStart
- if ts is None:
- raise ArgumentTypeError("OSM file does not seem to contain valid data.")
- return ReplicationStart(seq_id=seq, date=ts, src=url)
+log = logging.getLogger()
-def write_end_sequence(fname, seqid):
+def write_end_sequence(fname: str, seqid: int) -> None:
"""Either writes out the sequence file or prints the sequence id to stdout.
"""
if fname is None:
@@ -115,7 +55,7 @@ def write_end_sequence(fname, seqid):
fd.write(str(seqid))
-def get_arg_parser(from_main=False):
+def get_arg_parser(from_main: bool = False) -> ArgumentParser:
parser = ArgumentParser(prog='pyosmium-get-changes',
description=__doc__,
usage=None if from_main else 'pyosmium-get-changes [options]',
@@ -134,8 +74,9 @@ def get_arg_parser(from_main=False):
parser.add_argument('--cookie', dest='cookie',
help='Netscape-style cookie jar file to read cookies from '
'and where received cookies will be written to.')
- parser.add_argument('-s', '--size', dest='outsize', type=int, default=100,
- help='Maximum data to load in MB (default: 100MB).')
+ parser.add_argument('-s', '--size', dest='outsize', type=int,
+ help='Maximum data to load in MB '
+ '(Defaults to 100MB when no end date/ID has been set).')
group = parser.add_mutually_exclusive_group()
group.add_argument('-I', '--start-id', dest='start',
type=ReplicationStart.from_id, metavar='ID',
@@ -145,6 +86,13 @@ def get_arg_parser(from_main=False):
help='Date when to start updates')
group.add_argument('-O', '--start-osm-data', dest='start_file', metavar='OSMFILE',
help='start at the date of the newest OSM object in the file')
+ group = parser.add_mutually_exclusive_group()
+ group.add_argument('--end-id', dest='end',
+ type=ReplicationStart.from_id, metavar='ID',
+ help='Last sequence ID to download.')
+ group.add_argument('-E', '--end-date', dest='end', metavar='DATE',
+ type=ReplicationStart.from_date,
+ help='Do not download diffs later than the given date.')
parser.add_argument('-f', '--sequence-file', dest='seq_file',
help='Sequence file. If the file exists, then updates '
'will start after the id given in the file. At the '
@@ -164,7 +112,7 @@ def get_arg_parser(from_main=False):
return parser
-def pyosmium_get_changes(args):
+def pyosmium_get_changes(args: List[str]) -> int:
logging.basicConfig(stream=sys.stderr,
format='%(asctime)s %(levelname)s: %(message)s',
datefmt='%Y-%m-%d %H:%M:%S')
@@ -214,23 +162,46 @@ def pyosmium_get_changes(args):
cookie_jar.load(options.cookie)
svr.set_request_parameter('cookies', cookie_jar)
+ # Sanity check if server URL is correct and server is responding.
+ current = svr.get_state_info()
+ if current is None:
+ log.error("Cannot download state information. Is the replication URL correct?")
+ return 3
+ log.debug(f"Server is at sequence {current.sequence} ({current.timestamp}).")
+
startseq = options.start.get_sequence(svr)
if startseq is None:
- log.error("Cannot read state file from server. Is the URL correct?")
+ log.error(f"No starting point found for time {options.start.date} on server {url}")
return 1
if options.outfile is None:
write_end_sequence(options.seq_file, startseq - 1)
return 0
- log.debug("Starting download at ID %d (max %d MB)" % (startseq, options.outsize))
+ log.debug("Starting download at ID %d (max %f MB)"
+ % (startseq, options.outsize or float('inf')))
if options.outformat is not None:
outhandler = SimpleWriter(options.outfile, filetype=options.outformat)
else:
outhandler = SimpleWriter(options.outfile)
- endseq = svr.apply_diffs(outhandler, startseq, max_size=options.outsize*1024,
- simplify=options.simplify)
+ if options.outsize is not None:
+ max_size = options.outsize * 1024
+ elif options.end is None:
+ max_size = 100 * 1024
+ else:
+ max_size = None
+
+ if options.end is None:
+ end_id = None
+ else:
+ end_id = options.end.get_end_sequence(svr)
+ if end_id is None:
+ log.error("Cannot find the end date/ID on the server.")
+ return 1
+
+ endseq = svr.apply_diffs(outhandler, startseq, max_size=max_size,
+ end_id=end_id, simplify=options.simplify)
outhandler.close()
# save cookies
@@ -247,7 +218,7 @@ def pyosmium_get_changes(args):
return 0
-def main():
+def main() -> int:
logging.basicConfig(stream=sys.stderr,
format='%(asctime)s %(levelname)s: %(message)s',
datefmt='%Y-%m-%d %H:%M:%S')
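Both tools apply the same three-way rule for the download size cap: an explicit `--size` always wins; without one, the historic default cap only applies when no end ID or date bounds the download anyway. A standalone sketch of that defaulting (function and argument names are illustrative; the default is 100 MB for pyosmium-get-changes and 1024 MB for pyosmium-up-to-date):

```python
def resolve_max_size(outsize_mb, end, default_mb):
    # Explicit --size in MB wins; returned value is in KB, matching the
    # max_size argument of apply_diffs()/apply_diffs_to_file().
    if outsize_mb is not None:
        return outsize_mb * 1024
    # No explicit size and no end point: fall back to the historic cap.
    if end is None:
        return default_mb * 1024
    # An end ID/date already bounds the download, so no size cap.
    return None
```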
=====================================
src/osmium/tools/pyosmium_up_to_date.py
=====================================
@@ -1,3 +1,9 @@
+# SPDX-License-Identifier: BSD-2-Clause

+#
+# This file is part of pyosmium. (https://osmcode.org/pyosmium/)
+#
+# Copyright (C) 2025 Sarah Hoffmann <lonvia@denofr.de> and others.
+# For a full list of authors see the git log.
"""
Update an OSM file with changes from a OSM replication server.
@@ -29,38 +35,42 @@ pyosmium-up-to-date does not fetch the cookie from these services for you.
However, it can read cookies from a Netscape-style cookie jar file, send these
cookies to the server and will save received cookies to the jar file.
"""
+from typing import Any, List
import sys
import traceback
import logging
import http.cookiejar
-from argparse import ArgumentParser, RawDescriptionHelpFormatter
+from argparse import ArgumentParser, RawDescriptionHelpFormatter, ArgumentTypeError
import datetime as dt
-from osmium.replication import server as rserv
-from osmium.replication.utils import get_replication_header
-from osmium.replication import newest_change_from_file
-from osmium.version import pyosmium_release
from textwrap import dedent as msgfmt
from tempfile import mktemp
import os.path
+from ..replication import server as rserv
+from ..version import pyosmium_release
+from .common import ReplicationStart
+
log = logging.getLogger()
-def update_from_osm_server(ts, options):
- """Update the OSM file using the official OSM servers at
- https://planet.osm.org/replication. This strategy will attempt
- to start with daily updates before going down to minutelies.
- TODO: only updates from hourlies currently implemented.
+def update_from_osm_server(start: ReplicationStart, options: Any) -> int:
+ """ Update the OSM file using the official OSM servers at
+ https://planet.osm.org/replication. This strategy will attempt
+ to start with daily updates before going down to minutelies.
+ TODO: only updates from hourlies currently implemented.
"""
- return update_from_custom_server("https://planet.osm.org/replication/hour/",
- None, ts, options)
+ start.source = "https://planet.osm.org/replication/hour/"
+ return update_from_custom_server(start, options)
+
+def update_from_custom_server(start: ReplicationStart, options: Any) -> int:
+ """ Update from a custom URL, simply using the diff sequence as is.
+ """
+ assert start.source
-def update_from_custom_server(url, seq, ts, options):
- """Update from a custom URL, simply using the diff sequence as is."""
- with rserv.ReplicationServer(url, "osc.gz") as svr:
- log.info("Using replication service at %s", url)
+ with rserv.ReplicationServer(start.source, options.server_diff_type) as svr:
+ log.info(f"Using replication service at {start.source}")
svr.set_request_parameter('timeout', options.socket_timeout or None)
@@ -73,39 +83,29 @@ def update_from_custom_server(url, seq, ts, options):
if current is None:
log.error("Cannot download state information. Is the replication URL correct?")
return 3
- log.debug("Server is at sequence %d (%s).", current.sequence, current.timestamp)
-
- if seq is None:
- log.info("Using timestamp %s as starting point." % ts)
- startseq = svr.timestamp_to_sequence(ts)
- if startseq is None:
- log.error("No starting point found for time %s on server %s"
- % (str(ts), url))
- return 3
- else:
- if seq >= current.sequence:
- log.info("File is already up to date.")
- return 0
-
- log.debug("Using given sequence ID %d" % seq)
- startseq = seq + 1
- ts = svr.get_state_info(seq=startseq)
- if ts is None:
- log.error("Cannot download state information for ID %d. Is the URL correct?" % seq)
- return 3
- ts = ts.timestamp
+ log.debug(f"Server is at sequence {current.sequence} ({current.timestamp}).")
+
+ if start.seq_id is not None and start.seq_id >= current.sequence:
+ log.info("File is already up to date.")
+ return 0
+
+ startseq = start.get_sequence(svr)
+ if startseq is None:
+ log.error(f"No starting point found for time {start.date} on server {start.source}")
+ return 3
if not options.force_update:
cmpdate = dt.datetime.now(dt.timezone.utc) - dt.timedelta(days=90)
cmpdate = cmpdate.replace(tzinfo=dt.timezone.utc)
- if ts < cmpdate:
+ if start.date is None or start.date < cmpdate:
log.error(
"""The OSM file is more than 3 months old. You should download a
more recent file instead of updating. If you really want to
update the file, use --force-update-of-old-planet.""")
return 3
- log.info("Starting download at ID %d (max %d MB)" % (startseq, options.outsize))
+ log.info("Starting download at ID %d (max %f MB)"
+ % (startseq, options.outsize or float('inf')))
outfile = options.outfile
infile = options.infile
@@ -118,10 +118,25 @@ def update_from_custom_server(url, seq, ts, options):
else:
ofname = outfile
+ if options.outsize is not None:
+ max_size = options.outsize * 1024
+ elif options.end is None:
+ max_size = 1024 * 1024
+ else:
+ max_size = None
+
+ if options.end is None:
+ end_id = None
+ else:
+ end_id = options.end.get_end_sequence(svr)
+ if end_id is None:
+ log.error("Cannot find the end date/ID on the server.")
+ return 1
+
try:
extra_headers = {'generator': 'pyosmium-up-to-date/' + pyosmium_release}
outseqs = svr.apply_diffs_to_file(infile, ofname, startseq,
- max_size=options.outsize*1024,
+ max_size=max_size, end_id=end_id,
extra_headers=extra_headers,
outformat=options.outformat)
@@ -130,7 +145,7 @@ def update_from_custom_server(url, seq, ts, options):
return 3
if outfile is None:
- os.rename(ofname, infile)
+ os.replace(ofname, infile)
finally:
if outfile is None:
try:
@@ -144,41 +159,35 @@ def update_from_custom_server(url, seq, ts, options):
if options.cookie:
cookie_jar.save(options.cookie)
- return 0 if outseqs[1] == outseqs[0] else 1
+ return 0 if (end_id or outseqs[1]) == outseqs[0] else 1
-def compute_start_point(options):
- if options.ignore_headers:
- url, seq, ts = None, None, None
- else:
- url, seq, ts = get_replication_header(options.infile)
+def compute_start_point(options: Any) -> ReplicationStart:
+ start = ReplicationStart.from_osm_file(options.infile, options.ignore_headers)
if options.server_url is not None:
- if url is not None and url != options.server_url:
+ if start.source is not None and start.source != options.server_url:
log.error(msgfmt(f"""
You asked to use server URL:
{options.server_url}
but the referenced OSM file points to replication server:
- {url}
+ {start.source}
If you really mean to overwrite the URL, use --ignore-osmosis-headers."""))
- exit(2)
- url = options.server_url
-
- if seq is None and ts is None:
- log.info("No replication information found, scanning for newest OSM object.")
- ts = newest_change_from_file(options.infile)
-
- if ts is None:
- log.error("OSM file does not seem to contain valid data.")
- exit(2)
+ raise ArgumentTypeError("Source URL doesn't match replication headers.")
+ if start.source is None:
+ start.source = options.server_url
+ start.seq_id = None
+ if start.date is None:
+ raise ArgumentTypeError("Cannot determine start date for file.")
- if ts is not None:
- ts -= dt.timedelta(minutes=options.wind_back)
+ if start.seq_id is None:
+ assert start.date is not None
+ start.date -= dt.timedelta(minutes=options.wind_back)
- return url, seq, ts
+ return start
-def get_arg_parser(from_main=False):
+def get_arg_parser(from_main: bool = False) -> ArgumentParser:
parser = ArgumentParser(prog='pyosmium-up-to-date',
description=__doc__,
@@ -197,8 +206,18 @@ def get_arg_parser(from_main=False):
help='Base URL of the replication server. Default: '
'https://planet.osm.org/replication/hour/ '
'(hourly diffs from osm.org)')
- parser.add_argument('-s', '--size', dest='outsize', metavar='SIZE', type=int, default=1024,
- help='Maximum size of change to apply at once in MB. Default: 1GB')
+ parser.add_argument('--diff-type', action='store', dest='server_diff_type', default='osc.gz',
+ help='File format used by the replication server (default: osc.gz)')
+ parser.add_argument('-s', '--size', dest='outsize', metavar='SIZE', type=int,
+ help='Maximum size of change to apply at once in MB. '
+ 'Defaults to 1GB when no end ID or date was given.')
+ group = parser.add_mutually_exclusive_group()
+ group.add_argument('--end-id', dest='end',
+ type=ReplicationStart.from_id, metavar='ID',
+ help='Last sequence ID to download.')
+ group.add_argument('-E', '--end-date', dest='end', metavar='DATE',
+ type=ReplicationStart.from_date,
+ help='Do not download diffs later than the given date.')
parser.add_argument('--tmpdir', dest='tmpdir',
help='Directory to use for temporary files. '
'Usually the directory of input file is used.')
@@ -225,28 +244,28 @@ def get_arg_parser(from_main=False):
return parser
-def pyosmium_up_to_date(args):
- options = get_arg_parser(from_main=True).parse_args()
+def pyosmium_up_to_date(args: List[str]) -> int:
+ options = get_arg_parser(from_main=True).parse_args(args)
log.setLevel(max(3 - options.loglevel, 0) * 10)
try:
- url, seq, ts = compute_start_point(options)
+ start = compute_start_point(options)
except RuntimeError as e:
log.error(str(e))
return 2
try:
- if url is None:
- return update_from_osm_server(ts, options)
+ if start.source is None:
+ return update_from_osm_server(start, options)
- return update_from_custom_server(url, seq, ts, options)
+ return update_from_custom_server(start, options)
except Exception:
traceback.print_exc()
return 254
-def main():
+def main() -> int:
logging.basicConfig(stream=sys.stderr,
format='%(asctime)s %(levelname)s: %(message)s',
datefmt='%Y-%m-%d %H:%M:%S')
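The reworked return statement of `update_from_custom_server` changes what counts as a complete update: success now means reaching the requested end ID when one was given, or the newest sequence on the server otherwise. A hedged sketch of that logic (argument names are mine, not pyosmium's):

```python
def update_exit_code(end_id, applied_seq, newest_seq):
    # The update is complete (exit 0) when the last applied diff reaches
    # the requested end ID, or, when none was given, the newest sequence
    # available on the replication server.
    target = end_id if end_id is not None else newest_seq
    return 0 if target == applied_seq else 1
```

This is why the parametrized tests below expect exit code 1 whenever a size cap stops the download before the requested `--end-id`.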
=====================================
test/test_pyosmium_get_changes.py
=====================================
@@ -8,8 +8,10 @@
"""
from textwrap import dedent
import uuid
+import datetime as dt
+
+import pytest
-import osmium.replication.server
import osmium
from osmium.tools.pyosmium_get_changes import pyosmium_get_changes
@@ -21,87 +23,140 @@ except ImportError:
import cookielib as cookiejarlib
-class TestPyosmiumGetChanges:
+REPLICATION_BASE_TIME = dt.datetime(year=2017, month=8, day=26, hour=11, tzinfo=dt.timezone.utc)
+REPLICATION_BASE_SEQ = 100
+REPLICATION_CURRENT = 140
+
+
+@pytest.fixture
+def replication_server(httpserver):
+ def _state(seq):
+ seqtime = REPLICATION_BASE_TIME + dt.timedelta(hours=seq - REPLICATION_CURRENT)
+ timestamp = seqtime.strftime('%Y-%m-%dT%H\\:%M\\:%SZ')
+ return f"sequenceNumber={seq}\ntimestamp={timestamp}\n"
+
+ httpserver.no_handler_status_code = 404
+ httpserver.expect_request('/state.txt').respond_with_data(_state(REPLICATION_CURRENT))
+ for i in range(REPLICATION_BASE_SEQ, REPLICATION_CURRENT + 1):
+ httpserver.expect_request(f'/000/000/{i}.opl')\
+ .respond_with_data(f"r{i} M" + ",".join(f"n{i}@" for i in range(1, 6000)))
+ httpserver.expect_request(f'/000/000/{i}.state.txt').respond_with_data(_state(i))
+
+ return httpserver.url_for('')
+
+
+@pytest.fixture
+def runner(httpserver):
+ def _run(*args):
+ return pyosmium_get_changes(
+ ['--server', httpserver.url_for(''), '--diff-type', 'opl'] + list(map(str, args)))
+
+ return _run
+
+
+def test_init_id(runner, capsys, replication_server):
+ assert 0 == runner('-I', '100')
+
+ output = capsys.readouterr().out.strip()
+
+ assert output == '100'
+
+
+def test_init_date(runner, capsys, httpserver):
+ httpserver.expect_request('/state.txt').respond_with_data(dedent("""\
+ sequenceNumber=100
+ timestamp=2017-08-26T11\\:04\\:02Z
+ """))
+ httpserver.expect_request('/000/000/000.state.txt').respond_with_data(dedent("""\
+ sequenceNumber=0
+ timestamp=2016-08-26T11\\:04\\:02Z
+ """))
+ assert 0 == runner('-D', '2015-12-24T08:08:08Z')
+
+ output = capsys.readouterr().out.strip()
+
+ assert output == '-1'
+
+
+def test_init_to_file(runner, tmp_path, replication_server):
+ fname = tmp_path / f"{uuid.uuid4()}.seq"
+
+ assert 0 == runner('-I', '130', '-f', fname)
+ assert fname.read_text() == '130'
+
+
+def test_init_from_seq_file(runner, tmp_path, replication_server):
+ fname = tmp_path / f"{uuid.uuid4()}.seq"
+ fname.write_text('140')
+
+ assert 0 == runner('-f', fname)
+ assert fname.read_text() == '140'
+
+
+def test_init_date_with_cookie(runner, capsys, tmp_path, httpserver):
+ httpserver.expect_request('/state.txt').respond_with_data(dedent("""\
+ sequenceNumber=100
+ timestamp=2017-08-26T11\\:04\\:02Z
+ """))
+ httpserver.expect_request('/000/000/000.state.txt').respond_with_data(dedent("""\
+ sequenceNumber=0
+ timestamp=2016-08-26T11\\:04\\:02Z
+ """))
- def main(self, httpserver, *args):
- return pyosmium_get_changes(['--server', httpserver.url_for('')] + list(args))
+ fname = tmp_path / 'my.cookie'
+ cookie_jar = cookiejarlib.MozillaCookieJar(str(fname))
+ cookie_jar.save()
- def test_init_id(self, capsys, httpserver):
- assert 0 == self.main(httpserver, '-I', '453')
+ assert 0 == runner('--cookie', fname, '-D', '2015-12-24T08:08:08Z')
- output = capsys.readouterr().out.strip()
+ output = capsys.readouterr().out.strip()
- assert output == '453'
+ assert output == '-1'
- def test_init_date(self, capsys, httpserver):
- httpserver.expect_request('/state.txt').respond_with_data(dedent("""\
- sequenceNumber=100
- timestamp=2017-08-26T11\\:04\\:02Z
- """))
- httpserver.expect_request('/000/000/000.state.txt').respond_with_data(dedent("""\
- sequenceNumber=0
- timestamp=2016-08-26T11\\:04\\:02Z
- """))
- assert 0 == self.main(httpserver, '-D', '2015-12-24T08:08:08Z')
- output = capsys.readouterr().out.strip()
+def test_get_simple_update(runner, tmp_path, replication_server):
+ outfile = tmp_path / f"{uuid.uuid4()}.opl"
- assert output == '-1'
+ assert 0 == runner('-I', '139', '-o', outfile)
- def test_init_to_file(self, tmp_path, httpserver):
- fname = tmp_path / f"{uuid.uuid4()}.seq"
+ ids = IDCollector()
+ osmium.apply(outfile, ids)
- assert 0 == self.main(httpserver, '-I', '453', '-f', str(fname))
- assert fname.read_text() == '453'
+ assert ids.nodes == []
+ assert ids.ways == []
+ assert ids.relations == [140]
- def test_init_from_seq_file(self, tmp_path, httpserver):
- fname = tmp_path / f"{uuid.uuid4()}.seq"
- fname.write_text('453')
- assert 0 == self.main(httpserver, '-f', str(fname))
- assert fname.read_text() == '453'
+@pytest.mark.parametrize('end_id,max_size,actual_end', [(107, None, 107),
+ (None, 1, 108),
+ (105, 1, 105),
+ (110, 1, 108)])
+def test_apply_diffs_endid(runner, tmp_path, replication_server, end_id, max_size, actual_end):
+ outfile = tmp_path / f"{uuid.uuid4()}.opl"
- def test_init_date_with_cookie(self, capsys, tmp_path, httpserver):
- httpserver.expect_request('/state.txt').respond_with_data(dedent("""\
- sequenceNumber=100
- timestamp=2017-08-26T11\\:04\\:02Z
- """))
- httpserver.expect_request('/000/000/000.state.txt').respond_with_data(dedent("""\
- sequenceNumber=0
- timestamp=2016-08-26T11\\:04\\:02Z
- """))
+ params = ['-I', '100', '-o', outfile]
+ if end_id is not None:
+ params.extend(('--end-id', end_id))
+ if max_size is not None:
+ params.extend(('-s', max_size))
- fname = tmp_path / 'my.cookie'
- cookie_jar = cookiejarlib.MozillaCookieJar(str(fname))
- cookie_jar.save()
+ assert 0 == runner(*params)
- assert 0 == self.main(httpserver, '--cookie', str(fname),
- '-D', '2015-12-24T08:08:08Z')
+ ids = IDCollector()
+ osmium.apply(str(outfile), ids)
- output = capsys.readouterr().out.strip()
+ assert ids.relations == list(range(101, actual_end + 1))
- assert output == '-1'
- def test_get_simple_update(self, tmp_path, httpserver):
- outfile = tmp_path / f"{uuid.uuid4()}.opl"
+def test_change_id_too_old_for_replication_source(runner, tmp_path, replication_server, caplog):
+ outfile = tmp_path / f"{uuid.uuid4()}.opl"
- httpserver.expect_request('/state.txt').respond_with_data(dedent("""\
- sequenceNumber=454
- timestamp=2017-08-26T11\\:04\\:02Z
- """))
- httpserver.expect_request('/000/000/454.state.txt').respond_with_data(dedent("""\
- sequenceNumber=454
- timestamp=2016-08-26T11\\:04\\:02Z
- """))
- httpserver.expect_request('/000/000/454.opl').respond_with_data(
- "n12 v1 x4 y6\nn13 v1 x9 y-6\nw2 v2 Nn1,n2")
+ assert 1 == runner('-I', 98, '-o', outfile)
+ assert 'Cannot download state information for ID 98.' in caplog.text
- assert 0 == self.main(httpserver, '--diff-type', 'opl',
- '-I', '453', '-o', str(outfile))
- ids = IDCollector()
- osmium.apply(str(outfile), ids)
+def test_change_date_too_old_for_replication_source(runner, tmp_path, replication_server, caplog):
+ outfile = tmp_path / f"{uuid.uuid4()}.opl"
- assert ids.nodes == [12, 13]
- assert ids.ways == [2]
- assert ids.relations == []
+ assert 1 == runner('-D', '2015-12-24T08:08:08Z', '-o', outfile)
+ assert 'No starting point found' in caplog.text
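The test fixture's `_state()` helper models one diff per hour relative to the newest sequence, and backslash-escapes the colons in the timestamp the way osmosis state files have it. As a standalone sketch:

```python
import datetime as dt

def state_txt(seq, base_time, current_seq):
    # One replication diff per hour, counted back from the newest
    # sequence; ':' in the timestamp is backslash-escaped as in
    # osmosis-style state.txt files.
    seqtime = base_time + dt.timedelta(hours=seq - current_seq)
    timestamp = seqtime.strftime('%Y-%m-%dT%H\\:%M\\:%SZ')
    return f"sequenceNumber={seq}\ntimestamp={timestamp}\n"
```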
=====================================
test/test_pyosmium_up-to-date.py
=====================================
@@ -0,0 +1,190 @@
+# SPDX-License-Identifier: BSD-2-Clause
+#
+# This file is part of pyosmium. (https://osmcode.org/pyosmium/)
+#
+# Copyright (C) 2025 Sarah Hoffmann <lonvia@denofr.de> and others.
+# For a full list of authors see the git log.
+""" Tests for the pyosmium-up-to-date script.
+"""
+import uuid
+import datetime as dt
+
+import pytest
+import osmium
+from osmium.tools.pyosmium_up_to_date import pyosmium_up_to_date
+import osmium.replication.utils as rutil
+
+from helpers import IDCollector
+
+# Choosing a future date here, so we don't run into pyosmium's check for old
+# data. If you get caught by this: congratulations, you are maintaining a
+# 50-year old test.
+REPLICATION_BASE_TIME = dt.datetime(year=2070, month=5, day=6, hour=20, tzinfo=dt.timezone.utc)
+REPLICATION_BASE_SEQ = 100
+REPLICATION_CURRENT = 140
+
+
+@pytest.fixture
+def replication_server(httpserver):
+ def _state(seq):
+ seqtime = REPLICATION_BASE_TIME + dt.timedelta(hours=seq - REPLICATION_CURRENT)
+ timestamp = seqtime.strftime('%Y-%m-%dT%H\\:%M\\:%SZ')
+ return f"sequenceNumber={seq}\ntimestamp={timestamp}\n"
+
+ httpserver.no_handler_status_code = 404
+ httpserver.expect_request('/state.txt').respond_with_data(_state(REPLICATION_CURRENT))
+ for i in range(REPLICATION_BASE_SEQ, REPLICATION_CURRENT + 1):
+ httpserver.expect_request(f'/000/000/{i}.opl')\
+ .respond_with_data(f"r{i} M" + ",".join(f"n{i}@" for i in range(1, 6000)))
+ httpserver.expect_request(f'/000/000/{i}.state.txt').respond_with_data(_state(i))
+
+ return httpserver.url_for('')
+
+
+@pytest.fixture
+def runner(replication_server):
+ def _run(*args):
+ return pyosmium_up_to_date(
+ ['--server', replication_server, '--diff-type', 'opl'] + list(map(str, args)))
+
+ return _run
+
+
+def test_no_output_file(runner):
+ with pytest.raises(SystemExit):
+ runner()
+
+
+def test_simple_update_no_windback(runner, test_data):
+ outfile = test_data("n1 v1 t2070-05-06T19:30:00Z")
+
+ assert 0 == runner('--wind-back', 0, outfile)
+
+ ids = IDCollector()
+ osmium.apply(outfile, ids)
+
+ assert ids.nodes == [1]
+ assert ids.relations == list(range(139, REPLICATION_CURRENT + 1))
+
+
+def test_simple_update_override(runner, test_data):
+ outfile = test_data("n1 v1 t2070-05-06T19:30:00Z")
+
+ assert 0 == runner(outfile)
+
+ ids = IDCollector()
+ osmium.apply(outfile, ids)
+
+ assert ids.nodes == [1]
+ assert ids.relations == list(range(138, REPLICATION_CURRENT + 1))
+
+
+def test_simple_update_new_file(runner, replication_server, test_data, tmp_path):
+ outfile = test_data("n1 v1 t2070-05-06T19:30:00Z")
+ newfile = tmp_path / f"{uuid.uuid4()}.pbf"
+
+ assert 0 == runner('-o', str(newfile), outfile)
+
+ ids = IDCollector()
+ osmium.apply(outfile, ids)
+
+ assert ids.nodes == [1]
+ assert ids.relations == []
+
+ ids = IDCollector()
+ osmium.apply(newfile, ids)
+ assert ids.nodes == [1]
+ assert ids.relations == list(range(138, REPLICATION_CURRENT + 1))
+
+ header = rutil.get_replication_header(newfile)
+
+ assert header.url == replication_server
+ assert header.sequence == REPLICATION_CURRENT
+ assert header.timestamp == REPLICATION_BASE_TIME
+
+
+def test_update_sequences(runner, test_data, tmp_path):
+ outfile = test_data("n1 v1 t2070-05-05T10:30:00Z")
+ newfile = tmp_path / f"{uuid.uuid4()}.pbf"
+
+ assert 0 == runner('--end-id', '110', '-o', str(newfile), outfile)
+
+ ids = IDCollector()
+ osmium.apply(newfile, ids)
+ assert ids.nodes == [1]
+ assert ids.relations == list(range(105, 111))
+
+ header = rutil.get_replication_header(newfile)
+
+ assert header.sequence == 110
+
+ # Note: this test only catches holes, no duplicate application.
+ assert 0 == runner(newfile)
+
+ ids = IDCollector()
+ osmium.apply(newfile, ids)
+ assert ids.nodes == [1]
+ assert ids.relations == list(range(105, REPLICATION_CURRENT + 1))
+
+ header = rutil.get_replication_header(newfile)
+
+ assert header.sequence == REPLICATION_CURRENT
+
+
+@pytest.mark.parametrize('end_id,max_size,actual_end', [(107, None, 107),
+ (None, 1, 108),
+ (105, 1, 105),
+ (110, 1, 108)])
+def test_update_with_endid(test_data, runner, end_id, max_size, actual_end):
+ outfile = test_data("n1 v1 t2070-05-05T06:30:00Z")
+
+ params = [outfile]
+ if end_id is not None:
+ params.extend(('--end-id', end_id))
+ if max_size is not None:
+ params.extend(('-s', max_size))
+
+ assert (0 if end_id == actual_end else 1) == runner(*params)
+
+ ids = IDCollector()
+ osmium.apply(outfile, ids)
+
+ assert ids.relations == list(range(101, actual_end + 1))
+
+
+def test_update_with_enddate(test_data, runner, tmp_path):
+ outfile = test_data("n1 v1 t2070-05-05T06:30:00Z")
+ newfile = tmp_path / f"{uuid.uuid4()}.pbf"
+
+ assert 0 == runner('-E', '2070-05-05T09:30:00Z', '-o', newfile, outfile)
+
+ header = rutil.get_replication_header(newfile)
+
+ assert header.sequence == 105
+ assert header.timestamp == dt.datetime(year=2070, month=5, day=5, hour=9,
+ tzinfo=dt.timezone.utc)
+
+ ids = IDCollector()
+ osmium.apply(newfile, ids)
+
+ assert ids.relations == list(range(101, 106))
+
+
+def test_change_date_too_old_for_replication_source(test_data, runner, caplog):
+ outfile = test_data("n1 v1 t2070-04-05T06:30:00Z")
+
+ assert 3 == runner(outfile)
+ assert 'No starting point found' in caplog.text
+
+
+def test_change_id_too_old_for_replication_source(caplog, tmp_path, runner, replication_server):
+ outfile = tmp_path / f"{uuid.uuid4()}.pbf"
+ h = osmium.io.Header()
+ h.set('osmosis_replication_base_url', replication_server)
+ h.set('osmosis_replication_sequence_number', '98')
+
+ with osmium.SimpleWriter(outfile, 4000, h) as w:
+ w.add_node({'id': 1})
+
+ assert 3 == runner(outfile)
+ assert 'Cannot download state information for ID 98' in caplog.text
=====================================
test/test_replication.py
=====================================
@@ -10,10 +10,11 @@ from textwrap import dedent
import uuid
import pytest
+import requests
from werkzeug.wrappers import Response
-from helpers import mkdate, CountingHandler
+from helpers import mkdate, CountingHandler, IDCollector
import osmium.replication.server as rserv
import osmium.replication
@@ -223,6 +224,33 @@ def test_apply_diffs_count(httpserver):
assert h.counts == [1, 1, 1, 0]
+@pytest.mark.parametrize('end_id,max_size, actual_end', [(107, None, 107),
+ (None, 512, 108),
+ (105, 512, 105),
+ (110, 512, 108),
+ (None, None, 115)])
+def test_apply_diffs_endid(httpserver, end_id, max_size, actual_end):
+ httpserver.expect_request('/state.txt').respond_with_data("""\
+ sequenceNumber=140
+ timestamp=2017-08-26T11\\:04\\:02Z
+ """)
+ for i in range(100, 141):
+ httpserver.expect_request(f'/000/000/{i}.opl')\
+ .respond_with_data(f"r{i} M" + ",".join(f"n{i}@" for i in range(1, 3000)))
+
+ with rserv.ReplicationServer(httpserver.url_for(''), "opl") as svr:
+ res = svr.collect_diffs(101, end_id=end_id, max_size=max_size)
+
+ assert res is not None
+ assert res.id == actual_end
+ assert res.newest == 140
+
+ ids = IDCollector()
+ res.reader.apply(ids)
+
+ assert ids.relations == list(range(101, actual_end + 1))
+
+
def test_apply_diffs_without_simplify(httpserver):
httpserver.expect_ordered_request('/state.txt').respond_with_data("""\
sequenceNumber=100
@@ -364,6 +392,23 @@ def test_apply_diffs_permanent_error(httpserver, caplog):
httpserver.expect_ordered_request('/000/000/100.opl')\
.respond_with_data('not a file', status=404)
+ with caplog.at_level(logging.ERROR):
+ with rserv.ReplicationServer(httpserver.url_for(''), "opl") as svr:
+ h = CountingHandler()
+ with pytest.raises(requests.HTTPError, match='404'):
+ svr.apply_diffs(h, 100, 10000)
+
+ assert 'Permanent server error' in caplog.text
+
+
+def test_apply_diffs_transient_error_first_diff(httpserver, caplog):
+ httpserver.expect_ordered_request('/state.txt').respond_with_data("""\
+ sequenceNumber=100
+ timestamp=2017-08-26T11\\:04\\:02Z
+ """)
+ httpserver.expect_request('/000/000/100.opl')\
+ .respond_with_data('not a file', status=503)
+
with caplog.at_level(logging.ERROR):
with rserv.ReplicationServer(httpserver.url_for(''), "opl") as svr:
h = CountingHandler()
@@ -394,7 +439,7 @@ def test_apply_diffs_permanent_error_later_diff(httpserver, caplog):
assert 'Error during diff download' in caplog.text
-def test_apply_diffs_transient_error(httpserver, caplog):
+def test_apply_diffs_transient_error_later_diff(httpserver, caplog):
httpserver.expect_ordered_request('/state.txt').respond_with_data("""\
sequenceNumber=101
timestamp=2017-08-26T11\\:04\\:02Z
View it on GitLab: https://salsa.debian.org/debian-gis-team/pyosmium/-/commit/53bfe6278117e9bdd9cb8fff44036e47ead96f32