[Git][debian-gis-team/pyogrio][upstream] New upstream version 0.11.0+ds
Bas Couwenberg (@sebastic)
gitlab@salsa.debian.org
Mon May 12 05:19:14 BST 2025
Bas Couwenberg pushed to branch upstream at Debian GIS Project / pyogrio
Commits:
fc2eb70a by Bas Couwenberg at 2025-05-12T05:35:18+02:00
New upstream version 0.11.0+ds
- - - - -
29 changed files:
- − .circleci/config.yml
- .pre-commit-config.yaml
- CHANGES.md
- README.md
- docs/source/about.md
- docs/source/index.md
- environment-dev.yml
- pyogrio/__init__.py
- pyogrio/_compat.py
- pyogrio/_err.pxd
- pyogrio/_err.pyx
- pyogrio/_geometry.pxd
- pyogrio/_geometry.pyx
- pyogrio/_io.pyx
- pyogrio/_ogr.pxd
- pyogrio/_ogr.pyx
- pyogrio/_version.py
- pyogrio/_vsi.pxd
- pyogrio/_vsi.pyx
- pyogrio/geopandas.py
- pyogrio/tests/conftest.py
- pyogrio/tests/test_arrow.py
- pyogrio/tests/test_core.py
- pyogrio/tests/test_geopandas_io.py
- pyogrio/tests/test_path.py
- pyogrio/tests/test_raw_io.py
- pyogrio/util.py
- pyproject.toml
- setup.py
Changes:
=====================================
.circleci/config.yml deleted
=====================================
@@ -1,48 +0,0 @@
-version: 2.1
-
-jobs:
- linux-aarch64-wheels:
- working_directory: ~/linux-aarch64-wheels
- machine:
- image: default
- docker_layer_caching: true
- # resource_class is what tells CircleCI to use an ARM worker for native arm builds
- # https://circleci.com/product/features/resource-classes/
- resource_class: arm.medium
- environment:
- CIBUILDWHEEL: 1
- CIBW_BUILD: "cp*-manylinux_aarch64"
- steps:
- - checkout
- - run:
- name: Build docker image with GDAL install
- command: docker build -f ci/manylinux_2_28_aarch64-vcpkg-gdal.Dockerfile -t manylinux-aarch64-vcpkg-gdal:latest .
- - run:
- name: Build the Linux aarch64 wheels.
- command: |
- python3 -m pip install --user cibuildwheel==2.21.0
- python3 -m cibuildwheel --output-dir wheelhouse
- - run:
- name: Test the wheels
- command: |
- python3 -m pip install -r ci/requirements-wheel-test.txt
- python3 -m pip install --no-deps geopandas
- python3 -m pip install --pre --find-links wheelhouse pyogrio
- python3 -m pip list
- cd ..
- python3 -c "import pyogrio; print(f'GDAL version: {pyogrio.__gdal_version__}\nGEOS version: {pyogrio.__gdal_geos_version__}')"
- python3 -m pytest --pyargs pyogrio.tests -v
- - store_artifacts:
- path: wheelhouse/
-
-workflows:
- wheel-build:
- jobs:
- - linux-aarch64-wheels:
- filters:
- branches:
- only:
- - main
- - wheels-linux-aarch64
- tags:
- only: /.*/
=====================================
.pre-commit-config.yaml
=====================================
@@ -1,7 +1,11 @@
-files: 'pyogrio\/'
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
- rev: "v0.5.2"
+ rev: "v0.11.5"
hooks:
- id: ruff-format
- - id: ruff
\ No newline at end of file
+ - id: ruff
+ - repo: https://github.com/MarcoGorelli/cython-lint
+ rev: v0.16.6
+ hooks:
+ - id: cython-lint
+ - id: double-quote-cython-strings
\ No newline at end of file
=====================================
CHANGES.md
=====================================
@@ -1,16 +1,38 @@
# CHANGELOG
+## 0.11.0 (2025-05-08)
+
+### Improvements
+
+- Capture all errors logged by gdal when opening a file fails (#495).
+- Add support to read and write ".gpkg.zip" (GDAL >= 3.7), ".shp.zip", and ".shz"
+ files (#527).
+- Compatibility with the string dtype in the upcoming pandas 3.0 release (#493).
+
+### Bug fixes
+
+- Fix WKB writing on big-endian systems (#497).
+- Fix writing fids to e.g. GPKG file with `use_arrow` (#511).
+- Fix error in `write_dataframe` when writing an empty or all-None object
+ column with `use_arrow` (#512).
+
+### Packaging
+
+- The GDAL library included in the wheels is upgraded from 3.9.2 to 3.10.3 (#499).
+
## 0.10.0 (2024-09-28)
### Improvements
- Add support to read, write, list, and remove `/vsimem/` files (#457).
+- Raise specific error when trying to read non-UTF-8 file with
+ `use_arrow=True` (#490).
### Bug fixes
- Silence warning from `write_dataframe` with `GeoSeries.notna()` (#435).
- Enable mask & bbox filter when geometry column not read (#431).
-- Raise NotImplmentedError when user attempts to write to an open file handle (#442).
+- Raise `NotImplementedError` when user attempts to write to an open file handle (#442).
- Prevent seek on read from compressed inputs (#443).
### Packaging
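
The zipped-container support added in 0.11.0 (#527, listed above) is used through the regular read/write API. A minimal sketch, assuming a hypothetical countries.gpkg.zip file (".gpkg.zip" requires GDAL >= 3.7; ".shp.zip" and ".shz" are supported more broadly):

    import pyogrio

    # ".gpkg.zip" needs GDAL >= 3.7; ".shp.zip" and ".shz" also work (#527)
    df = pyogrio.read_dataframe("countries.gpkg.zip")
    pyogrio.write_dataframe(df, "countries.shp.zip")
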
=====================================
README.md
=====================================
@@ -1,35 +1,29 @@
-# pyogrio - Vectorized spatial vector file format I/O using GDAL/OGR
-
-Pyogrio provides a
-[GeoPandas](https://github.com/geopandas/geopandas)-oriented API to OGR vector
-data sources, such as ESRI Shapefile, GeoPackage, and GeoJSON. Vector data sources
-have geometries, such as points, lines, or polygons, and associated records
-with potentially many columns worth of data.
-
-Pyogrio uses a vectorized approach for reading and writing GeoDataFrames to and
-from OGR vector data sources in order to give you faster interoperability. It
-uses pre-compiled bindings for GDAL/OGR so that the performance is primarily
-limited by the underlying I/O speed of data source drivers in GDAL/OGR rather
-than multiple steps of converting to and from Python data types within Python.
+# pyogrio - bulk-oriented spatial vector file I/O using GDAL/OGR
+
+Pyogrio provides fast, bulk-oriented read and write access to
+[GDAL/OGR](https://gdal.org/en/latest/drivers/vector/index.html) vector data
+sources, such as ESRI Shapefile, GeoPackage, GeoJSON, and several others.
+Vector data sources typically have geometries, such as points, lines, or
+polygons, and associated records with potentially many columns worth of data.
+
+The typical use is to read or write these data sources to/from
+[GeoPandas](https://github.com/geopandas/geopandas) `GeoDataFrames`. Because
+the geometry column is optional, reading or writing only non-spatial data is
+also possible. Hence, GeoPackage attribute tables, DBF files, or CSV files are
+also supported.
+
+Pyogrio is fast because it uses pre-compiled bindings for GDAL/OGR to read and
+write the data records in bulk. This approach avoids multiple steps of
+converting to and from Python data types within Python, so performance becomes
+primarily limited by the underlying I/O speed of data source drivers in
+GDAL/OGR.
We have seen \>5-10x speedups reading files and \>5-20x speedups writing files
-compared to using non-vectorized approaches (Fiona and current I/O support in
-GeoPandas).
-
-You can read these data sources into
-`GeoDataFrames`, read just the non-geometry columns into Pandas `DataFrames`,
-or even read non-spatial data sources that exist alongside vector data sources,
-such as tables in a ESRI File Geodatabase, or antiquated DBF files.
-
-Pyogrio also enables you to write `GeoDataFrames` to at least a few different
-OGR vector data source formats.
+compared to using row-per-row approaches (e.g. Fiona).
Read the documentation for more information:
[https://pyogrio.readthedocs.io](https://pyogrio.readthedocs.io/en/latest/).
-WARNING: Pyogrio is still at an early version and the API is subject to
-substantial change. Please see [CHANGES](CHANGES.md).
-
## Requirements
Supports Python 3.9 - 3.13 and GDAL 3.4.x - 3.9.x.
@@ -52,9 +46,9 @@ for more information.
## Supported vector formats
-Pyogrio supports some of the most common vector data source formats (provided
-they are also supported by GDAL/OGR), including ESRI Shapefile, GeoPackage,
-GeoJSON, and FlatGeobuf.
+Pyogrio supports most common vector data source formats (provided they are also
+supported by GDAL/OGR), including ESRI Shapefile, GeoPackage, GeoJSON, and
+FlatGeobuf.
Please see the [list of supported formats](https://pyogrio.readthedocs.io/en/latest/supported_formats.html)
for more information.
@@ -64,7 +58,7 @@ for more information.
Please read the [introduction](https://pyogrio.readthedocs.io/en/latest/supported_formats.html)
for more information and examples to get started using Pyogrio.
-You can also check out the the [API documentation](https://pyogrio.readthedocs.io/en/latest/api.html)
+You can also check out the [API documentation](https://pyogrio.readthedocs.io/en/latest/api.html)
for full details on using the API.
## Credits
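
Because the geometry column is optional, as the rewritten introduction notes, both spatial and attribute-only reads go through the same call; a minimal sketch with hypothetical file names:

    import pyogrio

    # spatial read into a GeoPandas GeoDataFrame
    gdf = pyogrio.read_dataframe("places.gpkg")

    # attribute-only read: skip the geometry column entirely
    df = pyogrio.read_dataframe("places.gpkg", read_geometry=False)
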
=====================================
docs/source/about.md
=====================================
@@ -22,7 +22,7 @@ for working with OGR vector data sources. It is **awesome**, has highly-dedicate
maintainers and contributors, and exposes more functionality than Pyogrio ever will.
This project would not be possible without Fiona having come first.
-Pyogrio uses a vectorized (array-oriented) approach for reading and writing
+Pyogrio uses a bulk-oriented approach for reading and writing
spatial vector file formats, which enables faster I/O operations. It borrows
from the internal mechanics and lessons learned of Fiona. It uses a stateless
approach to reading or writing data; all data are read or written in a single
=====================================
docs/source/index.md
=====================================
@@ -1,32 +1,25 @@
-# pyogrio - Vectorized spatial vector file format I/O using GDAL/OGR
-
-Pyogrio provides a
-[GeoPandas](https://github.com/geopandas/geopandas)-oriented API to OGR vector
-data sources, such as ESRI Shapefile, GeoPackage, and GeoJSON. Vector data sources
-have geometries, such as points, lines, or polygons, and associated records
-with potentially many columns worth of data.
-
-Pyogrio uses a vectorized approach for reading and writing GeoDataFrames to and
-from OGR vector data sources in order to give you faster interoperability. It
-uses pre-compiled bindings for GDAL/OGR so that the performance is primarily
-limited by the underlying I/O speed of data source drivers in GDAL/OGR rather
-than multiple steps of converting to and from Python data types within Python.
+# pyogrio - bulk-oriented spatial vector file I/O using GDAL/OGR
+
+Pyogrio provides fast, bulk-oriented read and write access to
+[GDAL/OGR](https://gdal.org/en/latest/drivers/vector/index.html) vector data
+sources, such as ESRI Shapefile, GeoPackage, GeoJSON, and several others.
+Vector data sources typically have geometries, such as points, lines, or
+polygons, and associated records with potentially many columns worth of data.
+
+The typical use is to read or write these data sources to/from
+[GeoPandas](https://github.com/geopandas/geopandas) `GeoDataFrames`. Because
+the geometry column is optional, reading or writing only non-spatial data is
+also possible. Hence, GeoPackage attribute tables, DBF files, or CSV files are
+also supported.
+
+Pyogrio is fast because it uses pre-compiled bindings for GDAL/OGR to read and
+write the data records in bulk. This approach avoids multiple steps of
+converting to and from Python data types within Python, so performance becomes
+primarily limited by the underlying I/O speed of data source drivers in
+GDAL/OGR.
We have seen \>5-10x speedups reading files and \>5-20x speedups writing files
-compared to using non-vectorized approaches (Fiona and current I/O support in
-GeoPandas).
-
-You can read these data sources into
-`GeoDataFrames`, read just the non-geometry columns into Pandas `DataFrames`,
-or even read non-spatial data sources that exist alongside vector data sources,
-such as tables in a ESRI File Geodatabase, or antiquated DBF files.
-
-Pyogrio also enables you to write `GeoDataFrames` to at least a few different
-OGR vector data source formats.
-
-```{warning}
-Pyogrio is still at an early version and the API is subject to substantial change.
-```
+compared to using row-per-row approaches (e.g. Fiona).
```{toctree}
---
=====================================
environment-dev.yml
=====================================
@@ -14,5 +14,5 @@ dependencies:
- cython
- pre-commit
- pytest
- - ruff==0.5.2
+ - ruff==0.11.5
- versioneer
=====================================
pyogrio/__init__.py
=====================================
@@ -32,24 +32,24 @@ __version__ = get_versions()["version"]
del get_versions
__all__ = [
- "list_drivers",
+ "__gdal_geos_version__",
+ "__gdal_version__",
+ "__gdal_version_string__",
+ "__version__",
"detect_write_driver",
- "list_layers",
- "read_bounds",
- "read_info",
- "set_gdal_config_options",
"get_gdal_config_option",
"get_gdal_data_path",
+ "list_drivers",
+ "list_layers",
"open_arrow",
"read_arrow",
+ "read_bounds",
"read_dataframe",
+ "read_info",
+ "set_gdal_config_options",
"vsi_listtree",
"vsi_rmtree",
"vsi_unlink",
"write_arrow",
"write_dataframe",
- "__gdal_version__",
- "__gdal_version_string__",
- "__gdal_geos_version__",
- "__version__",
]
=====================================
pyogrio/_compat.py
=====================================
@@ -33,15 +33,23 @@ HAS_ARROW_API = __gdal_version__ >= (3, 6, 0)
HAS_ARROW_WRITE_API = __gdal_version__ >= (3, 8, 0)
HAS_PYARROW = pyarrow is not None
HAS_PYPROJ = pyproj is not None
+PYARROW_GE_19 = pyarrow is not None and Version(pyarrow.__version__) >= Version(
+ "19.0.0"
+)
HAS_GEOPANDAS = geopandas is not None
PANDAS_GE_15 = pandas is not None and Version(pandas.__version__) >= Version("1.5.0")
PANDAS_GE_20 = pandas is not None and Version(pandas.__version__) >= Version("2.0.0")
PANDAS_GE_22 = pandas is not None and Version(pandas.__version__) >= Version("2.2.0")
+PANDAS_GE_30 = pandas is not None and Version(pandas.__version__) >= Version("3.0.0dev")
+GDAL_GE_352 = __gdal_version__ >= (3, 5, 2)
+GDAL_GE_37 = __gdal_version__ >= (3, 7, 0)
GDAL_GE_38 = __gdal_version__ >= (3, 8, 0)
+GDAL_GE_311 = __gdal_version__ >= (3, 11, 0)
HAS_GDAL_GEOS = __gdal_geos_version__ is not None
HAS_SHAPELY = shapely is not None and Version(shapely.__version__) >= Version("2.0.0")
+SHAPELY_GE_21 = shapely is not None and Version(shapely.__version__) >= Version("2.1.0")
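
These compatibility flags are plain module-level booleans, so dependent code and tests can branch or skip on them; an illustrative sketch (the test itself is hypothetical, not part of this diff):

    import pytest

    from pyogrio._compat import GDAL_GE_37

    @pytest.mark.skipif(not GDAL_GE_37, reason="requires GDAL >= 3.7")
    def test_read_gpkg_zip(tmp_path):
        ...  # exercise the ".gpkg.zip" support gated on GDAL >= 3.7
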
=====================================
pyogrio/_err.pxd
=====================================
@@ -1,4 +1,9 @@
-cdef object exc_check()
-cdef int exc_wrap_int(int retval) except -1
-cdef int exc_wrap_ogrerr(int retval) except -1
-cdef void *exc_wrap_pointer(void *ptr) except NULL
+cdef object check_last_error()
+cdef int check_int(int retval) except -1
+cdef void *check_pointer(void *ptr) except NULL
+
+cdef class ErrorHandler:
+ cdef object error_stack
+ cdef int check_int(self, int retval, bint squash_errors) except -1
+ cdef void *check_pointer(self, void *ptr, bint squash_errors) except NULL
+ cdef void _handle_error_stack(self, bint squash_errors)
=====================================
pyogrio/_err.pyx
=====================================
@@ -1,25 +1,26 @@
-# ported from fiona::_err.pyx
-from enum import IntEnum
-import warnings
+"""Error handling code for GDAL/OGR.
-from pyogrio._ogr cimport (
- CE_None, CE_Debug, CE_Warning, CE_Failure, CE_Fatal, CPLErrorReset,
- CPLGetLastErrorType, CPLGetLastErrorNo, CPLGetLastErrorMsg, OGRErr,
- CPLErr, CPLErrorHandler, CPLDefaultErrorHandler, CPLPushErrorHandler)
+Ported from fiona::_err.pyx
+"""
+import contextlib
+import warnings
+from contextvars import ContextVar
+from itertools import zip_longest
-# CPL Error types as an enum.
-class GDALError(IntEnum):
- none = CE_None
- debug = CE_Debug
- warning = CE_Warning
- failure = CE_Failure
- fatal = CE_Fatal
+from pyogrio._ogr cimport (
+ CE_Warning, CE_Failure, CE_Fatal, CPLErrorReset,
+ CPLGetLastErrorType, CPLGetLastErrorNo, CPLGetLastErrorMsg,
+ OGRERR_NONE, CPLErr, CPLErrorHandler, CPLDefaultErrorHandler,
+ CPLPopErrorHandler, CPLPushErrorHandler)
+_ERROR_STACK = ContextVar("error_stack")
+_ERROR_STACK.set([])
class CPLE_BaseError(Exception):
"""Base CPL error class.
+
For internal use within Cython only.
"""
@@ -103,14 +104,25 @@ class CPLE_AWSSignatureDoesNotMatchError(CPLE_BaseError):
pass
+class CPLE_AWSError(CPLE_BaseError):
+ pass
+
+
class NullPointerError(CPLE_BaseError):
"""
- Returned from exc_wrap_pointer when a NULL pointer is passed, but no GDAL
+ Returned from check_pointer when a NULL pointer is passed, but no GDAL
error was raised.
"""
pass
+class CPLError(CPLE_BaseError):
+ """
+    Returned from check_int when an error code is returned, but no GDAL
+ error was set.
+ """
+ pass
+
# Map of GDAL error numbers to the Python exceptions.
exception_map = {
@@ -132,95 +144,114 @@ exception_map = {
13: CPLE_AWSObjectNotFoundError,
14: CPLE_AWSAccessDeniedError,
15: CPLE_AWSInvalidCredentialsError,
- 16: CPLE_AWSSignatureDoesNotMatchError
+ 16: CPLE_AWSSignatureDoesNotMatchError,
+ 17: CPLE_AWSError
}
-cdef inline object exc_check():
- """Checks GDAL error stack for fatal or non-fatal errors
+cdef inline object check_last_error():
+ """Checks if the last GDAL error was a fatal or non-fatal error.
+
+ When a non-fatal error is found, an appropriate exception is raised.
+
+ When a fatal error is found, SystemExit is called.
+
Returns
-------
An Exception, SystemExit, or None
"""
- cdef const char *msg_c = NULL
-
err_type = CPLGetLastErrorType()
err_no = CPLGetLastErrorNo()
- err_msg = CPLGetLastErrorMsg()
+ err_msg = clean_error_message(CPLGetLastErrorMsg())
+ if err_msg == "":
+ err_msg = "No error message."
+
+ if err_type == CE_Failure:
+ CPLErrorReset()
+ return exception_map.get(
+ err_no, CPLE_BaseError)(err_type, err_no, err_msg)
+
+ if err_type == CE_Fatal:
+ return SystemExit("Fatal error: {0}".format((err_type, err_no, err_msg)))
- if err_msg == NULL:
- msg = "No error message."
- else:
- # Reformat messages.
- msg_b = err_msg
+cdef clean_error_message(const char* err_msg):
+ """Cleans up error messages from GDAL.
+
+ Parameters
+ ----------
+ err_msg : const char*
+ The error message to clean up.
+
+ Returns
+ -------
+ str
+ The cleaned up error message or empty string
+ """
+ if err_msg != NULL:
+ # Reformat message.
+ msg_b = err_msg
try:
- msg = msg_b.decode('utf-8')
+ msg = msg_b.decode("utf-8")
msg = msg.replace("`", "'")
msg = msg.replace("\n", " ")
except UnicodeDecodeError as exc:
- msg = f"Could not decode error message to UTF-8. Raw error: {msg_b}"
+ msg = f"Could not decode error message to UTF-8. Raw error: {msg_b}"
- if err_type == 3:
- CPLErrorReset()
- return exception_map.get(
- err_no, CPLE_BaseError)(err_type, err_no, msg)
+ else:
+ msg = ""
- if err_type == 4:
- return SystemExit("Fatal error: {0}".format((err_type, err_no, msg)))
+ return msg
- else:
- return
+cdef void *check_pointer(void *ptr) except NULL:
+ """Check the pointer returned by a GDAL/OGR function.
-cdef void *exc_wrap_pointer(void *ptr) except NULL:
- """Wrap a GDAL/OGR function that returns GDALDatasetH etc (void *)
- Raises an exception if a non-fatal error has be set or if pointer is NULL.
+ If `ptr` is `NULL`, an exception inheriting from CPLE_BaseError is raised.
+ When the last error registered by GDAL/OGR was a non-fatal error, the
+ exception raised will be customized appropriately. Otherwise a
+ NullPointerError is raised.
"""
if ptr == NULL:
- exc = exc_check()
+ exc = check_last_error()
if exc:
raise exc
else:
# null pointer was passed, but no error message from GDAL
raise NullPointerError(-1, -1, "NULL pointer error")
+
return ptr
-cdef int exc_wrap_int(int err) except -1:
- """Wrap a GDAL/OGR function that returns CPLErr or OGRErr (int)
- Raises an exception if a non-fatal error has be set.
+cdef int check_int(int err) except -1:
+ """Check the CPLErr (int) value returned by a GDAL/OGR function.
- Copied from Fiona (_err.pyx).
+ If `err` is not OGRERR_NONE, an exception inheriting from CPLE_BaseError is raised.
+ When the last error registered by GDAL/OGR was a non-fatal error, the
+ exception raised will be customized appropriately. Otherwise a CPLError is
+ raised.
"""
- if err:
- exc = exc_check()
+ if err != OGRERR_NONE:
+ exc = check_last_error()
if exc:
raise exc
else:
# no error message from GDAL
- raise CPLE_BaseError(-1, -1, "Unspecified OGR / GDAL error")
- return err
-
-
-cdef int exc_wrap_ogrerr(int err) except -1:
- """Wrap a function that returns OGRErr (int) but does not use the
- CPL error stack.
-
- Adapted from Fiona (_err.pyx).
- """
- if err != 0:
- raise CPLE_BaseError(3, err, f"OGR Error code {err}")
+ raise CPLError(-1, -1, "Unspecified OGR / GDAL error")
return err
-cdef void error_handler(CPLErr err_class, int err_no, const char* err_msg) nogil:
+cdef void error_handler(
+ CPLErr err_class, int err_no, const char* err_msg
+) noexcept nogil:
"""Custom CPL error handler to match the Python behaviour.
- Generally we want to suppress error printing to stderr (behaviour of the
- default GDAL error handler) because we already raise a Python exception
- that includes the error message.
+ For non-fatal errors (CE_Failure), error printing to stderr (behaviour of
+ the default GDAL error handler) is suppressed, because we already raise a
+ Python exception that includes the error message.
+
+ Warnings are converted to Python warnings.
"""
if err_class == CE_Fatal:
# If the error class is CE_Fatal, we want to have a message issued
@@ -229,16 +260,14 @@ cdef void error_handler(CPLErr err_class, int err_no, const char* err_msg) nogil
CPLDefaultErrorHandler(err_class, err_no, err_msg)
return
- elif err_class == CE_Failure:
+ if err_class == CE_Failure:
# For Failures, do nothing as those are explicitly caught
# with error return codes and translated into Python exceptions
return
- elif err_class == CE_Warning:
+ if err_class == CE_Warning:
with gil:
- msg_b = err_msg
- msg = msg_b.decode('utf-8')
- warnings.warn(msg, RuntimeWarning)
+ warnings.warn(clean_error_message(err_msg), RuntimeWarning)
return
# Fall back to the default handler for non-failure messages since
@@ -248,3 +277,165 @@ cdef void error_handler(CPLErr err_class, int err_no, const char* err_msg) nogil
def _register_error_handler():
CPLPushErrorHandler(<CPLErrorHandler>error_handler)
+
+
+cdef class ErrorHandler:
+
+ def __init__(self, error_stack=None):
+ self.error_stack = error_stack or {}
+
+ cdef int check_int(self, int err, bint squash_errors) except -1:
+ """Check the CPLErr (int) value returned by a GDAL/OGR function.
+
+ If `err` is not OGRERR_NONE, an exception inheriting from CPLE_BaseError is
+ raised.
+ When a non-fatal GDAL/OGR error was captured in the error stack, the
+ exception raised will be customized appropriately. Otherwise, a
+ CPLError is raised.
+
+ Parameters
+ ----------
+ err : int
+ The CPLErr returned by a GDAL/OGR function.
+ squash_errors : bool
+ True to squash all errors captured to one error with the exception type of
+ the last error and all error messages concatenated.
+
+ Returns
+ -------
+ int
+ The `err` input parameter if it is OGRERR_NONE. Otherwise an exception is
+ raised.
+
+ """
+ if err != OGRERR_NONE:
+ if self.error_stack.get():
+ self._handle_error_stack(squash_errors)
+ else:
+ raise CPLError(CE_Failure, err, "Unspecified OGR / GDAL error")
+
+ return err
+
+ cdef void *check_pointer(self, void *ptr, bint squash_errors) except NULL:
+ """Check the pointer returned by a GDAL/OGR function.
+
+ If `ptr` is `NULL`, an exception inheriting from CPLE_BaseError is
+ raised.
+ When a non-fatal GDAL/OGR error was captured in the error stack, the
+ exception raised will be customized appropriately. Otherwise, a
+ NullPointerError is raised.
+
+ Parameters
+ ----------
+ ptr : pointer
+ The pointer returned by a GDAL/OGR function.
+ squash_errors : bool
+ True to squash all errors captured to one error with the exception type of
+ the last error and all error messages concatenated.
+
+ Returns
+ -------
+ pointer
+ The `ptr` input parameter if it is not `NULL`. Otherwise an exception is
+ raised.
+
+ """
+ if ptr == NULL:
+ if self.error_stack.get():
+ self._handle_error_stack(squash_errors)
+ else:
+ raise NullPointerError(-1, -1, "NULL pointer error")
+
+ return ptr
+
+ cdef void _handle_error_stack(self, bint squash_errors):
+ """Handle the errors in `error_stack`."""
+ stack = self.error_stack.get()
+ for error, cause in zip_longest(stack[::-1], stack[::-1][1:]):
+ if error is not None and cause is not None:
+ error.__cause__ = cause
+
+ last = stack.pop()
+ if last is not None:
+ if squash_errors:
+ # Concatenate all error messages, and raise a single exception
+ errmsg = str(last)
+ inner = last.__cause__
+ while inner is not None:
+ errmsg = f"{errmsg}; {inner}"
+ inner = inner.__cause__
+
+ if errmsg == "":
+ errmsg = "No error message."
+
+ raise type(last)(-1, -1, errmsg)
+
+ raise last
+
+
+cdef void stacking_error_handler(
+ CPLErr err_class,
+ int err_no,
+ const char* err_msg
+) noexcept nogil:
+ """Custom CPL error handler that adds non-fatal errors to a stack.
+
+ All non-fatal errors (CE_Failure) are not printed to stderr (behaviour
+ of the default GDAL error handler), but they are converted to python
+ exceptions and added to a stack, so they can be dealt with afterwards.
+
+ Warnings are converted to Python warnings.
+ """
+ if err_class == CE_Fatal:
+ # If the error class is CE_Fatal, we want to have a message issued
+ # because the CPL support code does an abort() before any exception
+ # can be generated
+ CPLDefaultErrorHandler(err_class, err_no, err_msg)
+ return
+
+ if err_class == CE_Failure:
+ # For Failures, add them to the error exception stack
+ with gil:
+ stack = _ERROR_STACK.get()
+ stack.append(
+ exception_map.get(err_no, CPLE_BaseError)(
+ err_class, err_no, clean_error_message(err_msg)
+ ),
+ )
+ _ERROR_STACK.set(stack)
+
+ return
+
+ if err_class == CE_Warning:
+ with gil:
+ warnings.warn(clean_error_message(err_msg), RuntimeWarning)
+ return
+
+ # Fall back to the default handler for non-failure messages since
+ # they won't be translated into exceptions.
+ CPLDefaultErrorHandler(err_class, err_no, err_msg)
+
+
+@contextlib.contextmanager
+def capture_errors():
+ """A context manager that captures all GDAL non-fatal errors occuring.
+
+ It adds all errors to a single stack, so it assumes that no more than one
+ GDAL function is called.
+
+ Yields an ErrorHandler object that can be used to handle the errors
+ if any were captured.
+ """
+ CPLErrorReset()
+ _ERROR_STACK.set([])
+
+ # stacking_error_handler records GDAL errors in the order they occur and
+ # converts them to exceptions.
+ CPLPushErrorHandler(<CPLErrorHandler>stacking_error_handler)
+
+ # Run code in the `with` block.
+ yield ErrorHandler(_ERROR_STACK)
+
+ CPLPopErrorHandler()
+ _ERROR_STACK.set([])
+ CPLErrorReset()
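
The usage pattern this context manager is designed for appears in the _io.pyx hunk below (ogr_open): wrap exactly one GDAL call, then validate its result through the yielded ErrorHandler so all stacked errors surface as a single exception. A condensed Cython sketch of that pattern:

    cdef void *ogr_dataset = NULL
    cdef ErrorHandler errors

    with capture_errors() as errors:
        ogr_dataset = GDALOpenEx(path_c, flags, NULL, NULL, NULL)
        # squash_errors=True: raise one CPLE_* exception with all
        # captured error messages concatenated
        ogr_dataset = errors.check_pointer(ogr_dataset, True)
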
=====================================
pyogrio/_geometry.pxd
=====================================
@@ -1,4 +1,4 @@
from pyogrio._ogr cimport *
cdef str get_geometry_type(void *ogr_layer)
-cdef OGRwkbGeometryType get_geometry_type_code(str geometry_type) except *
\ No newline at end of file
+cdef OGRwkbGeometryType get_geometry_type_code(str geometry_type) except *
=====================================
pyogrio/_geometry.pyx
=====================================
@@ -8,63 +8,63 @@ from pyogrio.errors import DataLayerError, GeometryError
# Mapping of OGR integer geometry types to GeoJSON type names.
GEOMETRY_TYPES = {
- wkbUnknown: 'Unknown',
- wkbPoint: 'Point',
- wkbLineString: 'LineString',
- wkbPolygon: 'Polygon',
- wkbMultiPoint: 'MultiPoint',
- wkbMultiLineString: 'MultiLineString',
- wkbMultiPolygon: 'MultiPolygon',
- wkbGeometryCollection: 'GeometryCollection',
+ wkbUnknown: "Unknown",
+ wkbPoint: "Point",
+ wkbLineString: "LineString",
+ wkbPolygon: "Polygon",
+ wkbMultiPoint: "MultiPoint",
+ wkbMultiLineString: "MultiLineString",
+ wkbMultiPolygon: "MultiPolygon",
+ wkbGeometryCollection: "GeometryCollection",
wkbNone: None,
- wkbLinearRing: 'LinearRing',
+ wkbLinearRing: "LinearRing",
# WARNING: Measured types are not supported in GEOS and downstream uses
# these are stripped automatically to their corresponding 2D / 3D types
- wkbPointM: 'PointM',
- wkbLineStringM: 'Measured LineString',
- wkbPolygonM: 'Measured Polygon',
- wkbMultiPointM: 'Measured MultiPoint',
- wkbMultiLineStringM: 'Measured MultiLineString',
- wkbMultiPolygonM: 'Measured MultiPolygon',
- wkbGeometryCollectionM: 'Measured GeometryCollection',
- wkbPointZM: 'Measured 3D Point',
- wkbLineStringZM: 'Measured 3D LineString',
- wkbPolygonZM: 'Measured 3D Polygon',
- wkbMultiPointZM: 'Measured 3D MultiPoint',
- wkbMultiLineStringZM: 'Measured 3D MultiLineString',
- wkbMultiPolygonZM: 'Measured 3D MultiPolygon',
- wkbGeometryCollectionZM: 'Measured 3D GeometryCollection',
- wkbPoint25D: 'Point Z',
- wkbLineString25D: 'LineString Z',
- wkbPolygon25D: 'Polygon Z',
- wkbMultiPoint25D: 'MultiPoint Z',
- wkbMultiLineString25D: 'MultiLineString Z',
- wkbMultiPolygon25D: 'MultiPolygon Z',
- wkbGeometryCollection25D: 'GeometryCollection Z',
+ wkbPointM: "PointM",
+ wkbLineStringM: "Measured LineString",
+ wkbPolygonM: "Measured Polygon",
+ wkbMultiPointM: "Measured MultiPoint",
+ wkbMultiLineStringM: "Measured MultiLineString",
+ wkbMultiPolygonM: "Measured MultiPolygon",
+ wkbGeometryCollectionM: "Measured GeometryCollection",
+ wkbPointZM: "Measured 3D Point",
+ wkbLineStringZM: "Measured 3D LineString",
+ wkbPolygonZM: "Measured 3D Polygon",
+ wkbMultiPointZM: "Measured 3D MultiPoint",
+ wkbMultiLineStringZM: "Measured 3D MultiLineString",
+ wkbMultiPolygonZM: "Measured 3D MultiPolygon",
+ wkbGeometryCollectionZM: "Measured 3D GeometryCollection",
+ wkbPoint25D: "Point Z",
+ wkbLineString25D: "LineString Z",
+ wkbPolygon25D: "Polygon Z",
+ wkbMultiPoint25D: "MultiPoint Z",
+ wkbMultiLineString25D: "MultiLineString Z",
+ wkbMultiPolygon25D: "MultiPolygon Z",
+ wkbGeometryCollection25D: "GeometryCollection Z",
}
-GEOMETRY_TYPE_CODES = {v:k for k, v in GEOMETRY_TYPES.items()}
+GEOMETRY_TYPE_CODES = {v: k for k, v in GEOMETRY_TYPES.items()}
# add additional aliases from 2.5D format
GEOMETRY_TYPE_CODES.update({
- '2.5D Point': wkbPoint25D,
- '2.5D LineString': wkbLineString25D,
- '2.5D Polygon': wkbPolygon25D,
- '2.5D MultiPoint': wkbMultiPoint25D,
- '2.5D MultiLineString': wkbMultiLineString25D,
- '2.5D MultiPolygon': wkbMultiPolygon25D,
- '2.5D GeometryCollection': wkbGeometryCollection25D
+ "2.5D Point": wkbPoint25D,
+ "2.5D LineString": wkbLineString25D,
+ "2.5D Polygon": wkbPolygon25D,
+ "2.5D MultiPoint": wkbMultiPoint25D,
+ "2.5D MultiLineString": wkbMultiLineString25D,
+ "2.5D MultiPolygon": wkbMultiPolygon25D,
+ "2.5D GeometryCollection": wkbGeometryCollection25D
})
# 2.5D also represented using negative numbers not enumerated above
GEOMETRY_TYPES.update({
- -2147483647: 'Point Z',
- -2147483646: 'LineString Z',
- -2147483645: 'Polygon Z',
- -2147483644: 'MultiPoint Z',
- -2147483643: 'MultiLineString Z',
- -2147483642: 'MultiPolygon Z',
- -2147483641: 'GeometryCollection Z',
+ -2147483647: "Point Z",
+ -2147483646: "LineString Z",
+ -2147483645: "Polygon Z",
+ -2147483644: "MultiPoint Z",
+ -2147483643: "MultiLineString Z",
+ -2147483642: "MultiPolygon Z",
+ -2147483641: "GeometryCollection Z",
})
@@ -80,11 +80,11 @@ cdef str get_geometry_type(void *ogr_layer):
str
geometry type
"""
- cdef void *cogr_featuredef = NULL
+ cdef void *ogr_featuredef = NULL
cdef OGRwkbGeometryType ogr_type
try:
- ogr_featuredef = exc_wrap_pointer(OGR_L_GetLayerDefn(ogr_layer))
+ ogr_featuredef = check_pointer(OGR_L_GetLayerDefn(ogr_layer))
except NullPointerError:
raise DataLayerError("Could not get layer definition")
@@ -126,4 +126,4 @@ cdef OGRwkbGeometryType get_geometry_type_code(str geometry_type) except *:
if geometry_type not in GEOMETRY_TYPE_CODES:
raise GeometryError(f"Geometry type is not supported: {geometry_type}")
- return GEOMETRY_TYPE_CODES[geometry_type]
\ No newline at end of file
+ return GEOMETRY_TYPE_CODES[geometry_type]
=====================================
pyogrio/_io.pyx
=====================================
@@ -3,7 +3,6 @@
"""IO support for OGR vector data sources
"""
-
import contextlib
import datetime
import locale
@@ -26,11 +25,22 @@ from cpython.pycapsule cimport PyCapsule_New, PyCapsule_GetPointer
import numpy as np
from pyogrio._ogr cimport *
-from pyogrio._err cimport *
+from pyogrio._err cimport (
+ check_last_error, check_int, check_pointer, ErrorHandler
+)
from pyogrio._vsi cimport *
-from pyogrio._err import CPLE_BaseError, CPLE_NotSupportedError, NullPointerError
+from pyogrio._err import (
+ CPLE_AppDefinedError,
+ CPLE_BaseError,
+ CPLE_NotSupportedError,
+ CPLE_OpenFailedError,
+ NullPointerError,
+ capture_errors,
+)
from pyogrio._geometry cimport get_geometry_type, get_geometry_type_code
-from pyogrio.errors import CRSError, DataSourceError, DataLayerError, GeometryError, FieldError, FeatureError
+from pyogrio.errors import (
+ CRSError, DataSourceError, DataLayerError, GeometryError, FieldError, FeatureError
+)
log = logging.getLogger(__name__)
@@ -38,20 +48,20 @@ log = logging.getLogger(__name__)
# Mapping of OGR integer field types to Python field type names
# (index in array is the integer field type)
FIELD_TYPES = [
- 'int32', # OFTInteger, Simple 32bit integer
- None, # OFTIntegerList, List of 32bit integers, not supported
- 'float64', # OFTReal, Double Precision floating point
- None, # OFTRealList, List of doubles, not supported
- 'object', # OFTString, String of UTF-8 chars
- None, # OFTStringList, Array of strings, not supported
- None, # OFTWideString, deprecated, not supported
- None, # OFTWideStringList, deprecated, not supported
- 'object', # OFTBinary, Raw Binary data
- 'datetime64[D]', # OFTDate, Date
- None, # OFTTime, Time, NOTE: not directly supported in numpy
- 'datetime64[ms]',# OFTDateTime, Date and Time
- 'int64', # OFTInteger64, Single 64bit integer
- None # OFTInteger64List, List of 64bit integers, not supported
+ "int32", # OFTInteger, Simple 32bit integer
+ None, # OFTIntegerList, List of 32bit integers, not supported
+ "float64", # OFTReal, Double Precision floating point
+ None, # OFTRealList, List of doubles, not supported
+ "object", # OFTString, String of UTF-8 chars
+ None, # OFTStringList, Array of strings, not supported
+ None, # OFTWideString, deprecated, not supported
+ None, # OFTWideStringList, deprecated, not supported
+ "object", # OFTBinary, Raw Binary data
+ "datetime64[D]", # OFTDate, Date
+ None, # OFTTime, Time, NOTE: not directly supported in numpy
+ "datetime64[ms]", # OFTDateTime, Date and Time
+ "int64", # OFTInteger64, Single 64bit integer
+ None # OFTInteger64List, List of 64bit integers, not supported
]
FIELD_SUBTYPES = {
@@ -63,29 +73,29 @@ FIELD_SUBTYPES = {
# Mapping of numpy ndarray dtypes to (field type, subtype)
DTYPE_OGR_FIELD_TYPES = {
- 'int8': (OFTInteger, OFSTInt16),
- 'int16': (OFTInteger, OFSTInt16),
- 'int32': (OFTInteger, OFSTNone),
- 'int': (OFTInteger64, OFSTNone),
- 'int64': (OFTInteger64, OFSTNone),
+ "int8": (OFTInteger, OFSTInt16),
+ "int16": (OFTInteger, OFSTInt16),
+ "int32": (OFTInteger, OFSTNone),
+ "int": (OFTInteger64, OFSTNone),
+ "int64": (OFTInteger64, OFSTNone),
# unsigned ints have to be converted to ints; these are converted
# to the next largest integer size
- 'uint8': (OFTInteger, OFSTInt16),
- 'uint16': (OFTInteger, OFSTNone),
- 'uint32': (OFTInteger64, OFSTNone),
+ "uint8": (OFTInteger, OFSTInt16),
+ "uint16": (OFTInteger, OFSTNone),
+ "uint32": (OFTInteger64, OFSTNone),
# TODO: these might get truncated, check maximum value and raise error
- 'uint': (OFTInteger64, OFSTNone),
- 'uint64': (OFTInteger64, OFSTNone),
+ "uint": (OFTInteger64, OFSTNone),
+ "uint64": (OFTInteger64, OFSTNone),
# bool is handled as integer with boolean subtype
- 'bool': (OFTInteger, OFSTBoolean),
+ "bool": (OFTInteger, OFSTBoolean),
- 'float32': (OFTReal,OFSTFloat32),
- 'float': (OFTReal, OFSTNone),
- 'float64': (OFTReal, OFSTNone),
+ "float32": (OFTReal, OFSTFloat32),
+ "float": (OFTReal, OFSTNone),
+ "float64": (OFTReal, OFSTNone),
- 'datetime64[D]': (OFTDate, OFSTNone),
- 'datetime64': (OFTDateTime, OFSTNone),
+ "datetime64[D]": (OFTDate, OFSTNone),
+ "datetime64": (OFTDateTime, OFSTNone),
}
@@ -132,8 +142,8 @@ cdef char** dict_to_options(object values):
return NULL
for k, v in values.items():
- k = k.encode('UTF-8')
- v = v.encode('UTF-8')
+ k = k.encode("UTF-8")
+ v = v.encode("UTF-8")
options = CSLAddNameValue(options, <const char *>k, <const char *>v)
return options
@@ -160,7 +170,6 @@ cdef const char* override_threadlocal_config_option(str key, str value):
value_b = value.encode("UTF-8")
cdef const char* value_c = value_b
-
cdef const char *prev_value = CPLGetThreadLocalConfigOption(key_c, NULL)
if prev_value != NULL:
# strings returned from config options may be replaced via
@@ -185,7 +194,8 @@ cdef void* ogr_open(const char* path_c, int mode, char** options) except NULL:
options : char **, optional
dataset open options
"""
- cdef void* ogr_dataset = NULL
+ cdef void *ogr_dataset = NULL
+ cdef ErrorHandler errors
# Force linear approximations in all cases
OGRSetNonLinearGeometriesEnabledFlag(0)
@@ -196,27 +206,29 @@ cdef void* ogr_open(const char* path_c, int mode, char** options) except NULL:
else:
flags |= GDAL_OF_READONLY
-
try:
# WARNING: GDAL logs warnings about invalid open options to stderr
# instead of raising an error
- ogr_dataset = exc_wrap_pointer(
- GDALOpenEx(path_c, flags, NULL, <const char *const *>options, NULL)
- )
-
- return ogr_dataset
+ with capture_errors() as errors:
+ ogr_dataset = GDALOpenEx(
+ path_c, flags, NULL, <const char *const *>options, NULL
+ )
+ return errors.check_pointer(ogr_dataset, True)
except NullPointerError:
raise DataSourceError(
- "Failed to open dataset (mode={}): {}".format(mode, path_c.decode("utf-8"))
+ f"Failed to open dataset ({mode=}): {path_c.decode('utf-8')}"
) from None
except CPLE_BaseError as exc:
- if str(exc).endswith("a supported file format."):
+ if " a supported file format." in str(exc):
+ # In gdal 3.9, this error message was slightly changed, so we can only check
+ # on this part of the error message.
raise DataSourceError(
- f"{str(exc)} It might help to specify the correct driver explicitly by "
- "prefixing the file path with '<DRIVER>:', e.g. 'CSV:path'."
+ f"{str(exc)}; It might help to specify the correct driver explicitly "
+ "by prefixing the file path with '<DRIVER>:', e.g. 'CSV:path'."
) from None
+
raise DataSourceError(str(exc)) from None
@@ -227,7 +239,7 @@ cdef ogr_close(GDALDatasetH ogr_dataset):
if ogr_dataset != NULL:
IF CTE_GDAL_VERSION >= (3, 7, 0):
if GDALClose(ogr_dataset) != CE_None:
- return exc_check()
+ return check_last_error()
return
@@ -236,7 +248,7 @@ cdef ogr_close(GDALDatasetH ogr_dataset):
# GDAL will set an error if there was an error writing the data source
# on close
- return exc_check()
+ return check_last_error()
cdef OGRLayerH get_ogr_layer(GDALDatasetH ogr_dataset, layer) except NULL:
@@ -256,12 +268,12 @@ cdef OGRLayerH get_ogr_layer(GDALDatasetH ogr_dataset, layer) except NULL:
try:
if isinstance(layer, str):
- name_b = layer.encode('utf-8')
+ name_b = layer.encode("utf-8")
name_c = name_b
- ogr_layer = exc_wrap_pointer(GDALDatasetGetLayerByName(ogr_dataset, name_c))
+ ogr_layer = check_pointer(GDALDatasetGetLayerByName(ogr_dataset, name_c))
elif isinstance(layer, int):
- ogr_layer = exc_wrap_pointer(GDALDatasetGetLayer(ogr_dataset, layer))
+ ogr_layer = check_pointer(GDALDatasetGetLayer(ogr_dataset, layer))
# GDAL does not always raise exception messages in this case
except NullPointerError:
@@ -276,7 +288,7 @@ cdef OGRLayerH get_ogr_layer(GDALDatasetH ogr_dataset, layer) except NULL:
# Note: this returns NULL and does not need to be freed via
# GDALDatasetReleaseResultSet()
layer_name = get_string(OGR_L_GetName(ogr_layer))
- sql_b = f"SET interest_layers = {layer_name}".encode('utf-8')
+ sql_b = f"SET interest_layers = {layer_name}".encode("utf-8")
sql_c = sql_b
GDALDatasetExecuteSQL(ogr_dataset, sql_c, NULL, NULL)
@@ -284,7 +296,9 @@ cdef OGRLayerH get_ogr_layer(GDALDatasetH ogr_dataset, layer) except NULL:
return ogr_layer
-cdef OGRLayerH execute_sql(GDALDatasetH ogr_dataset, str sql, str sql_dialect=None) except NULL:
+cdef OGRLayerH execute_sql(
+ GDALDatasetH ogr_dataset, str sql, str sql_dialect=None
+) except NULL:
"""Execute an SQL statement on a dataset.
Parameters
@@ -301,14 +315,16 @@ cdef OGRLayerH execute_sql(GDALDatasetH ogr_dataset, str sql, str sql_dialect=No
"""
try:
- sql_b = sql.encode('utf-8')
+ sql_b = sql.encode("utf-8")
sql_c = sql_b
if sql_dialect is None:
- return exc_wrap_pointer(GDALDatasetExecuteSQL(ogr_dataset, sql_c, NULL, NULL))
+ return check_pointer(GDALDatasetExecuteSQL(ogr_dataset, sql_c, NULL, NULL))
- sql_dialect_b = sql_dialect.encode('utf-8')
+ sql_dialect_b = sql_dialect.encode("utf-8")
sql_dialect_c = sql_dialect_b
- return exc_wrap_pointer(GDALDatasetExecuteSQL(ogr_dataset, sql_c, NULL, sql_dialect_c))
+ return check_pointer(GDALDatasetExecuteSQL(
+ ogr_dataset, sql_c, NULL, sql_dialect_c)
+ )
# GDAL does not always raise exception messages in this case
except NullPointerError:
@@ -336,7 +352,7 @@ cdef str get_crs(OGRLayerH ogr_layer):
cdef char *ogr_wkt = NULL
try:
- ogr_crs = exc_wrap_pointer(OGR_L_GetSpatialRef(ogr_layer))
+ ogr_crs = check_pointer(OGR_L_GetSpatialRef(ogr_layer))
except NullPointerError:
# No coordinate system defined.
@@ -354,7 +370,7 @@ cdef str get_crs(OGRLayerH ogr_layer):
if authority_key != NULL and authority_val != NULL:
key = get_string(authority_key)
- if key == 'EPSG':
+ if key == "EPSG":
value = get_string(authority_val)
return f"EPSG:{value}"
@@ -383,7 +399,7 @@ cdef get_driver(OGRDataSourceH ogr_dataset):
cdef void *ogr_driver
try:
- ogr_driver = exc_wrap_pointer(GDALGetDatasetDriver(ogr_dataset))
+ ogr_driver = check_pointer(GDALGetDatasetDriver(ogr_dataset))
except NullPointerError:
raise DataLayerError(f"Could not detect driver of dataset") from None
@@ -426,7 +442,7 @@ cdef get_feature_count(OGRLayerH ogr_layer, int force):
feature_count = 0
while True:
try:
- ogr_feature = exc_wrap_pointer(OGR_L_GetNextFeature(ogr_layer))
+ ogr_feature = check_pointer(OGR_L_GetNextFeature(ogr_layer))
feature_count +=1
except NullPointerError:
@@ -443,7 +459,9 @@ cdef get_feature_count(OGRLayerH ogr_layer, int force):
if "failed to prepare SQL" in str(exc):
raise ValueError(f"Invalid SQL query: {str(exc)}") from None
- raise DataLayerError(f"Could not iterate over features: {str(exc)}") from None
+ raise DataLayerError(
+ f"Could not iterate over features: {str(exc)}"
+ ) from None
finally:
if ogr_feature != NULL:
@@ -469,13 +487,12 @@ cdef get_total_bounds(OGRLayerH ogr_layer, int force):
"""
cdef OGREnvelope ogr_envelope
- try:
- exc_wrap_ogrerr(OGR_L_GetExtent(ogr_layer, &ogr_envelope, force))
+
+ if OGR_L_GetExtent(ogr_layer, &ogr_envelope, force) == OGRERR_NONE:
bounds = (
ogr_envelope.MinX, ogr_envelope.MinY, ogr_envelope.MaxX, ogr_envelope.MaxY
)
-
- except CPLE_BaseError:
+ else:
bounds = None
return bounds
@@ -522,7 +539,7 @@ cdef get_metadata(GDALMajorObjectH obj):
if metadata != NULL:
return dict(
- metadata[i].decode('UTF-8').split('=', 1)
+ metadata[i].decode("UTF-8").split("=", 1)
for i in range(CSLCount(metadata))
)
@@ -580,7 +597,7 @@ cdef detect_encoding(OGRDataSourceH ogr_dataset, OGRLayerH ogr_layer):
if driver == "OSM":
# always set OSM data to UTF-8
- # per https://help.openstreetmap.org/questions/2172/what-encoding-does-openstreetmap-use
+ # per https://help.openstreetmap.org/questions/2172/what-encoding-does-openstreetmap-use # noqa: E501
return "UTF-8"
if driver in ("XLSX", "ODS"):
@@ -621,7 +638,7 @@ cdef get_fields(OGRLayerH ogr_layer, str encoding, use_arrow=False):
cdef const char *key_c
try:
- ogr_featuredef = exc_wrap_pointer(OGR_L_GetLayerDefn(ogr_layer))
+ ogr_featuredef = check_pointer(OGR_L_GetLayerDefn(ogr_layer))
except NullPointerError:
raise DataLayerError("Could not get layer definition") from None
@@ -632,16 +649,18 @@ cdef get_fields(OGRLayerH ogr_layer, str encoding, use_arrow=False):
field_count = OGR_FD_GetFieldCount(ogr_featuredef)
fields = np.empty(shape=(field_count, 4), dtype=object)
- fields_view = fields[:,:]
+ fields_view = fields[:, :]
skipped_fields = False
for i in range(field_count):
try:
- ogr_fielddef = exc_wrap_pointer(OGR_FD_GetFieldDefn(ogr_featuredef, i))
+ ogr_fielddef = check_pointer(OGR_FD_GetFieldDefn(ogr_featuredef, i))
except NullPointerError:
- raise FieldError(f"Could not get field definition for field at index {i}") from None
+ raise FieldError(
+ f"Could not get field definition for field at index {i}"
+ ) from None
except CPLE_BaseError as exc:
raise FieldError(str(exc))
@@ -662,10 +681,10 @@ cdef get_fields(OGRLayerH ogr_layer, str encoding, use_arrow=False):
# bool, int16, float32 dtypes
np_type = subtype
- fields_view[i,0] = i
- fields_view[i,1] = field_type
- fields_view[i,2] = field_name
- fields_view[i,3] = np_type
+ fields_view[i, 0] = i
+ fields_view[i, 1] = field_type
+ fields_view[i, 2] = field_name
+ fields_view[i, 3] = np_type
if skipped_fields:
# filter out skipped fields
@@ -693,18 +712,20 @@ cdef apply_where_filter(OGRLayerH ogr_layer, str where):
ValueError: if SQL query is not valid
"""
- where_b = where.encode('utf-8')
+ where_b = where.encode("utf-8")
where_c = where_b
err = OGR_L_SetAttributeFilter(ogr_layer, where_c)
# WARNING: GDAL does not raise this error for GPKG but instead only
# logs to stderr
if err != OGRERR_NONE:
try:
- exc_check()
+ check_last_error()
except CPLE_BaseError as exc:
raise ValueError(str(exc))
- raise ValueError(f"Invalid SQL query for layer '{OGR_L_GetName(ogr_layer)}': '{where}'")
+ raise ValueError(
+ f"Invalid SQL query for layer '{OGR_L_GetName(ogr_layer)}': '{where}'"
+ )
cdef apply_bbox_filter(OGRLayerH ogr_layer, bbox):
@@ -750,7 +771,9 @@ cdef apply_geometry_filter(OGRLayerH ogr_layer, wkb):
OGR_G_DestroyGeometry(ogr_geometry)
-cdef validate_feature_range(OGRLayerH ogr_layer, int skip_features=0, int max_features=0):
+cdef validate_feature_range(
+ OGRLayerH ogr_layer, int skip_features=0, int max_features=0
+):
"""Limit skip_features and max_features to bounds available for dataset.
This is typically performed after applying where and spatial filters, which
@@ -855,15 +878,15 @@ cdef process_fields(
if field_type in (OFTInteger, OFTInteger64, OFTReal):
# if a boolean or integer type, have to cast to float to hold
# NaN values
- if data.dtype.kind in ('b', 'i', 'u'):
+ if data.dtype.kind in ("b", "i", "u"):
field_data[j] = field_data[j].astype(np.float64)
field_data_view[j] = field_data[j][:]
field_data_view[j][i] = np.nan
else:
data[i] = np.nan
- elif field_type in ( OFTDate, OFTDateTime) and not datetime_as_string:
- data[i] = np.datetime64('NaT')
+ elif field_type in (OFTDate, OFTDateTime) and not datetime_as_string:
+ data[i] = np.datetime64("NaT")
else:
data[i] = None
@@ -880,7 +903,9 @@ cdef process_fields(
data[i] = OGR_F_GetFieldAsDouble(ogr_feature, field_index)
elif field_type == OFTString:
- value = get_string(OGR_F_GetFieldAsString(ogr_feature, field_index), encoding=encoding)
+ value = get_string(OGR_F_GetFieldAsString(
+ ogr_feature, field_index), encoding=encoding
+ )
data[i] = value
elif field_type == OFTBinary:
@@ -892,10 +917,21 @@ cdef process_fields(
if datetime_as_string:
# defer datetime parsing to user/ pandas layer
# Update to OGR_F_GetFieldAsISO8601DateTime when GDAL 3.7+ only
- data[i] = get_string(OGR_F_GetFieldAsString(ogr_feature, field_index), encoding=encoding)
+ data[i] = get_string(
+ OGR_F_GetFieldAsString(ogr_feature, field_index), encoding=encoding
+ )
else:
success = OGR_F_GetFieldAsDateTimeEx(
- ogr_feature, field_index, &year, &month, &day, &hour, &minute, &fsecond, &timezone)
+ ogr_feature,
+ field_index,
+ &year,
+ &month,
+ &day,
+ &hour,
+ &minute,
+ &fsecond,
+ &timezone,
+ )
ms, ss = math.modf(fsecond)
second = int(ss)
@@ -903,20 +939,22 @@ cdef process_fields(
microsecond = round(ms * 1000) * 1000
if not success:
- data[i] = np.datetime64('NaT')
+ data[i] = np.datetime64("NaT")
elif field_type == OFTDate:
data[i] = datetime.date(year, month, day).isoformat()
elif field_type == OFTDateTime:
- data[i] = datetime.datetime(year, month, day, hour, minute, second, microsecond).isoformat()
+ data[i] = datetime.datetime(
+ year, month, day, hour, minute, second, microsecond
+ ).isoformat()
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
cdef get_features(
OGRLayerH ogr_layer,
- object[:,:] fields,
+ object[:, :] fields,
encoding,
uint8_t read_geometry,
uint8_t force_2d,
@@ -944,20 +982,24 @@ cdef get_features(
fid_data = None
if read_geometry:
- geometries = np.empty(shape=(num_features, ), dtype='object')
+ geometries = np.empty(shape=(num_features, ), dtype="object")
geom_view = geometries[:]
else:
geometries = None
n_fields = fields.shape[0]
- field_indexes = fields[:,0]
- field_ogr_types = fields[:,1]
+ field_indexes = fields[:, 0]
+ field_ogr_types = fields[:, 1]
field_data = [
- np.empty(shape=(num_features, ),
- dtype = ("object" if datetime_as_string and
- fields[field_index,3].startswith("datetime") else fields[field_index,3])
+ np.empty(
+ shape=(num_features, ),
+ dtype = (
+ "object"
+ if datetime_as_string and fields[field_index, 3].startswith("datetime")
+ else fields[field_index, 3]
+ )
) for field_index in range(n_fields)
]
@@ -973,7 +1015,7 @@ cdef get_features(
break
try:
- ogr_feature = exc_wrap_pointer(OGR_L_GetNextFeature(ogr_layer))
+ ogr_feature = check_pointer(OGR_L_GetNextFeature(ogr_layer))
except NullPointerError:
# No more rows available, so stop reading
@@ -987,8 +1029,8 @@ cdef get_features(
"GDAL returned more records than expected based on the count of "
"records that may meet your combination of filters against this "
"dataset. Please open an issue on Github "
- "(https://github.com/geopandas/pyogrio/issues) to report encountering "
- "this error."
+ "(https://github.com/geopandas/pyogrio/issues) to report "
+ "encountering this error."
) from None
if return_fids:
@@ -1026,7 +1068,7 @@ cdef get_features(
cdef get_features_by_fid(
OGRLayerH ogr_layer,
int[:] fids,
- object[:,:] fields,
+ object[:, :] fields,
encoding,
uint8_t read_geometry,
uint8_t force_2d,
@@ -1044,20 +1086,24 @@ cdef get_features_by_fid(
OGR_L_ResetReading(ogr_layer)
if read_geometry:
- geometries = np.empty(shape=(count, ), dtype='object')
+ geometries = np.empty(shape=(count, ), dtype="object")
geom_view = geometries[:]
else:
geometries = None
n_fields = fields.shape[0]
- field_indexes = fields[:,0]
- field_ogr_types = fields[:,1]
+ field_indexes = fields[:, 0]
+ field_ogr_types = fields[:, 1]
field_data = [
- np.empty(shape=(count, ),
- dtype=("object" if datetime_as_string and fields[field_index,3].startswith("datetime")
- else fields[field_index,3]))
- for field_index in range(n_fields)
+ np.empty(
+ shape=(count, ),
+ dtype=(
+ "object"
+ if datetime_as_string and fields[field_index, 3].startswith("datetime")
+ else fields[field_index, 3]
+ )
+ ) for field_index in range(n_fields)
]
field_data_view = [field_data[field_index][:] for field_index in range(n_fields)]
@@ -1067,7 +1113,7 @@ cdef get_features_by_fid(
fid = fids[i]
try:
- ogr_feature = exc_wrap_pointer(OGR_L_GetFeature(ogr_layer, fid))
+ ogr_feature = check_pointer(OGR_L_GetFeature(ogr_layer, fid))
except NullPointerError:
raise FeatureError(f"Could not read feature with fid {fid}") from None
@@ -1087,20 +1133,15 @@ cdef get_features_by_fid(
OGR_F_Destroy(ogr_feature)
ogr_feature = NULL
-
return (geometries, field_data)
@cython.boundscheck(False) # Deactivate bounds checking
@cython.wraparound(False) # Deactivate negative indexing.
-cdef get_bounds(
- OGRLayerH ogr_layer,
- int skip_features,
- int num_features):
-
+cdef get_bounds(OGRLayerH ogr_layer, int skip_features, int num_features):
cdef OGRFeatureH ogr_feature = NULL
cdef OGRGeometryH ogr_geometry = NULL
- cdef OGREnvelope ogr_envelope # = NULL
+ cdef OGREnvelope ogr_envelope # = NULL
cdef int i
# make sure layer is read from beginning
@@ -1112,7 +1153,7 @@ cdef get_bounds(
fid_data = np.empty(shape=(num_features), dtype=np.int64)
fid_view = fid_data[:]
- bounds_data = np.empty(shape=(4, num_features), dtype='float64')
+ bounds_data = np.empty(shape=(4, num_features), dtype="float64")
bounds_view = bounds_data[:]
i = 0
@@ -1122,7 +1163,7 @@ cdef get_bounds(
break
try:
- ogr_feature = exc_wrap_pointer(OGR_L_GetNextFeature(ogr_layer))
+ ogr_feature = check_pointer(OGR_L_GetNextFeature(ogr_layer))
except NullPointerError:
# No more rows available, so stop reading
@@ -1133,7 +1174,8 @@ cdef get_bounds(
if i >= num_features:
raise FeatureError(
- "Reading more features than indicated by OGR_L_GetFeatureCount is not supported"
+ "Reading more features than indicated by OGR_L_GetFeatureCount is "
+ "not supported"
) from None
fid_view[i] = OGR_F_GetFID(ogr_feature)
@@ -1141,7 +1183,7 @@ cdef get_bounds(
ogr_geometry = OGR_F_GetGeometryRef(ogr_feature)
if ogr_geometry == NULL:
- bounds_view[:,i] = np.nan
+ bounds_view[:, i] = np.nan
else:
OGR_G_GetEnvelope(ogr_geometry, &ogr_envelope)
@@ -1181,8 +1223,8 @@ def ogr_read(
str sql=None,
str sql_dialect=None,
int return_fids=False,
- bint datetime_as_string=False
- ):
+ bint datetime_as_string=False,
+):
cdef int err = 0
cdef bint use_tmp_vsimem = isinstance(path_or_buffer, bytes)
@@ -1199,7 +1241,14 @@ def ogr_read(
cdef bint override_shape_encoding = False
if fids is not None:
- if where is not None or bbox is not None or mask is not None or sql is not None or skip_features or max_features:
+ if (
+ where is not None
+ or bbox is not None
+ or mask is not None
+ or sql is not None
+ or skip_features
+ or max_features
+ ):
raise ValueError(
"cannot set both 'fids' and any of 'where', 'bbox', 'mask', "
"'sql', 'skip_features' or 'max_features'"
@@ -1225,7 +1274,9 @@ def ogr_read(
raise ValueError("'max_features' must be >= 0")
try:
- path = read_buffer_to_vsimem(path_or_buffer) if use_tmp_vsimem else path_or_buffer
+ path = read_buffer_to_vsimem(
+ path_or_buffer
+ ) if use_tmp_vsimem else path_or_buffer
if encoding:
# for shapefiles, SHAPE_ENCODING must be set before opening the file
@@ -1234,10 +1285,12 @@ def ogr_read(
# (we do this for all data sources where encoding is set because
# we don't know the driver until after it is opened, which is too late)
override_shape_encoding = True
- prev_shape_encoding = override_threadlocal_config_option("SHAPE_ENCODING", encoding)
+ prev_shape_encoding = override_threadlocal_config_option(
+ "SHAPE_ENCODING", encoding
+ )
dataset_options = dict_to_options(dataset_kwargs)
- ogr_dataset = ogr_open(path.encode('UTF-8'), 0, dataset_options)
+ ogr_dataset = ogr_open(path.encode("UTF-8"), 0, dataset_options)
if sql is None:
if layer is None:
@@ -1252,9 +1305,14 @@ def ogr_read(
# or from the system locale
if encoding:
if get_driver(ogr_dataset) == "ESRI Shapefile":
- # NOTE: SHAPE_ENCODING is a configuration option whereas ENCODING is the dataset open option
+ # NOTE: SHAPE_ENCODING is a configuration option whereas ENCODING is the
+ # dataset open option
if "ENCODING" in dataset_kwargs:
- raise ValueError('cannot provide both encoding parameter and "ENCODING" option; use encoding parameter to specify correct encoding for data source')
+ raise ValueError(
+ 'cannot provide both encoding parameter and "ENCODING" option; '
+ "use encoding parameter to specify correct encoding for data "
+ "source"
+ )
# Because SHAPE_ENCODING is set above, GDAL will automatically
# decode shapefiles to UTF-8; ignore any encoding set by user
@@ -1268,11 +1326,11 @@ def ogr_read(
ignored_fields = []
if columns is not None:
# identify ignored fields first
- ignored_fields = list(set(fields[:,2]) - set(columns))
+ ignored_fields = list(set(fields[:, 2]) - set(columns))
# Fields are matched exactly by name, duplicates are dropped.
# Find index of each field into fields
- idx = np.intersect1d(fields[:,2], columns, return_indices=True)[1]
+ idx = np.intersect1d(fields[:, 2], columns, return_indices=True)[1]
fields = fields[idx, :]
if not read_geometry and bbox is None and mask is None:
@@ -1298,7 +1356,7 @@ def ogr_read(
encoding,
read_geometry=read_geometry and geometry_type is not None,
force_2d=force_2d,
- datetime_as_string=datetime_as_string
+ datetime_as_string=datetime_as_string,
)
# bypass reading fids since these should match fids used for read
@@ -1336,11 +1394,11 @@ def ogr_read(
)
meta = {
- 'crs': crs,
- 'encoding': encoding,
- 'fields': fields[:,2], # return only names
- 'dtypes':fields[:,3],
- 'geometry_type': geometry_type,
+ "crs": crs,
+ "encoding": encoding,
+ "fields": fields[:, 2],
+ "dtypes": fields[:, 3],
+ "geometry_type": geometry_type,
}
finally:
@@ -1376,7 +1434,7 @@ def ogr_read(
cdef void pycapsule_array_stream_deleter(object stream_capsule) noexcept:
cdef ArrowArrayStream* stream = <ArrowArrayStream*>PyCapsule_GetPointer(
- stream_capsule, 'arrow_array_stream'
+ stream_capsule, "arrow_array_stream"
)
# Do not invoke the deleter on a used/moved capsule
if stream.release != NULL:
@@ -1389,7 +1447,9 @@ cdef object alloc_c_stream(ArrowArrayStream** c_stream):
c_stream[0] = <ArrowArrayStream*> malloc(sizeof(ArrowArrayStream))
# Ensure the capsule destructor doesn't call a random release pointer
c_stream[0].release = NULL
- return PyCapsule_New(c_stream[0], 'arrow_array_stream', &pycapsule_array_stream_deleter)
+ return PyCapsule_New(
+ c_stream[0], "arrow_array_stream", &pycapsule_array_stream_deleter
+ )
class _ArrowStream:
@@ -1447,7 +1507,14 @@ def ogr_open_arrow(
raise ValueError("forcing 2D is not supported for Arrow")
if fids is not None:
- if where is not None or bbox is not None or mask is not None or sql is not None or skip_features or max_features:
+ if (
+ where is not None
+ or bbox is not None
+ or mask is not None
+ or sql is not None
+ or skip_features
+ or max_features
+ ):
raise ValueError(
"cannot set both 'fids' and any of 'where', 'bbox', 'mask', "
"'sql', 'skip_features', or 'max_features'"
@@ -1481,14 +1548,18 @@ def ogr_open_arrow(
reader = None
try:
- path = read_buffer_to_vsimem(path_or_buffer) if use_tmp_vsimem else path_or_buffer
+ path = read_buffer_to_vsimem(
+ path_or_buffer
+ ) if use_tmp_vsimem else path_or_buffer
if encoding:
override_shape_encoding = True
- prev_shape_encoding = override_threadlocal_config_option("SHAPE_ENCODING", encoding)
+ prev_shape_encoding = override_threadlocal_config_option(
+ "SHAPE_ENCODING", encoding
+ )
dataset_options = dict_to_options(dataset_kwargs)
- ogr_dataset = ogr_open(path.encode('UTF-8'), 0, dataset_options)
+ ogr_dataset = ogr_open(path.encode("UTF-8"), 0, dataset_options)
if sql is None:
if layer is None:
@@ -1504,12 +1575,19 @@ def ogr_open_arrow(
if encoding:
if get_driver(ogr_dataset) == "ESRI Shapefile":
if "ENCODING" in dataset_kwargs:
- raise ValueError('cannot provide both encoding parameter and "ENCODING" option; use encoding parameter to specify correct encoding for data source')
+ raise ValueError(
+ 'cannot provide both encoding parameter and "ENCODING" option; '
+ "use encoding parameter to specify correct encoding for data "
+ "source"
+ )
encoding = "UTF-8"
- elif encoding.replace('-','').upper() != 'UTF8':
- raise ValueError("non-UTF-8 encoding is not supported for Arrow; use the non-Arrow interface instead")
+ elif encoding.replace("-", "").upper() != "UTF8":
+ raise ValueError(
+ "non-UTF-8 encoding is not supported for Arrow; use the non-Arrow "
+ "interface instead"
+ )
else:
encoding = detect_encoding(ogr_dataset, ogr_layer)
@@ -1519,7 +1597,7 @@ def ogr_open_arrow(
ignored_fields = []
if columns is not None:
# Fields are matched exactly by name, duplicates are dropped.
- ignored_fields = list(set(fields[:,2]) - set(columns))
+ ignored_fields = list(set(fields[:, 2]) - set(columns))
if not read_geometry:
ignored_fields.append("OGR_GEOMETRY")
@@ -1528,13 +1606,14 @@ def ogr_open_arrow(
IF CTE_GDAL_VERSION < (3, 8, 3):
driver = get_driver(ogr_dataset)
- if driver in {'FlatGeobuf', 'GPKG'}:
+ if driver in {"FlatGeobuf", "GPKG"}:
ignored = set(ignored_fields)
for f in fields:
- if f[2] not in ignored and f[3] == 'bool':
+ if f[2] not in ignored and f[3] == "bool":
raise RuntimeError(
- "GDAL < 3.8.3 does not correctly read boolean data values using the "
- "Arrow API. Do not use read_arrow() / use_arrow=True for this dataset."
+ "GDAL < 3.8.3 does not correctly read boolean data values "
+ "using the Arrow API. Do not use read_arrow() / "
+ "use_arrow=True for this dataset."
)
geometry_type = get_geometry_type(ogr_layer)
@@ -1605,7 +1684,7 @@ def ogr_open_arrow(
options = CSLSetNameValue(
options,
"MAX_FEATURES_IN_BATCH",
- str(batch_size).encode('UTF-8')
+ str(batch_size).encode("UTF-8")
)
# Default to geoarrow metadata encoding
@@ -1613,7 +1692,7 @@ def ogr_open_arrow(
options = CSLSetNameValue(
options,
"GEOMETRY_METADATA_ENCODING",
- "GEOARROW".encode('UTF-8')
+ "GEOARROW".encode("UTF-8")
)
# make sure layer is read from beginning
@@ -1638,12 +1717,12 @@ def ogr_open_arrow(
reader = _ArrowStream(capsule)
meta = {
- 'crs': crs,
- 'encoding': encoding,
- 'fields': fields[:,2], # return only names
- 'geometry_type': geometry_type,
- 'geometry_name': geometry_name,
- 'fid_column': fid_column,
+ "crs": crs,
+ "encoding": encoding,
+ "fields": fields[:, 2],
+ "geometry_type": geometry_type,
+ "geometry_name": geometry_name,
+ "fid_column": fid_column,
}
# stream has to be consumed before the Dataset is closed
@@ -1695,7 +1774,8 @@ def ogr_read_bounds(
int max_features=0,
object where=None,
tuple bbox=None,
- object mask=None):
+ object mask=None,
+):
cdef int err = 0
cdef bint use_tmp_vsimem = isinstance(path_or_buffer, bytes)
@@ -1716,8 +1796,10 @@ def ogr_read_bounds(
raise ValueError("'max_features' must be >= 0")
try:
- path = read_buffer_to_vsimem(path_or_buffer) if use_tmp_vsimem else path_or_buffer
- ogr_dataset = ogr_open(path.encode('UTF-8'), 0, NULL)
+ path = read_buffer_to_vsimem(
+ path_or_buffer
+ ) if use_tmp_vsimem else path_or_buffer
+ ogr_dataset = ogr_open(path.encode("UTF-8"), 0, NULL)
if layer is None:
layer = get_default_layer(ogr_dataset)
@@ -1736,7 +1818,9 @@ def ogr_read_bounds(
apply_geometry_filter(ogr_layer, mask)
# Limit feature range to available range
- skip_features, num_features = validate_feature_range(ogr_layer, skip_features, max_features)
+ skip_features, num_features = validate_feature_range(
+ ogr_layer, skip_features, max_features
+ )
bounds = get_bounds(ogr_layer, skip_features, num_features)
@@ -1757,7 +1841,8 @@ def ogr_read_info(
object layer=None,
object encoding=None,
int force_feature_count=False,
- int force_total_bounds=False):
+ int force_total_bounds=False
+):
cdef bint use_tmp_vsimem = isinstance(path_or_buffer, bytes)
cdef const char *path_c = NULL
@@ -1768,14 +1853,18 @@ def ogr_read_info(
cdef bint override_shape_encoding = False
try:
- path = read_buffer_to_vsimem(path_or_buffer) if use_tmp_vsimem else path_or_buffer
+ path = read_buffer_to_vsimem(
+ path_or_buffer
+ ) if use_tmp_vsimem else path_or_buffer
if encoding:
override_shape_encoding = True
- prev_shape_encoding = override_threadlocal_config_option("SHAPE_ENCODING", encoding)
+ prev_shape_encoding = override_threadlocal_config_option(
+ "SHAPE_ENCODING", encoding
+ )
dataset_options = dict_to_options(dataset_kwargs)
- ogr_dataset = ogr_open(path.encode('UTF-8'), 0, dataset_options)
+ ogr_dataset = ogr_open(path.encode("UTF-8"), 0, dataset_options)
if layer is None:
layer = get_default_layer(ogr_dataset)
@@ -1792,8 +1881,8 @@ def ogr_read_info(
"layer_name": get_string(OGR_L_GetName(ogr_layer)),
"crs": get_crs(ogr_layer),
"encoding": encoding,
- "fields": fields[:,2], # return only names
- "dtypes": fields[:,3],
+ "fields": fields[:, 2],
+ "dtypes": fields[:, 3],
"fid_column": get_string(OGR_L_GetFIDColumn(ogr_layer)),
"geometry_name": get_string(OGR_L_GetGeometryColumn(ogr_layer)),
"geometry_type": get_geometry_type(ogr_layer),
@@ -1802,10 +1891,18 @@ def ogr_read_info(
"driver": get_driver(ogr_dataset),
"capabilities": {
"random_read": OGR_L_TestCapability(ogr_layer, OLCRandomRead) == 1,
- "fast_set_next_by_index": OGR_L_TestCapability(ogr_layer, OLCFastSetNextByIndex) == 1,
- "fast_spatial_filter": OGR_L_TestCapability(ogr_layer, OLCFastSpatialFilter) == 1,
- "fast_feature_count": OGR_L_TestCapability(ogr_layer, OLCFastFeatureCount) == 1,
- "fast_total_bounds": OGR_L_TestCapability(ogr_layer, OLCFastGetExtent) == 1,
+ "fast_set_next_by_index": OGR_L_TestCapability(
+ ogr_layer, OLCFastSetNextByIndex
+ ) == 1,
+ "fast_spatial_filter": OGR_L_TestCapability(
+ ogr_layer, OLCFastSpatialFilter
+ ) == 1,
+ "fast_feature_count": OGR_L_TestCapability(
+ ogr_layer, OLCFastFeatureCount
+ ) == 1,
+ "fast_total_bounds": OGR_L_TestCapability(
+ ogr_layer, OLCFastGetExtent
+ ) == 1,
},
"layer_metadata": get_metadata(ogr_layer),
"dataset_metadata": get_metadata(ogr_dataset),
@@ -1839,8 +1936,10 @@ def ogr_list_layers(object path_or_buffer):
cdef OGRDataSourceH ogr_dataset = NULL
try:
- path = read_buffer_to_vsimem(path_or_buffer) if use_tmp_vsimem else path_or_buffer
- ogr_dataset = ogr_open(path.encode('UTF-8'), 0, NULL)
+ path = (
+ read_buffer_to_vsimem(path_or_buffer) if use_tmp_vsimem else path_or_buffer
+ )
+ ogr_dataset = ogr_open(path.encode("UTF-8"), 0, NULL)
layers = get_layer_names(ogr_dataset)
finally:
@@ -1875,7 +1974,7 @@ cdef str get_default_layer(OGRDataSourceH ogr_dataset):
if len(layers) > 1:
dataset_name = os.path.basename(get_string(OGR_DS_GetName(ogr_dataset)))
- other_layer_names = ', '.join([f"'{l}'" for l in layers[1:, 0]])
+ other_layer_names = ", ".join([f"'{lyr}'" for lyr in layers[1:, 0]])
warnings.warn(
f"More than one layer found in '{dataset_name}': '{first_layer_name}' "
f"(default), {other_layer_names}. Specify layer parameter to avoid this "
@@ -1918,16 +2017,21 @@ cdef get_layer_names(OGRDataSourceH ogr_dataset):
# NOTE: all modes are write-only
# some data sources have multiple layers
-cdef void * ogr_create(const char* path_c, const char* driver_c, char** options) except NULL:
+cdef void * ogr_create(
+ const char* path_c, const char* driver_c, char** options
+) except NULL:
cdef void *ogr_driver = NULL
cdef OGRDataSourceH ogr_dataset = NULL
# Get the driver
try:
- ogr_driver = exc_wrap_pointer(GDALGetDriverByName(driver_c))
+ ogr_driver = check_pointer(GDALGetDriverByName(driver_c))
except NullPointerError:
- raise DataSourceError(f"Could not obtain driver: {driver_c.decode('utf-8')} (check that it was installed correctly into GDAL)")
+ raise DataSourceError(
+ f"Could not obtain driver: {driver_c.decode('utf-8')} "
+ "(check that it was installed correctly into GDAL)"
+ )
except CPLE_BaseError as exc:
raise DataSourceError(str(exc))
@@ -1944,13 +2048,20 @@ cdef void * ogr_create(const char* path_c, const char* driver_c, char** options)
# Create the dataset
try:
- ogr_dataset = exc_wrap_pointer(GDALCreate(ogr_driver, path_c, 0, 0, 0, GDT_Unknown, options))
+ ogr_dataset = check_pointer(
+ GDALCreate(ogr_driver, path_c, 0, 0, 0, GDT_Unknown, options)
+ )
except NullPointerError:
- raise DataSourceError(f"Failed to create dataset with driver: {path_c.decode('utf-8')} {driver_c.decode('utf-8')}") from None
+ raise DataSourceError(
+ f"Failed to create dataset with driver: {path_c.decode('utf-8')} "
+ f"{driver_c.decode('utf-8')}"
+ ) from None
except CPLE_NotSupportedError as exc:
- raise DataSourceError(f"Driver {driver_c.decode('utf-8')} does not support write functionality") from None
+ raise DataSourceError(
+ f"Driver {driver_c.decode('utf-8')} does not support write functionality"
+ ) from None
except CPLE_BaseError as exc:
raise DataSourceError(str(exc))
@@ -1962,14 +2073,16 @@ cdef void * create_crs(str crs) except NULL:
cdef char *crs_c = NULL
cdef void *ogr_crs = NULL
- crs_b = crs.encode('UTF-8')
+ crs_b = crs.encode("UTF-8")
crs_c = crs_b
try:
- ogr_crs = exc_wrap_pointer(OSRNewSpatialReference(NULL))
+ ogr_crs = check_pointer(OSRNewSpatialReference(NULL))
err = OSRSetFromUserInput(ogr_crs, crs_c)
if err:
- raise CRSError("Could not set CRS: {}".format(crs_c.decode('UTF-8'))) from None
+ raise CRSError(
+ "Could not set CRS: {}".format(crs_c.decode("UTF-8"))
+ ) from None
except CPLE_BaseError as exc:
OSRRelease(ogr_crs)
@@ -1996,7 +2109,7 @@ cdef infer_field_types(list dtypes):
field_types_view[i, 1] = field_subtype
# Determine field type from ndarray values
- elif dtype == np.dtype('O'):
+ elif dtype == np.dtype("O"):
# Object type is ambiguous: could be a string or binary data
# TODO: handle binary or other types
# for now fall back to string (same as Geopandas)
@@ -2018,7 +2131,9 @@ cdef infer_field_types(list dtypes):
field_types_view[i, 1] = field_subtype
else:
- raise NotImplementedError(f"field type is not supported {dtype.name} (field index: {i})")
+ raise NotImplementedError(
+ f"field type is not supported {dtype.name} (field index: {i})"
+ )
return field_types
@@ -2082,10 +2197,10 @@ cdef create_ogr_dataset_layer(
cdef OGRwkbGeometryType geometry_code
cdef int layer_idx = -1
- path_b = path.encode('UTF-8')
+ path_b = path.encode("UTF-8")
path_c = path_b
- driver_b = driver.encode('UTF-8')
+ driver_b = driver.encode("UTF-8")
driver_c = driver_b
# temporary in-memory dataset is always created from scratch
@@ -2097,7 +2212,10 @@ cdef create_ogr_dataset_layer(
# if shapefile, GeoJSON, or FlatGeobuf, always delete first
# for other types, check if we can create layers
# GPKG might be the only multi-layer writeable type. TODO: check this
- if driver in ('ESRI Shapefile', 'GeoJSON', 'GeoJSONSeq', 'FlatGeobuf') and path_exists:
+ if (
+ driver in ("ESRI Shapefile", "GeoJSON", "GeoJSONSeq", "FlatGeobuf")
+ and path_exists
+ ):
if not append:
os.unlink(path)
path_exists = False
@@ -2109,7 +2227,7 @@ cdef create_ogr_dataset_layer(
for i in range(GDALDatasetGetLayerCount(ogr_dataset)):
name = OGR_L_GetName(GDALDatasetGetLayer(ogr_dataset, i))
- if layer == name.decode('UTF-8'):
+ if layer == name.decode("UTF-8"):
layer_idx = i
break
@@ -2138,17 +2256,17 @@ cdef create_ogr_dataset_layer(
# the layer and all associated properties (CRS, field defs, etc)
create_layer = not (append and layer_exists)
- ### Create the layer
+ # Create the layer
if create_layer:
# Create the CRS
if crs is not None:
try:
ogr_crs = create_crs(crs)
- # force geographic CRS to use lon, lat order and ignore axis order specified by CRS, in order
- # to correctly write KML and GeoJSON coordinates in correct order
+ # force geographic CRS to use lon, lat order and ignore axis order
+ # specified by CRS, in order to correctly write KML and GeoJSON
+ # coordinates in correct order
OSRSetAxisMappingStrategy(ogr_crs, OAMS_TRADITIONAL_GIS_ORDER)
-
except Exception as exc:
if dataset_options != NULL:
CSLDestroy(dataset_options)
@@ -2161,41 +2279,47 @@ cdef create_ogr_dataset_layer(
# Setup other layer creation options
for k, v in layer_kwargs.items():
- k = k.encode('UTF-8')
- v = v.encode('UTF-8')
- layer_options = CSLAddNameValue(layer_options, <const char *>k, <const char *>v)
+ k = k.encode("UTF-8")
+ v = v.encode("UTF-8")
+ layer_options = CSLAddNameValue(
+ layer_options, <const char *>k, <const char *>v
+ )
- if driver == 'ESRI Shapefile':
+ if driver == "ESRI Shapefile":
# ENCODING option must be set for shapefiles to properly write *.cpg
# file containing the encoding; this is not a supported option for
# other drivers. This is done after setting general options above
# to override ENCODING if passed by the user as a layer option.
if encoding and "ENCODING" in layer_kwargs:
- raise ValueError('cannot provide both encoding parameter and "ENCODING" layer creation option; use the encoding parameter')
+ raise ValueError(
+ 'cannot provide both encoding parameter and "ENCODING" layer '
+ "creation option; use the encoding parameter"
+ )
# always write to UTF-8 if encoding is not set
encoding = encoding or "UTF-8"
- encoding_b = encoding.upper().encode('UTF-8')
+ encoding_b = encoding.upper().encode("UTF-8")
encoding_c = encoding_b
layer_options = CSLSetNameValue(layer_options, "ENCODING", encoding_c)
-
- ### Get geometry type
+ # Get geometry type
# TODO: this is brittle for 3D / ZM / M types
# TODO: fail on M / ZM types
geometry_code = get_geometry_type_code(geometry_type)
try:
if create_layer:
- layer_b = layer.encode('UTF-8')
+ layer_b = layer.encode("UTF-8")
layer_c = layer_b
- ogr_layer = exc_wrap_pointer(
- GDALDatasetCreateLayer(ogr_dataset, layer_c, ogr_crs,
- geometry_code, layer_options))
+ ogr_layer = check_pointer(
+ GDALDatasetCreateLayer(
+ ogr_dataset, layer_c, ogr_crs, geometry_code, layer_options
+ )
+ )
else:
- ogr_layer = exc_wrap_pointer(get_ogr_layer(ogr_dataset, layer))
+ ogr_layer = check_pointer(get_ogr_layer(ogr_dataset, layer))
# Set dataset and layer metadata
set_metadata(ogr_dataset, dataset_metadata)
@@ -2253,7 +2377,8 @@ def ogr_write(
cdef OGRGeometryH ogr_geometry_multi = NULL
cdef OGRFeatureDefnH ogr_featuredef = NULL
cdef OGRFieldDefnH ogr_fielddef = NULL
- cdef unsigned char *wkb_buffer = NULL
+ cdef const unsigned char *wkb_buffer = NULL
+ cdef unsigned int wkbtype = 0
cdef int supports_transactions = 0
cdef int err = 0
cdef int i = 0
@@ -2292,7 +2417,9 @@ def ogr_write(
raise ValueError("field_data and field_mask must be same length")
for i in range(0, len(field_mask)):
if field_mask[i] is not None and len(field_mask[i]) != num_records:
- raise ValueError("field_mask arrays must be same length as geometry array")
+ raise ValueError(
+ "field_mask arrays must be same length as geometry array"
+ )
else:
field_mask = [None] * num_fields
@@ -2311,19 +2438,20 @@ def ogr_write(
&ogr_dataset, &ogr_layer,
)
- if driver == 'ESRI Shapefile':
- # force encoding for remaining operations to be in UTF-8 (even if user
- # provides an encoding) because GDAL will automatically convert those to
- # the target encoding because ENCODING is set as a layer creation option
+ if driver == "ESRI Shapefile":
+ # force encoding for remaining operations to be in UTF-8 (even if
+ # user provides an encoding) because GDAL will automatically
+ # convert those to the target encoding because ENCODING is set as a
+ # layer creation option
encoding = "UTF-8"
else:
- # Now the dataset and layer have been created, we can properly determine the
- # encoding. It is derived from the user, from the dataset capabilities / type,
- # or from the system locale
+ # Now the dataset and layer have been created, we can properly
+ # determine the encoding. It is derived from the user, from the
+ # dataset capabilities / type, or from the system locale
encoding = encoding or detect_encoding(ogr_dataset, ogr_layer)
- ### Create the fields
+ # Create the fields
field_types = None
if num_fields > 0:
field_types = infer_field_types([field.dtype for field in field_data])
@@ -2334,9 +2462,9 @@ def ogr_write(
name_b = fields[i].encode(encoding)
try:
- ogr_fielddef = exc_wrap_pointer(OGR_Fld_Create(name_b, field_type))
+ ogr_fielddef = check_pointer(OGR_Fld_Create(name_b, field_type))
- # subtypes, see: https://gdal.org/development/rfc/rfc50_ogr_field_subtype.html
+ # subtypes, see: https://gdal.org/development/rfc/rfc50_ogr_field_subtype.html # noqa: E501
if field_subtype != OFSTNone:
OGR_Fld_SetSubType(ogr_fielddef, field_subtype)
@@ -2345,18 +2473,19 @@ def ogr_write(
# TODO: set precision
- exc_wrap_int(OGR_L_CreateField(ogr_layer, ogr_fielddef, 1))
+ check_int(OGR_L_CreateField(ogr_layer, ogr_fielddef, 1))
- except:
- raise FieldError(f"Error adding field '{fields[i]}' to layer") from None
+ except Exception:
+ raise FieldError(
+ f"Error adding field '{fields[i]}' to layer"
+ ) from None
finally:
if ogr_fielddef != NULL:
OGR_Fld_Destroy(ogr_fielddef)
ogr_fielddef = NULL
-
- ### Create the features
+ # Create the features
ogr_featuredef = OGR_L_GetLayerDefn(ogr_layer)
supports_transactions = OGR_L_TestCapability(ogr_layer, OLCTransactions)
@@ -2369,30 +2498,48 @@ def ogr_write(
if ogr_feature == NULL:
raise FeatureError(f"Could not create feature at index {i}") from None
- # create the geometry based on specific WKB type (there might be mixed types in geometries)
+ # create the geometry based on specific WKB type
+ # (there might be mixed types in geometries)
# TODO: geometry must not be null or errors
wkb = None if geometry is None else geometry[i]
if wkb is not None:
- wkbtype = <int>bytearray(wkb)[1]
- # may need to consider all 4 bytes: int.from_bytes(wkb[0][1:4], byteorder="little")
- # use "little" if the first byte == 1
+ wkb_buffer = wkb
+ if wkb_buffer[0] == 1:
+ # Little endian WKB type.
+ wkbtype = (
+ wkb_buffer[1] + (wkb_buffer[2] << 8) + (wkb_buffer[3] << 16) +
+ (<unsigned int>wkb_buffer[4] << 24)
+ )
+ else:
+ # Big endian WKB type.
+ wkbtype = (
+ (<unsigned int>(wkb_buffer[1]) << 24) + (wkb_buffer[2] << 16) +
+ (wkb_buffer[3] << 8) + wkb_buffer[4]
+ )
ogr_geometry = OGR_G_CreateGeometry(<OGRwkbGeometryType>wkbtype)
if ogr_geometry == NULL:
- raise GeometryError(f"Could not create geometry at index {i} for WKB type {wkbtype}") from None
+ raise GeometryError(
+ f"Could not create geometry at index {i} for WKB type {wkbtype}"
+ ) from None
# import the WKB
- wkb_buffer = wkb
err = OGR_G_ImportFromWkb(ogr_geometry, wkb_buffer, len(wkb))
if err:
- raise GeometryError(f"Could not create geometry from WKB at index {i}") from None
+ raise GeometryError(
+ f"Could not create geometry from WKB at index {i}"
+ ) from None
# Convert to multi type
if promote_to_multi:
if wkbtype in (wkbPoint, wkbPoint25D, wkbPointM, wkbPointZM):
ogr_geometry = OGR_G_ForceToMultiPoint(ogr_geometry)
- elif wkbtype in (wkbLineString, wkbLineString25D, wkbLineStringM, wkbLineStringZM):
+ elif wkbtype in (
+ wkbLineString, wkbLineString25D, wkbLineStringM, wkbLineStringZM
+ ):
ogr_geometry = OGR_G_ForceToMultiLineString(ogr_geometry)
- elif wkbtype in (wkbPolygon, wkbPolygon25D, wkbPolygonM, wkbPolygonZM):
+ elif wkbtype in (
+ wkbPolygon, wkbPolygon25D, wkbPolygonM, wkbPolygonZM
+ ):
ogr_geometry = OGR_G_ForceToMultiPolygon(ogr_geometry)
# Set the geometry on the feature
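
The header parsing added above follows the standard WKB layout: byte 0 is the byte-order flag (1 = little endian, 0 = big endian) and bytes 1-4 encode the geometry type as an unsigned 32-bit integer. A pure-Python equivalent of the same logic:

    import struct

    def wkb_geometry_type(wkb: bytes) -> int:
        # byte 0: byte-order flag, 1 = little endian, 0 = big endian
        endian = "<" if wkb[0] == 1 else ">"
        # bytes 1-4: geometry type as an unsigned 32-bit integer
        return struct.unpack(endian + "I", wkb[1:5])[0]
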
@@ -2400,7 +2547,9 @@ def ogr_write(
err = OGR_F_SetGeometryDirectly(ogr_feature, ogr_geometry)
ogr_geometry = NULL # to prevent cleanup after this point
if err:
- raise GeometryError(f"Could not set geometry for feature at index {i}") from None
+ raise GeometryError(
+ f"Could not set geometry for feature at index {i}"
+ ) from None
# Set field values
for field_idx in range(num_fields):
@@ -2427,7 +2576,10 @@ def ogr_write(
OGR_F_SetFieldString(ogr_feature, field_idx, value_b)
except AttributeError:
- raise ValueError(f"Could not encode value '{field_value}' in field '{fields[field_idx]}' to string")
+ raise ValueError(
+ f"Could not encode value '{field_value}' in field "
+ f"'{fields[field_idx]}' to string"
+ )
except Exception:
raise
@@ -2484,24 +2636,26 @@ def ogr_write(
)
else:
- raise NotImplementedError(f"OGR field type is not supported for writing: {field_type}")
-
+ raise NotImplementedError(
+ f"OGR field type is not supported for writing: {field_type}"
+ )
# Add feature to the layer
try:
- exc_wrap_int(OGR_L_CreateFeature(ogr_layer, ogr_feature))
+ check_int(OGR_L_CreateFeature(ogr_layer, ogr_feature))
except CPLE_BaseError as exc:
- raise FeatureError(f"Could not add feature to layer at index {i}: {exc}") from None
+ raise FeatureError(
+ f"Could not add feature to layer at index {i}: {exc}"
+ ) from None
OGR_F_Destroy(ogr_feature)
ogr_feature = NULL
-
if supports_transactions:
commit_transaction(ogr_dataset)
- log.info(f"Created {num_records:,} records" )
+ log.info(f"Created {num_records:,} records")
# close dataset to force driver to flush data
exc = ogr_close(ogr_dataset)
@@ -2514,7 +2668,7 @@ def ogr_write(
read_vsimem_to_buffer(path, path_or_fp)
finally:
- ### Final cleanup
+ # Final cleanup
# make sure that all objects allocated above are released if exceptions
# are raised, and the dataset is closed
if ogr_fielddef != NULL:
@@ -2579,8 +2733,14 @@ def ogr_write_arrow(
# only shapefile supports non-UTF encoding because ENCODING option is set
# during dataset creation and GDAL auto-translates from UTF-8 values to that
# encoding
- if encoding and encoding.replace('-','').upper() != 'UTF8' and driver != 'ESRI Shapefile':
- raise ValueError("non-UTF-8 encoding is not supported for Arrow; use the non-Arrow interface instead")
+ if (
+ encoding and encoding.replace("-", "").upper() != "UTF8"
+ and driver != "ESRI Shapefile"
+ ):
+ raise ValueError(
+ "non-UTF-8 encoding is not supported for Arrow; use the non-Arrow "
+ "interface instead"
+ )
if geometry_name:
opts = {"GEOMETRY_NAME": geometry_name}
@@ -2615,7 +2775,7 @@ def ogr_write_arrow(
break
if not OGR_L_WriteArrowBatch(ogr_layer, &schema, &array, options):
- exc = exc_check()
+ exc = check_last_error()
gdal_msg = f": {str(exc)}" if exc else "."
raise DataLayerError(
f"Error while writing batch to OGR layer{gdal_msg}"
@@ -2729,6 +2889,18 @@ cdef create_fields_from_arrow_schema(
IF CTE_GDAL_VERSION < (3, 8, 0):
raise RuntimeError("Need GDAL>=3.8 for Arrow write support")
+ # Some formats store the FID explicitly in a real column, e.g. GPKG.
+ # For those formats, OGR_L_GetFIDColumn will return the column name used
+ # for this and otherwise it returns "". GDAL typically also provides a
+ # layer creation option to overrule the column name to be used as FID
+ # column. When a column with the appropriate name is present in the data,
+ # GDAL will automatically use it for the FID. Reference:
+ # https://gdal.org/en/stable/tutorials/vector_api_tut.html#writing-to-ogr-using-the-arrow-c-data-interface
+ # Hence, the column should not be created as an ordinary field as well.
+ # Doing so triggers a bug in GDAL < 3.10.1:
+ # https://github.com/OSGeo/gdal/issues/11527#issuecomment-2556092722
+ fid_column = get_string(OGR_L_GetFIDColumn(destLayer))
+
# The schema object is a struct type where each child is a column.
cdef ArrowSchema* child
for i in range(schema.n_children):
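
The FID behaviour described in the comment above can be observed from Python. A sketch, assuming a writable path and a GDAL build with Arrow write support; the file name is hypothetical:

    import geopandas as gpd
    from shapely.geometry import Point
    from pyogrio import read_info, write_dataframe

    gdf = gpd.GeoDataFrame(
        {"fid": [10, 11]}, geometry=[Point(0, 0), Point(1, 1)], crs="EPSG:4326"
    )
    write_dataframe(gdf, "/tmp/example.gpkg", use_arrow=True)
    # GPKG stores the FID in a real column, named "fid" by default, so the
    # "fid" column above is picked up as the FID rather than written as an
    # ordinary field
    print(read_info("/tmp/example.gpkg")["fid_column"])
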
@@ -2741,8 +2913,19 @@ cdef create_fields_from_arrow_schema(
if get_string(child.name) == geometry_name or is_arrow_geometry_field(child):
continue
+ # Don't create property for column that will already be used as FID
+ # Note: it seems that GDAL initially uses a case-sensitive check of the
+ # FID column, but then falls back to case-insensitive matching via
+ # the "ordinary" field being added. So, the check here needs to be
+ # case-sensitive so the column is still added as regular column if the
+ # casing isn't matched, otherwise the column is simply "lost".
+ # Note2: in the non-arrow path, the FID column is also treated
+ # case-insensitive, so this is consistent with that.
+ if fid_column != "" and get_string(child.name) == fid_column:
+ continue
+
if not OGR_L_CreateFieldFromArrowSchema(destLayer, child, options):
- exc = exc_check()
+ exc = check_last_error()
gdal_msg = f" ({str(exc)})" if exc else ""
raise FieldError(
f"Error while creating field from Arrow for field {i} with name "
=====================================
pyogrio/_ogr.pxd
=====================================
@@ -25,10 +25,10 @@ cdef extern from "cpl_error.h" nogil:
CE_Failure
CE_Fatal
- void CPLErrorReset()
- int CPLGetLastErrorNo()
+ void CPLErrorReset()
+ int CPLGetLastErrorNo()
const char* CPLGetLastErrorMsg()
- int CPLGetLastErrorType()
+ int CPLGetLastErrorType()
ctypedef void (*CPLErrorHandler)(CPLErr, int, const char*)
void CPLDefaultErrorHandler(CPLErr, int, const char *)
@@ -64,8 +64,13 @@ cdef extern from "cpl_vsi.h" nogil:
int VSIFFlushL(VSILFILE *fp)
int VSIUnlink(const char *path)
- VSILFILE *VSIFileFromMemBuffer(const char *path, void *data, vsi_l_offset data_len, int take_ownership)
- unsigned char *VSIGetMemFileBuffer(const char *path, vsi_l_offset *data_len, int take_ownership)
+ VSILFILE* VSIFileFromMemBuffer(const char *path,
+ void *data,
+ vsi_l_offset data_len,
+ int take_ownership)
+ unsigned char* VSIGetMemFileBuffer(const char *path,
+ vsi_l_offset *data_len,
+ int take_ownership)
int VSIMkdir(const char *path, long mode)
int VSIMkdirRecursive(const char *path, long mode)
@@ -201,14 +206,18 @@ cdef extern from "ogr_srs_api.h":
int OSRAutoIdentifyEPSG(OGRSpatialReferenceH srs)
OGRErr OSRExportToWkt(OGRSpatialReferenceH srs, char **params)
- const char* OSRGetAuthorityName(OGRSpatialReferenceH srs, const char *key)
- const char* OSRGetAuthorityCode(OGRSpatialReferenceH srs, const char *key)
+ const char* OSRGetAuthorityName(OGRSpatialReferenceH srs,
+ const char *key)
+ const char* OSRGetAuthorityCode(OGRSpatialReferenceH srs,
+ const char *key)
OGRErr OSRImportFromEPSG(OGRSpatialReferenceH srs, int code)
ctypedef enum OSRAxisMappingStrategy:
OAMS_TRADITIONAL_GIS_ORDER
-
- void OSRSetAxisMappingStrategy(OGRSpatialReferenceH hSRS, OSRAxisMappingStrategy)
- int OSRSetFromUserInput(OGRSpatialReferenceH srs, const char *pszDef)
+
+ void OSRSetAxisMappingStrategy(OGRSpatialReferenceH hSRS,
+ OSRAxisMappingStrategy)
+ int OSRSetFromUserInput(OGRSpatialReferenceH srs,
+ const char *pszDef)
void OSRSetPROJSearchPaths(const char *const *paths)
OGRSpatialReferenceH OSRNewSpatialReference(const char *wkt)
void OSRRelease(OGRSpatialReferenceH srs)
@@ -261,30 +270,47 @@ cdef extern from "ogr_api.h":
int64_t OGR_F_GetFID(OGRFeatureH feature)
OGRGeometryH OGR_F_GetGeometryRef(OGRFeatureH feature)
GByte* OGR_F_GetFieldAsBinary(OGRFeatureH feature, int n, int *s)
- int OGR_F_GetFieldAsDateTimeEx(OGRFeatureH feature, int n, int *y, int *m, int *d, int *h, int *m, float *s, int *z)
+ int OGR_F_GetFieldAsDateTimeEx(OGRFeatureH feature,
+ int n,
+ int *y,
+ int *m,
+ int *d,
+ int *h,
+ int *m,
+ float *s,
+ int *z)
double OGR_F_GetFieldAsDouble(OGRFeatureH feature, int n)
int OGR_F_GetFieldAsInteger(OGRFeatureH feature, int n)
int64_t OGR_F_GetFieldAsInteger64(OGRFeatureH feature, int n)
const char* OGR_F_GetFieldAsString(OGRFeatureH feature, int n)
int OGR_F_IsFieldSetAndNotNull(OGRFeatureH feature, int n)
- void OGR_F_SetFieldDateTime(OGRFeatureH feature, int n, int y, int m, int d, int hh, int mm, int ss, int tz)
- void OGR_F_SetFieldDouble(OGRFeatureH feature, int n, double value)
- void OGR_F_SetFieldInteger(OGRFeatureH feature, int n, int value)
- void OGR_F_SetFieldInteger64(OGRFeatureH feature, int n, int64_t value)
- void OGR_F_SetFieldString(OGRFeatureH feature, int n, char *value)
- void OGR_F_SetFieldBinary(OGRFeatureH feature, int n, int l, unsigned char *value)
- void OGR_F_SetFieldNull(OGRFeatureH feature, int n) # new in GDAL 2.2
- void OGR_F_SetFieldDateTimeEx(
- OGRFeatureH hFeat,
- int iField,
- int nYear,
- int nMonth,
- int nDay,
- int nHour,
- int nMinute,
- float fSecond,
- int nTZFlag)
- OGRErr OGR_F_SetGeometryDirectly(OGRFeatureH feature, OGRGeometryH geometry)
+
+ void OGR_F_SetFieldDateTime(OGRFeatureH feature,
+ int n,
+ int y,
+ int m,
+ int d,
+ int hh,
+ int mm,
+ int ss,
+ int tz)
+ void OGR_F_SetFieldDouble(OGRFeatureH feature, int n, double value)
+ void OGR_F_SetFieldInteger(OGRFeatureH feature, int n, int value)
+ void OGR_F_SetFieldInteger64(OGRFeatureH feature, int n, int64_t value)
+ void OGR_F_SetFieldString(OGRFeatureH feature, int n, char *value)
+ void OGR_F_SetFieldBinary(OGRFeatureH feature, int n, int l, unsigned char *value)
+ void OGR_F_SetFieldNull(OGRFeatureH feature, int n) # new in GDAL 2.2
+ void OGR_F_SetFieldDateTimeEx(OGRFeatureH hFeat,
+ int iField,
+ int nYear,
+ int nMonth,
+ int nDay,
+ int nHour,
+ int nMinute,
+ float fSecond,
+ int nTZFlag)
+
+ OGRErr OGR_F_SetGeometryDirectly(OGRFeatureH feature, OGRGeometryH geometry)
OGRFeatureDefnH OGR_FD_Create(const char *name)
int OGR_FD_GetFieldCount(OGRFeatureDefnH featuredefn)
@@ -298,20 +324,34 @@ cdef extern from "ogr_api.h":
OGRFieldSubType OGR_Fld_GetSubType(OGRFieldDefnH fielddefn)
int OGR_Fld_GetType(OGRFieldDefnH fielddefn)
int OGR_Fld_GetWidth(OGRFieldDefnH fielddefn)
- void OGR_Fld_Set(OGRFieldDefnH fielddefn, const char *name, int fieldtype, int width, int precision, int justification)
+ void OGR_Fld_Set(OGRFieldDefnH fielddefn,
+ const char *name,
+ int fieldtype,
+ int width,
+ int precision,
+ int justification)
void OGR_Fld_SetPrecision(OGRFieldDefnH fielddefn, int n)
void OGR_Fld_SetWidth(OGRFieldDefnH fielddefn, int n)
void OGR_Fld_SetSubType(OGRFieldDefnH fielddefn, OGRFieldSubType subtype)
OGRGeometryH OGR_G_CreateGeometry(int wkbtypecode)
- OGRErr OGR_G_CreateFromWkb(const void *bytes, OGRSpatialReferenceH srs, OGRGeometryH *geometry, int nbytes)
+ OGRErr OGR_G_CreateFromWkb(const void *bytes,
+ OGRSpatialReferenceH srs,
+ OGRGeometryH *geometry,
+ int nbytes)
void OGR_G_DestroyGeometry(OGRGeometryH geometry)
- void OGR_G_ExportToWkb(OGRGeometryH geometry, int endianness, unsigned char *buffer)
+ void OGR_G_ExportToWkb(OGRGeometryH geometry,
+ int endianness,
+ unsigned char *buffer)
void OGR_G_GetEnvelope(OGRGeometryH geometry, OGREnvelope* envelope)
OGRwkbGeometryType OGR_G_GetGeometryType(OGRGeometryH)
- OGRGeometryH OGR_G_GetLinearGeometry(OGRGeometryH hGeom, double dfMaxAngleStepSizeDegrees, char **papszOptions)
- OGRErr OGR_G_ImportFromWkb(OGRGeometryH geometry, const void *bytes, int nbytes)
+ OGRGeometryH OGR_G_GetLinearGeometry(OGRGeometryH hGeom,
+ double dfMaxAngleStepSizeDegrees,
+ char **papszOptions)
+ OGRErr OGR_G_ImportFromWkb(OGRGeometryH geometry,
+ const void *bytes,
+ int nbytes)
int OGR_G_IsMeasured(OGRGeometryH geometry)
void OGR_G_SetMeasured(OGRGeometryH geometry, int isMeasured)
int OGR_G_Is3D(OGRGeometryH geometry)
@@ -326,24 +366,34 @@ cdef extern from "ogr_api.h":
int OGR_GT_IsNonLinear(OGRwkbGeometryType eType)
OGRwkbGeometryType OGR_GT_SetModifier(OGRwkbGeometryType eType, int setZ, int setM)
- OGRErr OGR_L_CreateFeature(OGRLayerH layer, OGRFeatureH feature)
- OGRErr OGR_L_CreateField(OGRLayerH layer, OGRFieldDefnH fielddefn, int flexible)
- const char* OGR_L_GetName(OGRLayerH layer)
- const char* OGR_L_GetFIDColumn(OGRLayerH layer)
- const char* OGR_L_GetGeometryColumn(OGRLayerH layer)
- OGRErr OGR_L_GetExtent(OGRLayerH layer, OGREnvelope *psExtent, int bForce)
- OGRSpatialReferenceH OGR_L_GetSpatialRef(OGRLayerH layer)
- int OGR_L_TestCapability(OGRLayerH layer, const char *name)
- OGRFeatureDefnH OGR_L_GetLayerDefn(OGRLayerH layer)
- OGRFeatureH OGR_L_GetNextFeature(OGRLayerH layer)
- OGRFeatureH OGR_L_GetFeature(OGRLayerH layer, int nFeatureId)
- void OGR_L_ResetReading(OGRLayerH layer)
- OGRErr OGR_L_SetAttributeFilter(OGRLayerH hLayer, const char* pszQuery)
- OGRErr OGR_L_SetNextByIndex(OGRLayerH layer, int nIndex)
- int OGR_L_GetFeatureCount(OGRLayerH layer, int m)
- void OGR_L_SetSpatialFilterRect(OGRLayerH layer, double xmin, double ymin, double xmax, double ymax)
- void OGR_L_SetSpatialFilter(OGRLayerH layer, OGRGeometryH geometry)
- OGRErr OGR_L_SetIgnoredFields(OGRLayerH layer, const char** fields)
+ OGRErr OGR_L_CreateFeature(OGRLayerH layer, OGRFeatureH feature)
+ OGRErr OGR_L_CreateField(OGRLayerH layer,
+ OGRFieldDefnH fielddefn,
+ int flexible)
+ const char* OGR_L_GetName(OGRLayerH layer)
+ const char* OGR_L_GetFIDColumn(OGRLayerH layer)
+ const char* OGR_L_GetGeometryColumn(OGRLayerH layer)
+ OGRErr OGR_L_GetExtent(OGRLayerH layer,
+ OGREnvelope *psExtent,
+ int bForce)
+
+ OGRSpatialReferenceH OGR_L_GetSpatialRef(OGRLayerH layer)
+ int OGR_L_TestCapability(OGRLayerH layer, const char *name)
+ OGRFeatureDefnH OGR_L_GetLayerDefn(OGRLayerH layer)
+ OGRFeatureH OGR_L_GetNextFeature(OGRLayerH layer)
+ OGRFeatureH OGR_L_GetFeature(OGRLayerH layer, int nFeatureId)
+ void OGR_L_ResetReading(OGRLayerH layer)
+ OGRErr OGR_L_SetAttributeFilter(OGRLayerH hLayer,
+ const char* pszQuery)
+ OGRErr OGR_L_SetNextByIndex(OGRLayerH layer, int nIndex)
+ int OGR_L_GetFeatureCount(OGRLayerH layer, int m)
+ void OGR_L_SetSpatialFilterRect(OGRLayerH layer,
+ double xmin,
+ double ymin,
+ double xmax,
+ double ymax)
+ void OGR_L_SetSpatialFilter(OGRLayerH layer, OGRGeometryH geometry)
+ OGRErr OGR_L_SetIgnoredFields(OGRLayerH layer, const char** fields)
void OGRSetNonLinearGeometriesEnabledFlag(int bFlag)
int OGRGetNonLinearGeometriesEnabledFlag()
@@ -360,14 +410,23 @@ cdef extern from "ogr_api.h":
IF CTE_GDAL_VERSION >= (3, 6, 0):
cdef extern from "ogr_api.h":
- bint OGR_L_GetArrowStream(OGRLayerH hLayer, ArrowArrayStream *out_stream, char** papszOptions)
+ bint OGR_L_GetArrowStream(
+ OGRLayerH hLayer, ArrowArrayStream *out_stream, char** papszOptions
+ )
IF CTE_GDAL_VERSION >= (3, 8, 0):
cdef extern from "ogr_api.h":
- bint OGR_L_CreateFieldFromArrowSchema(OGRLayerH hLayer, ArrowSchema *schema, char **papszOptions)
- bint OGR_L_WriteArrowBatch(OGRLayerH hLayer, ArrowSchema *schema, ArrowArray *array, char **papszOptions)
+ bint OGR_L_CreateFieldFromArrowSchema(
+ OGRLayerH hLayer, ArrowSchema *schema, char **papszOptions
+ )
+ bint OGR_L_WriteArrowBatch(
+ OGRLayerH hLayer,
+ ArrowSchema *schema,
+ ArrowArray *array,
+ char **papszOptions,
+ )
cdef extern from "gdal.h":
ctypedef enum GDALDataType:
@@ -401,14 +460,14 @@ cdef extern from "gdal.h":
int nXSize,
int nYSize,
int nBands,
- GDALDataType eBandType,
- char ** papszOptions)
+ GDALDataType eBandType,
+ char ** papszOptions)
OGRLayerH GDALDatasetCreateLayer(GDALDatasetH ds,
- const char * pszName,
- OGRSpatialReferenceH hSpatialRef,
- int eType,
- char ** papszOptions)
+ const char * pszName,
+ OGRSpatialReferenceH hSpatialRef,
+ int eType,
+ char ** papszOptions)
int GDALDatasetDeleteLayer(GDALDatasetH hDS, int iLayer)
@@ -423,18 +482,21 @@ cdef extern from "gdal.h":
int GDALDatasetGetLayerCount(GDALDatasetH ds)
OGRLayerH GDALDatasetGetLayer(GDALDatasetH ds, int iLayer)
OGRLayerH GDALDatasetGetLayerByName(GDALDatasetH ds, char * pszName)
- OGRLayerH GDALDatasetExecuteSQL(
- GDALDatasetH ds,
- const char* pszStatement,
- OGRGeometryH hSpatialFilter,
- const char* pszDialect)
+ OGRLayerH GDALDatasetExecuteSQL(GDALDatasetH ds,
+ const char* pszStatement,
+ OGRGeometryH hSpatialFilter,
+ const char* pszDialect)
void GDALDatasetReleaseResultSet(GDALDatasetH, OGRLayerH)
OGRErr GDALDatasetStartTransaction(GDALDatasetH ds, int bForce)
OGRErr GDALDatasetCommitTransaction(GDALDatasetH ds)
OGRErr GDALDatasetRollbackTransaction(GDALDatasetH ds)
char** GDALGetMetadata(GDALMajorObjectH obj, const char *pszDomain)
- const char* GDALGetMetadataItem(GDALMajorObjectH obj, const char *pszName, const char *pszDomain)
- OGRErr GDALSetMetadata(GDALMajorObjectH obj, char **metadata, const char *pszDomain)
+ const char* GDALGetMetadataItem(GDALMajorObjectH obj,
+ const char *pszName,
+ const char *pszDomain)
+ OGRErr GDALSetMetadata(GDALMajorObjectH obj,
+ char **metadata,
+ const char *pszDomain)
const char* GDALVersionInfo(const char *pszRequest)
=====================================
pyogrio/_ogr.pyx
=====================================
@@ -3,13 +3,13 @@ import sys
from uuid import uuid4
import warnings
-from pyogrio._err cimport exc_wrap_int, exc_wrap_ogrerr, exc_wrap_pointer
+from pyogrio._err cimport check_pointer
from pyogrio._err import CPLE_BaseError, NullPointerError
from pyogrio.errors import DataSourceError
cdef get_string(const char *c_str, str encoding="UTF-8"):
- """Get Python string from a char *
+ """Get Python string from a char *.
IMPORTANT: the char * must still be freed by the caller.
@@ -61,7 +61,7 @@ def get_gdal_geos_version():
def set_gdal_config_options(dict options):
for name, value in options.items():
- name_b = name.encode('utf-8')
+ name_b = name.encode("utf-8")
name_c = name_b
# None is a special case; this is used to clear the previous value
@@ -71,16 +71,16 @@ def set_gdal_config_options(dict options):
# normalize bool to ON/OFF
if isinstance(value, bool):
- value_b = b'ON' if value else b'OFF'
+ value_b = b"ON" if value else b"OFF"
else:
- value_b = str(value).encode('utf-8')
+ value_b = str(value).encode("utf-8")
value_c = value_b
CPLSetConfigOption(<const char*>name_c, <const char*>value_c)
def get_gdal_config_option(str name):
- name_b = name.encode('utf-8')
+ name_b = name.encode("utf-8")
name_c = name_b
value = CPLGetConfigOption(<const char*>name_c, NULL)
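
The normalization above means boolean and numeric config values can be passed directly from Python. A small usage sketch of the public API:

    from pyogrio import set_gdal_config_options

    # bools are normalized to GDAL's "ON"/"OFF" strings; other values are
    # stringified
    set_gdal_config_options({"CPL_DEBUG": True, "GDAL_NUM_THREADS": 4})
    # None clears a previously set option
    set_gdal_config_options({"CPL_DEBUG": None})
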
@@ -90,9 +90,9 @@ def get_gdal_config_option(str name):
if value.isdigit():
return int(value)
- if value == b'ON':
+ if value == b"ON":
return True
- if value == b'OFF':
+ if value == b"OFF":
return False
str_value = get_string(value)
@@ -102,7 +102,7 @@ def get_gdal_config_option(str name):
def ogr_driver_supports_write(driver):
# check metadata for driver to see if it supports write
- if _get_driver_metadata_item(driver, "DCAP_CREATE") == 'YES':
+ if _get_driver_metadata_item(driver, "DCAP_CREATE") == "YES":
return True
return False
@@ -110,7 +110,7 @@ def ogr_driver_supports_write(driver):
def ogr_driver_supports_vsi(driver):
# check metadata for driver to see if it supports write
- if _get_driver_metadata_item(driver, "DCAP_VIRTUALIO") == 'YES':
+ if _get_driver_metadata_item(driver, "DCAP_VIRTUALIO") == "YES":
return True
return False
@@ -182,19 +182,21 @@ def has_proj_data():
"""
cdef OGRSpatialReferenceH srs = OSRNewSpatialReference(NULL)
- try:
- exc_wrap_ogrerr(exc_wrap_int(OSRImportFromEPSG(srs, 4326)))
- except CPLE_BaseError:
- return False
- else:
+ retval = OSRImportFromEPSG(srs, 4326)
+ if srs != NULL:
+ OSRRelease(srs)
+
+ if retval == OGRERR_NONE:
+ # Successful return, so PROJ data files are correctly found
return True
- finally:
- if srs != NULL:
- OSRRelease(srs)
+ else:
+ return False
def init_gdal_data():
- """Set GDAL data search directories in the following precedence:
+ """Set GDAL data search directories.
+
+ They are set in the following precedence:
- wheel copy of gdal_data
- default detection by GDAL, including GDAL_DATA (detected automatically by GDAL)
- other well-known paths under sys.prefix
@@ -207,21 +209,30 @@ def init_gdal_data():
if os.path.exists(wheel_path):
set_gdal_config_options({"GDAL_DATA": wheel_path})
if not has_gdal_data():
- raise ValueError("Could not correctly detect GDAL data files installed by pyogrio wheel")
+ raise ValueError(
+ "Could not correctly detect GDAL data files installed by pyogrio wheel"
+ )
return
# GDAL correctly found data files from GDAL_DATA or compiled-in paths
if has_gdal_data():
return
- wk_path = os.path.join(sys.prefix, 'share', 'gdal')
+ wk_path = os.path.join(sys.prefix, "share", "gdal")
if os.path.exists(wk_path):
set_gdal_config_options({"GDAL_DATA": wk_path})
if not has_gdal_data():
- raise ValueError(f"Found GDAL data directory at {wk_path} but it does not appear to correctly contain GDAL data files")
+ raise ValueError(
+ f"Found GDAL data directory at {wk_path} but it does not appear to "
+ "correctly contain GDAL data files"
+ )
return
- warnings.warn("Could not detect GDAL data files. Set GDAL_DATA environment variable to the correct path.", RuntimeWarning)
+ warnings.warn(
+ "Could not detect GDAL data files. Set GDAL_DATA environment variable to the "
+ "correct path.",
+ RuntimeWarning
+ )
def init_proj_data():
@@ -239,22 +250,29 @@ def init_proj_data():
set_proj_search_path(wheel_path)
# verify that this now resolves
if not has_proj_data():
- raise ValueError("Could not correctly detect PROJ data files installed by pyogrio wheel")
+ raise ValueError(
+ "Could not correctly detect PROJ data files installed by pyogrio wheel"
+ )
return
# PROJ correctly found data files from PROJ_LIB or compiled-in paths
if has_proj_data():
return
- wk_path = os.path.join(sys.prefix, 'share', 'proj')
+ wk_path = os.path.join(sys.prefix, "share", "proj")
if os.path.exists(wk_path):
set_proj_search_path(wk_path)
# verify that this now resolves
if not has_proj_data():
- raise ValueError(f"Found PROJ data directory at {wk_path} but it does not appear to correctly contain PROJ data files")
+ raise ValueError(
+ f"Found PROJ data directory at {wk_path} but it does not appear to "
+ "correctly contain PROJ data files"
+ )
return
- warnings.warn("Could not detect PROJ data files. Set PROJ_LIB environment variable to the correct path.", RuntimeWarning)
+ warnings.warn(
+ "Could not detect PROJ data files. Set PROJ_LIB environment variable to "
+ "the correct path.", RuntimeWarning)
def _register_drivers():
@@ -282,7 +300,7 @@ def _get_driver_metadata_item(driver, metadata_item):
cdef void *cogr_driver = NULL
try:
- cogr_driver = exc_wrap_pointer(GDALGetDriverByName(driver.encode('UTF-8')))
+ cogr_driver = check_pointer(GDALGetDriverByName(driver.encode("UTF-8")))
except NullPointerError:
raise DataSourceError(
f"Could not obtain driver: {driver} (check that it was installed "
@@ -291,12 +309,12 @@ def _get_driver_metadata_item(driver, metadata_item):
except CPLE_BaseError as exc:
raise DataSourceError(str(exc))
- metadata_c = GDALGetMetadataItem(cogr_driver, metadata_item.encode('UTF-8'), NULL)
+ metadata_c = GDALGetMetadataItem(cogr_driver, metadata_item.encode("UTF-8"), NULL)
metadata = None
if metadata_c != NULL:
metadata = metadata_c
- metadata = metadata.decode('UTF-8')
+ metadata = metadata.decode("UTF-8")
if len(metadata) == 0:
metadata = None
@@ -316,13 +334,12 @@ def _get_drivers_for_path(path):
else:
ext = None
-
# allow specific drivers to have a .zip extension to match GDAL behavior
- if ext == 'zip':
- if path.endswith('.shp.zip'):
- ext = 'shp.zip'
- elif path.endswith('.gpkg.zip'):
- ext = 'gpkg.zip'
+ if ext == "zip":
+ if path.endswith(".shp.zip"):
+ ext = "shp.zip"
+ elif path.endswith(".gpkg.zip"):
+ ext = "gpkg.zip"
drivers = []
for i in range(OGRGetDriverCount()):
@@ -336,7 +353,11 @@ def _get_drivers_for_path(path):
# extensions is a space-delimited list of supported extensions
# for driver
extensions = _get_driver_metadata_item(name, "DMD_EXTENSIONS")
- if ext is not None and extensions is not None and ext in extensions.lower().split(' '):
+ if (
+ ext is not None
+ and extensions is not None
+ and ext in extensions.lower().split(" ")
+ ):
drivers.append(name)
else:
prefix = _get_driver_metadata_item(name, "DMD_CONNECTION_PREFIX")
=====================================
pyogrio/_version.py
=====================================
@@ -25,9 +25,9 @@ def get_keywords():
# setup.py/versioneer.py will grep for the variable names, so they must
# each be defined on a line of their own. _version.py will just call
# get_keywords().
- git_refnames = " (HEAD -> main, tag: v0.10.0)"
- git_full = "eb8e7889224155ffa0f779360db29f07f370eef1"
- git_date = "2024-09-28 11:22:57 -0700"
+ git_refnames = " (HEAD -> main, tag: v0.11.0)"
+ git_full = "7ada821e195a4c74b5135ae88d1e8c494afb0c9a"
+ git_date = "2025-05-08 16:44:26 +0200"
keywords = {"refnames": git_refnames, "full": git_full, "date": git_date}
return keywords
=====================================
pyogrio/_vsi.pxd
=====================================
@@ -1,4 +1,4 @@
cdef tuple get_ogr_vsimem_write_path(object path_or_fp, str driver)
cdef str read_buffer_to_vsimem(bytes bytes_buffer)
cdef read_vsimem_to_buffer(str path, object out_buffer)
-cpdef vsimem_rmtree_toplevel(str path)
\ No newline at end of file
+cpdef vsimem_rmtree_toplevel(str path)
=====================================
pyogrio/_vsi.pyx
=====================================
@@ -2,9 +2,6 @@ import fnmatch
from io import BytesIO
from uuid import uuid4
-from libc.stdlib cimport malloc, free
-from libc.string cimport memcpy
-
from pyogrio._ogr cimport *
from pyogrio._ogr import _get_driver_metadata_item
@@ -21,7 +18,7 @@ cdef tuple get_ogr_vsimem_write_path(object path_or_fp, str driver):
(though drivers that create sibling files are not supported for in-memory
files).
- Caller is responsible for deleting the directory via
+ Caller is responsible for deleting the directory via
vsimem_rmtree_toplevel().
Parameters
@@ -42,10 +39,12 @@ cdef tuple get_ogr_vsimem_write_path(object path_or_fp, str driver):
# Check for existing bytes
if path_or_fp.getbuffer().nbytes > 0:
- raise NotImplementedError("writing to existing in-memory object is not supported")
+ raise NotImplementedError(
+ "writing to existing in-memory object is not supported"
+ )
# Create in-memory directory to contain auxiliary files.
- # Prefix with "pyogrio_" so it is clear the directory was created by pyogrio.
+ # Prefix with "pyogrio_" so it is clear the directory was created by pyogrio.
memfilename = f"pyogrio_{uuid4().hex}"
VSIMkdir(f"/vsimem/{memfilename}".encode("UTF-8"), 0666)
@@ -79,7 +78,7 @@ cdef str read_buffer_to_vsimem(bytes bytes_buffer):
is_zipped = len(bytes_buffer) > 4 and bytes_buffer[:4].startswith(b"PK\x03\x04")
ext = ".zip" if is_zipped else ""
- # Prefix with "pyogrio_" so it is clear the file was created by pyogrio.
+ # Prefix with "pyogrio_" so it is clear the file was created by pyogrio.
path = f"/vsimem/pyogrio_{uuid4().hex}{ext}"
# Create an in-memory object that references bytes_buffer
@@ -227,7 +226,7 @@ def ogr_vsi_listtree(str path, str pattern):
if not path.endswith("/"):
path = f"{path}/"
files = [f"{path}{file}" for file in files]
-
+
return files
@@ -277,7 +276,7 @@ def ogr_vsi_unlink(str path):
except UnicodeDecodeError:
path_b = path
path_c = path_b
-
+
if not VSIStatL(path_c, &st_buf) == 0:
raise FileNotFoundError(f"Path does not exist: '{path}'")
=====================================
pyogrio/geopandas.py
=====================================
@@ -5,7 +5,14 @@ import warnings
import numpy as np
-from pyogrio._compat import HAS_GEOPANDAS, PANDAS_GE_15, PANDAS_GE_20, PANDAS_GE_22
+from pyogrio._compat import (
+ HAS_GEOPANDAS,
+ PANDAS_GE_15,
+ PANDAS_GE_20,
+ PANDAS_GE_22,
+ PANDAS_GE_30,
+ PYARROW_GE_19,
+)
from pyogrio.errors import DataSourceError
from pyogrio.raw import (
DRIVERS_NO_MIXED_DIMENSIONS,
@@ -52,13 +59,13 @@ def _try_parse_datetime(ser):
except Exception:
res = ser
# if object dtype, try parse as utc instead
- if res.dtype == "object":
+ if res.dtype in ("object", "string"):
try:
res = pd.to_datetime(ser, utc=True, **datetime_kwargs)
except Exception:
pass
- if res.dtype != "object":
+ if res.dtype.kind == "M": # any datetime64
# GDAL only supports ms precision, convert outputs to match.
# Pandas 2.0 supports datetime[ms] directly, prior versions only support [ns],
# Instead, round the values to [ms] precision.
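
Since GDAL stores at most millisecond precision, successfully parsed values are rounded to match. The same rounding in isolation:

    import pandas as pd

    ser = pd.Series(["2023-01-01 12:00:00.123456789"])
    res = pd.to_datetime(ser)
    # GDAL only supports [ms] precision, so round nanosecond values to match
    res = res.dt.round("ms")
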
@@ -209,6 +216,9 @@ def read_dataframe(
warning will be raised.
- **ignore**: invalid WKB geometries will be returned as ``None``
without a warning.
+ - **fix**: an effort is made to fix invalid input geometries (currently
+ just unclosed rings). If this is not possible, they are returned as
+ ``None`` without a warning. Requires GEOS >= 3.11 and shapely >= 2.1.
arrow_to_pandas_kwargs : dict, optional (default: None)
When `use_arrow` is True, these kwargs will be passed to the `to_pandas`_
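
The new "fix" mode slots into the existing on_invalid parameter of the read functions. A minimal sketch, with a hypothetical file containing unclosed rings (requires GEOS >= 3.11 and shapely >= 2.1):

    from pyogrio import read_dataframe

    # invalid geometries that can be repaired (currently unclosed rings) are
    # fixed; the rest come back as None without a warning
    df = read_dataframe("bad_rings.gpkg", on_invalid="fix")
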
@@ -282,14 +292,42 @@ def read_dataframe(
)
if use_arrow:
+ import pyarrow as pa
+
meta, table = result
# split_blocks and self_destruct decrease memory usage, but have as side effect
# that accessing table afterwards causes crash, so del table to avoid.
kwargs = {"self_destruct": True}
+ if PANDAS_GE_30:
+ # starting with pyarrow 19.0, pyarrow will correctly handle this themselves,
+ # so only use types_mapper as workaround for older versions
+ if not PYARROW_GE_19:
+ kwargs["types_mapper"] = {
+ pa.string(): pd.StringDtype(na_value=np.nan),
+ pa.large_string(): pd.StringDtype(na_value=np.nan),
+ pa.json_(): pd.StringDtype(na_value=np.nan),
+ }.get
+ # TODO enable the below block when upstream issue to accept extension types
+ # is fixed
+ # else:
+ # # for newer pyarrow, still include mapping for json
+ # # GDAL 3.11 started to emit this extension type, but pyarrow does not
+ # # yet support it properly in the conversion to pandas
+ # kwargs["types_mapper"] = {
+ # pa.json_(): pd.StringDtype(na_value=np.nan),
+ # }.get
if arrow_to_pandas_kwargs is not None:
kwargs.update(arrow_to_pandas_kwargs)
- df = table.to_pandas(**kwargs)
+
+ try:
+ df = table.to_pandas(**kwargs)
+ except UnicodeDecodeError as ex:
+ # Arrow does not support reading data in a non-UTF-8 encoding
+ raise DataSourceError(
+ "The file being read is not encoded in UTF-8; please use_arrow=False"
+ ) from ex
+
del table
if fid_as_index:
@@ -333,7 +371,6 @@ def read_dataframe(
return gp.GeoDataFrame(df, geometry=geometry, crs=meta["crs"])
-# TODO: handle index properly
def write_dataframe(
df,
path,
@@ -469,47 +506,9 @@ def write_dataframe(
if len(geometry_columns) > 0:
geometry_column = geometry_columns[0]
geometry = df[geometry_column]
- fields = [c for c in df.columns if not c == geometry_column]
else:
geometry_column = None
geometry = None
- fields = list(df.columns)
-
- # TODO: may need to fill in pd.NA, etc
- field_data = []
- field_mask = []
- # dict[str, np.array(int)] special case for dt-tz fields
- gdal_tz_offsets = {}
- for name in fields:
- col = df[name]
- if isinstance(col.dtype, pd.DatetimeTZDtype):
- # Deal with datetimes with timezones by passing down timezone separately
- # pass down naive datetime
- naive = col.dt.tz_localize(None)
- values = naive.values
- # compute offset relative to UTC explicitly
- tz_offset = naive - col.dt.tz_convert("UTC").dt.tz_localize(None)
- # Convert to GDAL timezone offset representation.
- # GMT is represented as 100 and offsets are represented by adding /
- # subtracting 1 for every 15 minutes different from GMT.
- # https://gdal.org/development/rfc/rfc56_millisecond_precision.html#core-changes
- # Convert each row offset to a signed multiple of 15m and add to GMT value
- gdal_offset_representation = tz_offset // pd.Timedelta("15m") + 100
- gdal_tz_offsets[name] = gdal_offset_representation.values
- else:
- values = col.values
- if isinstance(values, pd.api.extensions.ExtensionArray):
- from pandas.arrays import BooleanArray, FloatingArray, IntegerArray
-
- if isinstance(values, (IntegerArray, FloatingArray, BooleanArray)):
- field_data.append(values._data)
- field_mask.append(values._mask)
- else:
- field_data.append(np.asarray(values))
- field_mask.append(np.asarray(values.isna()))
- else:
- field_data.append(values)
- field_mask.append(None)
# Determine geometry_type and/or promote_to_multi
if geometry_column is not None:
@@ -622,6 +621,15 @@ def write_dataframe(
table = pa.Table.from_pandas(df, preserve_index=False)
+ # Null arrow columns are not supported by GDAL, so convert to string
+ for field_index, field in enumerate(table.schema):
+ if field.type == pa.null():
+ table = table.set_column(
+ field_index,
+ field.with_type(pa.string()),
+ table[field_index].cast(pa.string()),
+ )
+
if geometry_column is not None:
# ensure that the geometry column is binary (for all-null geometries,
# this could be a wrong type)
@@ -658,6 +666,46 @@ def write_dataframe(
# If there is geometry data, prepare it to be written
if geometry_column is not None:
geometry = to_wkb(geometry.values)
+ fields = [c for c in df.columns if not c == geometry_column]
+ else:
+ fields = list(df.columns)
+
+ # Convert data to numpy arrays for writing
+ # TODO: may need to fill in pd.NA, etc
+ field_data = []
+ field_mask = []
+ # dict[str, np.array(int)] special case for dt-tz fields
+ gdal_tz_offsets = {}
+ for name in fields:
+ col = df[name]
+ if isinstance(col.dtype, pd.DatetimeTZDtype):
+ # Deal with datetimes with timezones by passing down timezone separately
+ # pass down naive datetime
+ naive = col.dt.tz_localize(None)
+ values = naive.values
+ # compute offset relative to UTC explicitly
+ tz_offset = naive - col.dt.tz_convert("UTC").dt.tz_localize(None)
+ # Convert to GDAL timezone offset representation.
+ # GMT is represented as 100 and offsets are represented by adding /
+ # subtracting 1 for every 15 minutes different from GMT.
+ # https://gdal.org/development/rfc/rfc56_millisecond_precision.html#core-changes
+ # Convert each row offset to a signed multiple of 15m and add to GMT value
+ gdal_offset_representation = tz_offset // pd.Timedelta("15m") + 100
+ gdal_tz_offsets[name] = gdal_offset_representation.values
+ else:
+ values = col.values
+ if isinstance(values, pd.api.extensions.ExtensionArray):
+ from pandas.arrays import BooleanArray, FloatingArray, IntegerArray
+
+ if isinstance(values, (IntegerArray, FloatingArray, BooleanArray)):
+ field_data.append(values._data)
+ field_mask.append(values._mask)
+ else:
+ field_data.append(np.asarray(values))
+ field_mask.append(np.asarray(values.isna()))
+ else:
+ field_data.append(values)
+ field_mask.append(None)
write(
path,
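
The GDAL timezone representation described in the comments (GMT is 100; each 15 minutes of offset adds or subtracts 1) can be checked with a short worked example; the timezone is illustrative:

    import pandas as pd

    col = pd.Series(pd.to_datetime(["2025-05-08 16:44:26"])).dt.tz_localize(
        "Europe/Brussels"
    )
    naive = col.dt.tz_localize(None)
    tz_offset = naive - col.dt.tz_convert("UTC").dt.tz_localize(None)
    # +02:00 is eight 15-minute steps above GMT: 100 + 8 = 108
    print(tz_offset // pd.Timedelta("15m") + 100)
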
=====================================
pyogrio/tests/conftest.py
=====================================
@@ -31,7 +31,10 @@ DRIVERS = {
".geojsonl": "GeoJSONSeq",
".geojsons": "GeoJSONSeq",
".gpkg": "GPKG",
+ ".gpkg.zip": "GPKG",
".shp": "ESRI Shapefile",
+ ".shp.zip": "ESRI Shapefile",
+ ".shz": "ESRI Shapefile",
}
# mapping of driver name to extension
@@ -124,6 +127,11 @@ def naturalearth_lowres_all_ext(tmp_path, naturalearth_lowres, request):
return prepare_testfile(naturalearth_lowres, tmp_path, request.param)
+@pytest.fixture(scope="function", params=[".geojson"])
+def naturalearth_lowres_geojson(tmp_path, naturalearth_lowres, request):
+ return prepare_testfile(naturalearth_lowres, tmp_path, request.param)
+
+
@pytest.fixture(scope="function")
def naturalearth_lowres_vsi(tmp_path, naturalearth_lowres):
"""Wrap naturalearth_lowres as a zip file for VSI tests"""
=====================================
pyogrio/tests/test_arrow.py
=====================================
@@ -643,6 +643,9 @@ def test_write_append(request, tmp_path, naturalearth_lowres, ext):
pytest.mark.xfail(reason="Bugs with append when writing Arrow to GeoJSON")
)
+ if ext == ".gpkg.zip":
+ pytest.skip("Append is not supported for .gpkg.zip")
+
meta, table = read_arrow(naturalearth_lowres)
# coerce output layer to generic Geometry to avoid mixed type errors
=====================================
pyogrio/tests/test_core.py
=====================================
@@ -106,9 +106,9 @@ def test_detect_write_driver_unsupported(path):
detect_write_driver(path)
-@pytest.mark.parametrize("path", ["test.xml", "test.txt"])
+@pytest.mark.parametrize("path", ["test.xml"])
def test_detect_write_driver_multiple_unsupported(path):
- with pytest.raises(ValueError, match="multiple drivers are available"):
+ with pytest.raises(ValueError, match="multiple drivers are available "):
detect_write_driver(path)
@@ -286,8 +286,12 @@ def test_read_bounds_negative_skip_features(naturalearth_lowres):
def test_read_bounds_where_invalid(naturalearth_lowres_all_ext):
- with pytest.raises(ValueError, match="Invalid SQL"):
- read_bounds(naturalearth_lowres_all_ext, where="invalid")
+ if naturalearth_lowres_all_ext.suffix == ".gpkg" and __gdal_version__ >= (3, 11, 0):
+ with pytest.raises(DataLayerError, match="no such column"):
+ read_bounds(naturalearth_lowres_all_ext, where="invalid")
+ else:
+ with pytest.raises(ValueError, match="Invalid SQL"):
+ read_bounds(naturalearth_lowres_all_ext, where="invalid")
def test_read_bounds_where(naturalearth_lowres):
=====================================
pyogrio/tests/test_geopandas_io.py
=====================================
@@ -12,10 +12,20 @@ from pyogrio import (
list_drivers,
list_layers,
read_info,
+ set_gdal_config_options,
vsi_listtree,
vsi_unlink,
)
-from pyogrio._compat import HAS_ARROW_WRITE_API, HAS_PYPROJ, PANDAS_GE_15
+from pyogrio._compat import (
+ GDAL_GE_37,
+ GDAL_GE_311,
+ GDAL_GE_352,
+ HAS_ARROW_WRITE_API,
+ HAS_PYPROJ,
+ PANDAS_GE_15,
+ PANDAS_GE_30,
+ SHAPELY_GE_21,
+)
from pyogrio.errors import DataLayerError, DataSourceError, FeatureError, GeometryError
from pyogrio.geopandas import PANDAS_GE_20, read_dataframe, write_dataframe
from pyogrio.raw import (
@@ -93,8 +103,20 @@ def spatialite_available(path):
return False
-@pytest.mark.parametrize("encoding", ["utf-8", "cp1252", None])
-def test_read_csv_encoding(tmp_path, encoding):
+@pytest.mark.parametrize(
+ "encoding, arrow",
+ [
+ ("utf-8", False),
+ pytest.param("utf-8", True, marks=requires_pyarrow_api),
+ ("cp1252", False),
+ (None, False),
+ ],
+)
+def test_read_csv_encoding(tmp_path, encoding, arrow):
+ """ "Test reading CSV files with different encodings.
+
+ Arrow only supports utf-8 encoding.
+ """
# Write csv test file. Depending on the os this will be written in a different
# encoding: for linux and macos this is utf-8, for windows it is cp1252.
csv_path = tmp_path / "test.csv"
@@ -105,7 +127,7 @@ def test_read_csv_encoding(tmp_path, encoding):
# Read csv. The data should be read with the same default encoding as the csv file
# was written in, but should have been converted to utf-8 in the dataframe returned.
# Hence, the asserts below, with strings in utf-8, should be OK.
- df = read_dataframe(csv_path, encoding=encoding)
+ df = read_dataframe(csv_path, encoding=encoding, use_arrow=arrow)
assert len(df) == 1
assert df.columns.tolist() == ["näme", "city"]
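
As the updated docstring notes, the Arrow read path only handles UTF-8, so
any other encoding has to go through the plain I/O path. A hedged usage
sketch (the file name is hypothetical):

    from pyogrio import read_dataframe

    # GDAL transcodes to UTF-8 on read; Arrow requires UTF-8 input, so pass
    # use_arrow=False for cp1252 (or other non-UTF-8) files.
    df = read_dataframe("latin1.csv", encoding="cp1252", use_arrow=False)
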
@@ -117,19 +139,29 @@ def test_read_csv_encoding(tmp_path, encoding):
locale.getpreferredencoding().upper() == "UTF-8",
reason="test requires non-UTF-8 default platform",
)
-def test_read_csv_platform_encoding(tmp_path):
- """verify that read defaults to platform encoding; only works on Windows (CP1252)"""
+def test_read_csv_platform_encoding(tmp_path, use_arrow):
+ """Verify that read defaults to platform encoding; only works on Windows (CP1252).
+
+ When use_arrow=True, reading a non-UTF-8 file fails.
+ """
csv_path = tmp_path / "test.csv"
with open(csv_path, "w", encoding=locale.getpreferredencoding()) as csv:
csv.write("näme,city\n")
csv.write("Wilhelm Röntgen,Zürich\n")
- df = read_dataframe(csv_path)
+ if use_arrow:
+ with pytest.raises(
+ DataSourceError,
+ match="; please use_arrow=False",
+ ):
+ df = read_dataframe(csv_path, use_arrow=use_arrow)
+ else:
+ df = read_dataframe(csv_path, use_arrow=use_arrow)
- assert len(df) == 1
- assert df.columns.tolist() == ["näme", "city"]
- assert df.city.tolist() == ["Zürich"]
- assert df.näme.tolist() == ["Wilhelm Röntgen"]
+ assert len(df) == 1
+ assert df.columns.tolist() == ["näme", "city"]
+ assert df.city.tolist() == ["Zürich"]
+ assert df.näme.tolist() == ["Wilhelm Röntgen"]
def test_read_dataframe(naturalearth_lowres_all_ext):
@@ -227,11 +259,32 @@ def test_read_force_2d(tmp_path, use_arrow):
assert not df.iloc[0].geometry.has_z
+@pytest.mark.skipif(
+ not GDAL_GE_352,
+ reason="gdal >= 3.5.2 needed to use OGR_GEOJSON_MAX_OBJ_SIZE with a float value",
+)
+def test_read_geojson_error(naturalearth_lowres_geojson, use_arrow):
+ try:
+ set_gdal_config_options({"OGR_GEOJSON_MAX_OBJ_SIZE": 0.01})
+ with pytest.raises(
+ DataSourceError,
+ match="Failed to read GeoJSON data; .* GeoJSON object too complex",
+ ):
+ read_dataframe(naturalearth_lowres_geojson, use_arrow=use_arrow)
+ finally:
+ set_gdal_config_options({"OGR_GEOJSON_MAX_OBJ_SIZE": None})
+
+
def test_read_layer(tmp_path, use_arrow):
filename = tmp_path / "test.gpkg"
# create a multilayer GPKG
expected1 = gp.GeoDataFrame(geometry=[Point(0, 0)], crs="EPSG:4326")
+ if use_arrow:
+ # TODO: this needs to be fixed on the geopandas side (to ensure the
+ # GeoDataFrame() constructor does this); with use_arrow we already
+ # get a columns Index with string dtype
+ expected1.columns = expected1.columns.astype("str")
write_dataframe(
expected1,
filename,
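
The try/finally pattern in test_read_geojson_error above is worth copying
whenever a GDAL config option should only apply to one read; a minimal
sketch (the file name is hypothetical; a float value for this option needs
GDAL >= 3.5.2):

    from pyogrio import read_dataframe, set_gdal_config_options

    set_gdal_config_options({"OGR_GEOJSON_MAX_OBJ_SIZE": 0.01})  # size cap in MB
    try:
        df = read_dataframe("big.geojson")  # may now raise DataSourceError
    finally:
        # Passing None clears the option so later reads are unaffected.
        set_gdal_config_options({"OGR_GEOJSON_MAX_OBJ_SIZE": None})
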
@@ -239,6 +292,8 @@ def test_read_layer(tmp_path, use_arrow):
)
expected2 = gp.GeoDataFrame(geometry=[Point(1, 1)], crs="EPSG:4326")
+ if use_arrow:
+ expected2.columns = expected2.columns.astype("str")
write_dataframe(expected2, filename, layer="layer2", append=True)
assert np.array_equal(
@@ -361,7 +416,7 @@ def test_read_null_values(tmp_path, use_arrow):
df = read_dataframe(filename, use_arrow=use_arrow, read_geometry=False)
# make sure that Null values are preserved
- assert np.array_equal(df.col.values, expected.col.values)
+ assert df["col"].isna().all()
def test_read_fid_as_index(naturalearth_lowres_all_ext, use_arrow):
@@ -438,10 +493,17 @@ def test_read_where_invalid(request, naturalearth_lowres_all_ext, use_arrow):
if use_arrow and naturalearth_lowres_all_ext.suffix == ".gpkg":
# https://github.com/OSGeo/gdal/issues/8492
request.node.add_marker(pytest.mark.xfail(reason="GDAL doesn't error for GPKG"))
- with pytest.raises(ValueError, match="Invalid SQL"):
- read_dataframe(
- naturalearth_lowres_all_ext, use_arrow=use_arrow, where="invalid"
- )
+
+ if naturalearth_lowres_all_ext.suffix == ".gpkg" and __gdal_version__ >= (3, 11, 0):
+ with pytest.raises(DataLayerError, match="no such column"):
+ read_dataframe(
+ naturalearth_lowres_all_ext, use_arrow=use_arrow, where="invalid"
+ )
+ else:
+ with pytest.raises(ValueError, match="Invalid SQL"):
+ read_dataframe(
+ naturalearth_lowres_all_ext, use_arrow=use_arrow, where="invalid"
+ )
def test_read_where_ignored_field(naturalearth_lowres, use_arrow):
@@ -675,6 +737,13 @@ def test_read_skip_features(naturalearth_lowres_all_ext, use_arrow, skip_feature
# In .geojsonl the vertices are reordered, so normalize
is_jsons = ext == ".geojsonl"
+ if skip_features == 200 and not use_arrow:
+ # result is an empty dataframe, so no proper dtype inference happens
+ # for the numpy object dtype arrays
+ df[["continent", "name", "iso_a3"]] = df[
+ ["continent", "name", "iso_a3"]
+ ].astype("str")
+
assert_geodataframe_equal(
df,
expected,
@@ -943,9 +1012,20 @@ def test_read_sql_dialect_sqlite_gpkg(naturalearth_lowres, use_arrow):
assert df.iloc[0].geometry.area > area_canada
-@pytest.mark.parametrize("encoding", ["utf-8", "cp1252", None])
-def test_write_csv_encoding(tmp_path, encoding):
- """Test if write_dataframe uses the default encoding correctly."""
+@pytest.mark.parametrize(
+ "encoding, arrow",
+ [
+ ("utf-8", False),
+ pytest.param("utf-8", True, marks=requires_arrow_write_api),
+ ("cp1252", False),
+ (None, False),
+ ],
+)
+def test_write_csv_encoding(tmp_path, encoding, arrow):
+ """Test if write_dataframe uses the default encoding correctly.
+
+ Arrow only supports utf-8 encoding.
+ """
# Write csv test file. Depending on the os this will be written in a different
# encoding: for linux and macos this is utf-8, for windows it is cp1252.
csv_path = tmp_path / "test.csv"
@@ -958,7 +1038,7 @@ def test_write_csv_encoding(tmp_path, encoding):
# same encoding as above.
df = pd.DataFrame({"näme": ["Wilhelm Röntgen"], "city": ["Zürich"]})
csv_pyogrio_path = tmp_path / "test_pyogrio.csv"
- write_dataframe(df, csv_pyogrio_path, encoding=encoding)
+ write_dataframe(df, csv_pyogrio_path, encoding=encoding, use_arrow=arrow)
# Check if the text files written both ways can be read again and give same result.
with open(csv_path, encoding=encoding) as csv:
@@ -976,6 +1056,48 @@ def test_write_csv_encoding(tmp_path, encoding):
assert csv_bytes == csv_pyogrio_bytes
+@pytest.mark.parametrize(
+ "ext, fid_column, fid_param_value",
+ [
+ (".gpkg", "fid", None),
+ (".gpkg", "FID", None),
+ (".sqlite", "ogc_fid", None),
+ (".gpkg", "fid_custom", "fid_custom"),
+ (".gpkg", "FID_custom", "fid_custom"),
+ (".sqlite", "ogc_fid_custom", "ogc_fid_custom"),
+ ],
+)
+@pytest.mark.requires_arrow_write_api
+def test_write_custom_fids(tmp_path, ext, fid_column, fid_param_value, use_arrow):
+ """Test to specify FIDs to save when writing to a file.
+
+ Saving custom FIDs is only supported for formats that actually store the FID,
+ such as GPKG and SQLite. The fid_column name check is case-insensitive.
+
+ Typically, GDAL supports using a custom FID column for these file formats via a
+ `FID` layer creation option, which is also tested here. If `fid_param_value` is
+ specified (not None), an `fid` parameter is passed to `write_dataframe`, causing
+ GDAL to use the column name specified for the FID.
+ """
+ input_gdf = gp.GeoDataFrame(
+ {fid_column: [5]}, geometry=[shapely.Point(0, 0)], crs="epsg:4326"
+ )
+ kwargs = {}
+ if fid_param_value is not None:
+ kwargs["fid"] = fid_param_value
+ path = tmp_path / f"test{ext}"
+
+ write_dataframe(input_gdf, path, use_arrow=use_arrow, **kwargs)
+
+ assert path.exists()
+ output_gdf = read_dataframe(path, fid_as_index=True, use_arrow=use_arrow)
+ output_gdf = output_gdf.reset_index()
+
+ # pyogrio always sets "fid" as index name with `fid_as_index`
+ expected_gdf = input_gdf.rename(columns={fid_column: "fid"})
+ assert_geodataframe_equal(output_gdf, expected_gdf)
+
+
@pytest.mark.parametrize("ext", ALL_EXTS)
@pytest.mark.requires_arrow_write_api
def test_write_dataframe(tmp_path, naturalearth_lowres, ext, use_arrow):
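
Outside the test, the new `fid` parameter is used as follows; a sketch under
the docstring's assumptions (GPKG output, hypothetical file name):

    import geopandas as gp
    import shapely
    from pyogrio import read_dataframe, write_dataframe

    gdf = gp.GeoDataFrame(
        {"fid_custom": [5]}, geometry=[shapely.Point(0, 0)], crs="EPSG:4326"
    )
    # Ask GDAL to store the "fid_custom" column as the feature ID.
    write_dataframe(gdf, "points.gpkg", fid="fid_custom")

    # Read the FIDs back; pyogrio always names the resulting index "fid".
    out = read_dataframe("points.gpkg", fid_as_index=True)
    assert out.index.tolist() == [5]
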
@@ -1087,16 +1209,38 @@ def test_write_dataframe_index(tmp_path, naturalearth_lowres, use_arrow):
@pytest.mark.parametrize("ext", [ext for ext in ALL_EXTS if ext not in ".geojsonl"])
+@pytest.mark.parametrize(
+ "columns, dtype",
+ [
+ ([], None),
+ (["col_int"], np.int64),
+ (["col_float"], np.float64),
+ (["col_object"], object),
+ ],
+)
@pytest.mark.requires_arrow_write_api
-def test_write_empty_dataframe(tmp_path, ext, use_arrow):
- expected = gp.GeoDataFrame(geometry=[], crs=4326)
+def test_write_empty_dataframe(tmp_path, ext, columns, dtype, use_arrow):
+ """Test writing dataframe with no rows.
+ With use_arrow, object type columns with no rows are converted to null type columns
+ by pyarrow, but null columns are not supported by GDAL. Added to test fix for #513.
+ """
+ expected = gp.GeoDataFrame(geometry=[], columns=columns, dtype=dtype, crs=4326)
filename = tmp_path / f"test{ext}"
write_dataframe(expected, filename, use_arrow=use_arrow)
assert filename.exists()
- df = read_dataframe(filename)
- assert_geodataframe_equal(df, expected)
+ df = read_dataframe(filename, use_arrow=use_arrow)
+
+ # Check result
+ # For older pandas versions, the index is created as Object dtype but read as
+ # RangeIndex, so don't check the index dtype in that case.
+ check_index_type = True if PANDAS_GE_20 else False
+ # with pandas 3+ and reading through arrow, we preserve the string dtype
+ # (no proper dtype inference happens for the empty numpy object dtype arrays)
+ if use_arrow and dtype is object:
+ expected["col_object"] = expected["col_object"].astype("str")
+ assert_geodataframe_equal(df, expected, check_index_type=check_index_type)
def test_write_empty_geometry(tmp_path):
@@ -1116,6 +1260,28 @@ def test_write_empty_geometry(tmp_path):
assert_geodataframe_equal(df, expected)
+@pytest.mark.requires_arrow_write_api
+def test_write_None_string_column(tmp_path, use_arrow):
+ """Test pandas object columns with all None values.
+
+ With use_arrow, such columns are converted to null type columns by pyarrow, but null
+ columns are not supported by GDAL. Added to test fix for #513.
+ """
+ gdf = gp.GeoDataFrame({"object_col": [None]}, geometry=[Point(0, 0)], crs=4326)
+ filename = tmp_path / "test.gpkg"
+
+ write_dataframe(gdf, filename, use_arrow=use_arrow)
+ assert filename.exists()
+
+ result_gdf = read_dataframe(filename, use_arrow=use_arrow)
+ if PANDAS_GE_30 and use_arrow:
+ assert result_gdf.object_col.dtype == "str"
+ gdf["object_col"] = gdf["object_col"].astype("str")
+ else:
+ assert result_gdf.object_col.dtype == object
+ assert_geodataframe_equal(result_gdf, gdf)
+
+
@pytest.mark.parametrize("ext", [".geojsonl", ".geojsons"])
@pytest.mark.requires_arrow_write_api
def test_write_read_empty_dataframe_unsupported(tmp_path, ext, use_arrow):
@@ -1521,6 +1687,30 @@ def test_custom_crs_io(tmp_path, naturalearth_lowres_all_ext, use_arrow):
assert df.crs.equals(expected.crs)
+@pytest.mark.parametrize("ext", [".gpkg.zip", ".shp.zip", ".shz"])
+@pytest.mark.requires_arrow_write_api
+def test_write_read_zipped_ext(tmp_path, naturalearth_lowres, ext, use_arrow):
+ """Run a basic read and write test on some extra (zipped) extensions."""
+ if ext == ".gpkg.zip" and not GDAL_GE_37:
+ pytest.skip(".gpkg.zip support requires GDAL >= 3.7")
+
+ input_gdf = read_dataframe(naturalearth_lowres)
+ output_path = tmp_path / f"test{ext}"
+
+ write_dataframe(input_gdf, output_path, use_arrow=use_arrow)
+
+ assert output_path.exists()
+ result_gdf = read_dataframe(output_path)
+
+ geometry_types = result_gdf.geometry.type.unique()
+ if DRIVERS[ext] in DRIVERS_NO_MIXED_SINGLE_MULTI:
+ assert list(geometry_types) == ["MultiPolygon"]
+ else:
+ assert set(geometry_types) == {"MultiPolygon", "Polygon"}
+
+ assert_geodataframe_equal(result_gdf, input_gdf, check_index_type=False)
+
+
def test_write_read_mixed_column_values(tmp_path):
# use_arrow=True is tested separately below
mixed_values = ["test", 1.0, 1, datetime.now(), None, np.nan]
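
A round trip through the new zipped extensions needs no extra options; a
sketch (file names hypothetical; `.gpkg.zip` additionally requires GDAL >=
3.7 and does not support append):

    from pyogrio import read_dataframe, write_dataframe

    gdf = read_dataframe("countries.shp")  # hypothetical source file

    # Single-file zipped shapefile; the driver is autodetected from the extension.
    write_dataframe(gdf, "countries.shz")
    back = read_dataframe("countries.shz")

    # A zipped GeoPackage works the same way on GDAL >= 3.7.
    write_dataframe(gdf, "countries.gpkg.zip")
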
@@ -1532,11 +1722,13 @@ def test_write_read_mixed_column_values(tmp_path):
write_dataframe(test_gdf, output_path)
output_gdf = read_dataframe(output_path)
assert len(test_gdf) == len(output_gdf)
- for idx, value in enumerate(mixed_values):
- if value in (None, np.nan):
- assert output_gdf["mixed"][idx] is None
- else:
- assert output_gdf["mixed"][idx] == str(value)
+ # mixed values as object dtype are currently written as strings
+ # (but preserving nulls)
+ expected = pd.Series(
+ [str(value) if value not in (None, np.nan) else None for value in mixed_values],
+ name="mixed",
+ )
+ assert_series_equal(output_gdf["mixed"], expected)
@requires_arrow_write_api
@@ -1569,8 +1761,8 @@ def test_write_read_null(tmp_path, use_arrow):
assert pd.isna(result_gdf["float64"][1])
assert pd.isna(result_gdf["float64"][2])
assert result_gdf["object_str"][0] == "test"
- assert result_gdf["object_str"][1] is None
- assert result_gdf["object_str"][2] is None
+ assert pd.isna(result_gdf["object_str"][1])
+ assert pd.isna(result_gdf["object_str"][2])
@pytest.mark.requires_arrow_write_api
@@ -1714,23 +1906,29 @@ def test_write_geometry_z_types_auto(
@pytest.mark.parametrize(
- "on_invalid, message",
+ "on_invalid, message, expected_wkt",
[
(
"warn",
"Invalid WKB: geometry is returned as None. IllegalArgumentException: "
- "Invalid number of points in LinearRing found 2 - must be 0 or >=",
+ "Points of LinearRing do not form a closed linestring",
+ None,
),
- ("raise", "Invalid number of points in LinearRing found 2 - must be 0 or >="),
- ("ignore", None),
+ ("raise", "Points of LinearRing do not form a closed linestring", None),
+ ("ignore", None, None),
+ ("fix", None, "POLYGON ((0 0, 0 1, 0 0))"),
],
)
-def test_read_invalid_poly_ring(tmp_path, use_arrow, on_invalid, message):
+@pytest.mark.filterwarnings("ignore:Non closed ring detected:RuntimeWarning")
+def test_read_invalid_poly_ring(tmp_path, use_arrow, on_invalid, message, expected_wkt):
+ if on_invalid == "fix" and not SHAPELY_GE_21:
+ pytest.skip("on_invalid=fix not available for Shapely < 2.1")
+
if on_invalid == "raise":
handler = pytest.raises(shapely.errors.GEOSException, match=message)
elif on_invalid == "warn":
handler = pytest.warns(match=message)
- elif on_invalid == "ignore":
+ elif on_invalid in ("fix", "ignore"):
handler = contextlib.nullcontext()
else:
raise ValueError(f"unknown value for on_invalid: {on_invalid}")
@@ -1744,7 +1942,7 @@ def test_read_invalid_poly_ring(tmp_path, use_arrow, on_invalid, message):
"properties": {},
"geometry": {
"type": "Polygon",
- "coordinates": [ [ [0, 0], [0, 0] ] ]
+ "coordinates": [ [ [0, 0], [0, 1] ] ]
}
}
]
@@ -1760,7 +1958,10 @@ def test_read_invalid_poly_ring(tmp_path, use_arrow, on_invalid, message):
use_arrow=use_arrow,
on_invalid=on_invalid,
)
- df.geometry.isnull().all()
+ if expected_wkt is None:
+ assert df.geometry.iloc[0] is None
+ else:
+ assert df.geometry.iloc[0].wkt == expected_wkt
def test_read_multisurface(multisurface_file, use_arrow):
@@ -1792,6 +1993,10 @@ def test_read_dataset_kwargs(nested_geojson_file, use_arrow):
geometry=[shapely.Point(0, 0)],
crs="EPSG:4326",
)
+ if GDAL_GE_311 and use_arrow:
+ # GDAL 3.11 started to use json extension type, which is not yet handled
+ # correctly in the arrow->pandas conversion (using object instead of str dtype)
+ expected["intermediate_level"] = expected["intermediate_level"].astype(object)
assert_geodataframe_equal(df, expected)
@@ -1837,7 +2042,7 @@ def test_write_nullable_dtypes(tmp_path, use_arrow):
expected["col2"] = expected["col2"].astype("float64")
expected["col3"] = expected["col3"].astype("float32")
expected["col4"] = expected["col4"].astype("float64")
- expected["col5"] = expected["col5"].astype(object)
+ expected["col5"] = expected["col5"].astype("str")
expected.loc[1, "col5"] = None # pandas converts to pd.NA on line above
assert_geodataframe_equal(output_gdf, expected)
@@ -2160,7 +2365,10 @@ def test_non_utf8_encoding_io_shapefile(tmp_path, encoded_text, use_arrow):
if use_arrow:
# pyarrow cannot decode column name with incorrect encoding
- with pytest.raises(UnicodeDecodeError):
+ with pytest.raises(
+ DataSourceError,
+ match="The file being read is not encoded in UTF-8; please use_arrow=False",
+ ):
read_dataframe(output_path, use_arrow=True)
else:
bad = read_dataframe(output_path, use_arrow=False)
@@ -2257,7 +2465,7 @@ def test_write_kml_file_coordinate_order(tmp_path, use_arrow):
if "LIBKML" in list_drivers():
# test appending to the existing file only if LIBKML is available
# as it appears to fall back on LIBKML driver when appending.
- points_append = [Point(70, 80), Point(90, 100), Point(110, 120)]
+ points_append = [Point(7, 8), Point(9, 10), Point(11, 12)]
gdf_append = gp.GeoDataFrame(geometry=points_append, crs="EPSG:4326")
write_dataframe(
=====================================
pyogrio/tests/test_path.py
=====================================
@@ -33,10 +33,20 @@ def change_cwd(path):
[
# local file paths that should be passed through as is
("data.gpkg", "data.gpkg"),
+ ("data.gpkg.zip", "data.gpkg.zip"),
+ ("data.shp.zip", "data.shp.zip"),
(Path("data.gpkg"), "data.gpkg"),
+ (Path("data.gpkg.zip"), "data.gpkg.zip"),
+ (Path("data.shp.zip"), "data.shp.zip"),
("/home/user/data.gpkg", "/home/user/data.gpkg"),
+ ("/home/user/data.gpkg.zip", "/home/user/data.gpkg.zip"),
+ ("/home/user/data.shp.zip", "/home/user/data.shp.zip"),
(r"C:\User\Documents\data.gpkg", r"C:\User\Documents\data.gpkg"),
+ (r"C:\User\Documents\data.gpkg.zip", r"C:\User\Documents\data.gpkg.zip"),
+ (r"C:\User\Documents\data.shp.zip", r"C:\User\Documents\data.shp.zip"),
("file:///home/user/data.gpkg", "/home/user/data.gpkg"),
+ ("file:///home/user/data.gpkg.zip", "/home/user/data.gpkg.zip"),
+ ("file:///home/user/data.shp.zip", "/home/user/data.shp.zip"),
("/home/folder # with hash/data.gpkg", "/home/folder # with hash/data.gpkg"),
# cloud URIs
("https://testing/data.gpkg", "/vsicurl/https://testing/data.gpkg"),
=====================================
pyogrio/tests/test_raw_io.py
=====================================
@@ -17,7 +17,7 @@ from pyogrio import (
read_info,
set_gdal_config_options,
)
-from pyogrio._compat import HAS_PYARROW, HAS_SHAPELY
+from pyogrio._compat import GDAL_GE_37, HAS_PYARROW, HAS_SHAPELY
from pyogrio.errors import DataLayerError, DataSourceError, FeatureError
from pyogrio.raw import open_arrow, read, write
from pyogrio.tests.conftest import (
@@ -63,9 +63,10 @@ def test_read(naturalearth_lowres):
@pytest.mark.parametrize("ext", DRIVERS)
def test_read_autodetect_driver(tmp_path, naturalearth_lowres, ext):
# Test all supported autodetect drivers
+ if ext == ".gpkg.zip" and not GDAL_GE_37:
+ pytest.skip(".gpkg.zip not supported for gdal < 3.7.0")
testfile = prepare_testfile(naturalearth_lowres, dst_dir=tmp_path, ext=ext)
- assert testfile.suffix == ext
assert testfile.exists()
meta, _, geometry, fields = read(testfile)
@@ -703,6 +704,9 @@ def test_write_append(tmp_path, naturalearth_lowres, ext):
if ext in (".geojsonl", ".geojsons") and __gdal_version__ < (3, 6, 0):
pytest.skip("Append to GeoJSONSeq only available for GDAL >= 3.6.0")
+ if ext == ".gpkg.zip":
+ pytest.skip("Append to .gpkg.zip is not supported")
+
meta, _, geometry, field_data = read(naturalearth_lowres)
# coerce output layer to MultiPolygon to avoid mixed type errors
=====================================
pyogrio/util.py
=====================================
@@ -9,6 +9,8 @@ from urllib.parse import urlparse
from pyogrio._vsi import vsimem_rmtree_toplevel as _vsimem_rmtree_toplevel
+MULTI_EXTENSIONS = (".gpkg.zip", ".shp.zip")
+
def get_vsi_path_or_buffer(path_or_buffer):
"""Get VSI-prefixed path or bytes buffer depending on type of path_or_buffer.
@@ -68,15 +70,23 @@ def vsi_path(path: Union[str, Path]) -> str:
# Windows drive letters (e.g. "C:\") confuse `urlparse` as they look like
# URL schemes
if sys.platform == "win32" and re.match("^[a-zA-Z]\\:", path):
+ # If it is not a zip file, or it is a multi-extension zip file that is directly
+ # supported by a GDAL driver, return the path as is.
if not path.split("!")[0].endswith(".zip"):
return path
+ if path.split("!")[0].endswith(MULTI_EXTENSIONS):
+ return path
# prefix then allow to proceed with remaining parsing
path = f"zip://{path}"
path, archive, scheme = _parse_uri(path)
- if scheme or archive or path.endswith(".zip"):
+ if (
+ scheme
+ or archive
+ or (path.endswith(".zip") and not path.endswith(MULTI_EXTENSIONS))
+ ):
return _construct_vsi_path(path, archive, scheme)
return path
@@ -146,7 +156,10 @@ def _construct_vsi_path(path, archive, scheme) -> str:
suffix = ""
schemes = scheme.split("+")
- if "zip" not in schemes and (archive.endswith(".zip") or path.endswith(".zip")):
+ if "zip" not in schemes and (
+ archive.endswith(".zip")
+ or (path.endswith(".zip") and not path.endswith(MULTI_EXTENSIONS))
+ ):
schemes.insert(0, "zip")
if schemes:
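
The effect of MULTI_EXTENSIONS is that only plain `.zip` archives get the
/vsizip/ prefix; a sketch mirroring the new test_path.py cases (assuming
pyogrio.util.vsi_path keeps its current signature):

    from pyogrio.util import vsi_path

    # Driver-supported multi-extension zips pass through unchanged...
    assert vsi_path("data.gpkg.zip") == "data.gpkg.zip"
    assert vsi_path("data.shp.zip") == "data.shp.zip"
    # ...while a plain zip archive still gets the VSI prefix.
    assert vsi_path("data.zip") == "/vsizip/data.zip"
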
=====================================
pyproject.toml
=====================================
@@ -13,7 +13,7 @@ name = "pyogrio"
dynamic = ["version"]
authors = [
{ name = "Brendan C. Ward", email = "bcward at astutespruce.com" },
- { name = "pyogrio contributors" }
+ { name = "pyogrio contributors" },
]
maintainers = [{ name = "pyogrio contributors" }]
license = { file = "LICENSE" }
@@ -43,7 +43,7 @@ Repository = "https://github.com/geopandas/pyogrio"
[tool.cibuildwheel]
skip = ["pp*", "*musllinux*"]
archs = ["auto64"]
-manylinux-x86_64-image = "manylinux-vcpkg-gdal:latest"
+manylinux-x86_64-image = "manylinux-x86_64-vcpkg-gdal:latest"
manylinux-aarch64-image = "manylinux-aarch64-vcpkg-gdal:latest"
build-verbosity = 3
@@ -51,7 +51,7 @@ build-verbosity = 3
VCPKG_INSTALL = "$VCPKG_INSTALLATION_ROOT/installed/$VCPKG_DEFAULT_TRIPLET"
GDAL_INCLUDE_PATH = "$VCPKG_INSTALL/include"
GDAL_LIBRARY_PATH = "$VCPKG_INSTALL/lib"
-GDAL_VERSION = "3.9.2"
+GDAL_VERSION = "3.10.3"
PYOGRIO_PACKAGE_DATA = 1
GDAL_DATA = "$VCPKG_INSTALL/share/gdal"
PROJ_LIB = "$VCPKG_INSTALL/share/proj"
@@ -66,7 +66,7 @@ repair-wheel-command = [
VCPKG_INSTALL = "$VCPKG_INSTALLATION_ROOT/installed/$VCPKG_DEFAULT_TRIPLET"
GDAL_INCLUDE_PATH = "$VCPKG_INSTALL/include"
GDAL_LIBRARY_PATH = "$VCPKG_INSTALL/lib"
-GDAL_VERSION = "3.9.2"
+GDAL_VERSION = "3.10.3"
PYOGRIO_PACKAGE_DATA = 1
GDAL_DATA = "$VCPKG_INSTALL/share/gdal"
PROJ_LIB = "$VCPKG_INSTALL/share/proj"
@@ -80,11 +80,14 @@ repair-wheel-command = "delvewheel repair --add-path C:/vcpkg/installed/x64-wind
VCPKG_INSTALL = "$VCPKG_INSTALLATION_ROOT/installed/x64-windows-dynamic-release"
GDAL_INCLUDE_PATH = "$VCPKG_INSTALL/include"
GDAL_LIBRARY_PATH = "$VCPKG_INSTALL/lib"
-GDAL_VERSION = "3.9.2"
+GDAL_VERSION = "3.10.3"
PYOGRIO_PACKAGE_DATA = 1
GDAL_DATA = "$VCPKG_INSTALL/share/gdal"
PROJ_LIB = "$VCPKG_INSTALL/share/proj"
+[tool.cython-lint]
+ignore = ["E265", "E222"]
+
[tool.versioneer]
VCS = "git"
style = "pep440"
@@ -95,7 +98,7 @@ tag_prefix = "v"
[tool.ruff]
line-length = 88
-extend-exclude = ["doc/*", "benchmarks/*", "pyogrio/_version.py"]
+extend-exclude = ["doc/*", "benchmarks/*", "pyogrio/_version.py", "conf.py", "setup.py"]
[tool.ruff.lint]
select = [
@@ -206,3 +209,6 @@ section-order = [
"geopandas.tests",
"geopandas.testing",
]
+
+[tool.ruff.lint.pydocstyle]
+convention = "numpy"
=====================================
setup.py
=====================================
@@ -205,7 +205,7 @@ setup(
version=version,
packages=find_packages(),
include_package_data=True,
- exclude_package_data={'': ['*.h', '_*.pxd', '_*.pyx']},
+ exclude_package_data={"": ["*.h", "_*.pxd", "_*.pyx"]},
cmdclass=cmdclass,
ext_modules=ext_modules,
package_data=package_data,
View it on GitLab: https://salsa.debian.org/debian-gis-team/pyogrio/-/commit/fc2eb70a415dfbe4531b7995809a89e98ea94fc2