[Git][debian-gis-team/stac-validator][master] 7 commits: New upstream version 4.2.2
Antonio Valentino (@antonio.valentino)
gitlab at salsa.debian.org
Mon Jun 1 15:29:30 BST 2026
Antonio Valentino pushed to branch master at Debian GIS Project / stac-validator
Commits:
b151f592 by Antonio Valentino at 2026-04-29T19:06:31+00:00
New upstream version 4.2.2
- - - - -
a776b61b by Antonio Valentino at 2026-06-01T13:58:52+00:00
New upstream version 4.4.0
- - - - -
7edadefa by Antonio Valentino at 2026-06-01T13:59:11+00:00
Update upstream source from tag 'upstream/4.4.0'
Update to upstream version '4.4.0'
with Debian dir 41885d2a2e082554b641c12515f53b7279adb37b
- - - - -
d0de5249 by Antonio Valentino at 2026-06-01T14:01:45+00:00
New upstream release
- - - - -
1369136a by Antonio Valentino at 2026-06-01T14:08:43+00:00
Update dependencies
- - - - -
c612d724 by Antonio Valentino at 2026-06-01T14:26:56+00:00
Update 0001-No-network.patch
- - - - -
86a07aeb by Antonio Valentino at 2026-06-01T14:27:01+00:00
Set distribution to unstable
- - - - -
17 changed files:
- + .github/workflows/publish-image.yml
- CHANGELOG.md
- README.md
- cdk-deployment/requirements.txt
- debian/changelog
- debian/control
- debian/patches/0001-No-network.patch
- docs/requirements.txt
- pyproject.toml
- + server/Dockerfile
- + server/api_client_example.py
- + server/server.py
- stac_validator/fast_validator.py
- stac_validator/stac_validator.py
- stac_validator/utilities.py
- tests/test_fast_validator.py
- tests/test_sys_exit.py
Changes:
=====================================
.github/workflows/publish-image.yml
=====================================
@@ -0,0 +1,49 @@
+name: Publish Container Image
+
+on:
+ push:
+ tags:
+ - 'v*' # Trigger on version tags like v4.2.3
+
+env:
+ REGISTRY: ghcr.io
+ # Converts repository name to lowercase for Docker compatibility
+ IMAGE_NAME: ${{ github.repository }}
+
+jobs:
+ build-and-push:
+ runs-on: ubuntu-latest
+ permissions:
+ contents: read
+ packages: write
+
+ steps:
+ - name: Checkout repository
+ uses: actions/checkout@v6
+
+ - name: Log in to the Container registry
+ uses: docker/login-action@v4
+ with:
+ registry: ${{ env.REGISTRY }}
+ username: ${{ github.actor }}
+ password: ${{ secrets.GITHUB_TOKEN }}
+
+ - name: Extract metadata (tags, labels)
+ id: meta
+ uses: docker/metadata-action@v6
+ with:
+ images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+ tags: |
+ type=semver,pattern={{version}}
+ type=semver,pattern={{major}}.{{minor}}
+ type=sha,prefix=sha-
+ type=raw,value=latest,enable=true
+
+ - name: Build and push Docker image
+ uses: docker/build-push-action@v7
+ with:
+ context: .
+ file: server/Dockerfile
+ push: true
+ tags: ${{ steps.meta.outputs.tags }}
+ labels: ${{ steps.meta.outputs.labels }}
\ No newline at end of file
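To check which image tags a release push should yield, the four rules in the `Extract metadata` step can be approximated in plain Python. This is a sketch under stated assumptions, not docker/metadata-action's implementation: it handles only plain `vMAJOR.MINOR.PATCH` tags, assumes a 7-character short SHA, and `approximate_image_tags` is a hypothetical helper name.

```python
import re
from typing import List


def approximate_image_tags(image: str, git_tag: str, commit_sha: str) -> List[str]:
    """Approximate the tag list produced by the workflow's metadata step.

    Mirrors the rules in publish-image.yml:
      type=semver,pattern={{version}}         -> "4.4.0"
      type=semver,pattern={{major}}.{{minor}} -> "4.4"
      type=sha,prefix=sha-                    -> "sha-<short sha>"
      type=raw,value=latest                   -> "latest"
    Pre-release tags (e.g. v4.4.0-rc1) are deliberately not handled.
    """
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", git_tag)
    if match is None:
        raise ValueError(f"not a plain semver tag: {git_tag}")
    major, minor, patch = match.groups()
    tags = [
        f"{major}.{minor}.{patch}",
        f"{major}.{minor}",
        f"sha-{commit_sha[:7]}",  # assumption: 7-character short SHA
        "latest",
    ]
    return [f"{image}:{tag}" for tag in tags]
```

For a `v4.4.0` tag the sketch predicts `4.4.0`, `4.4`, a `sha-` tag and `latest`, all under the same image name.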
=====================================
CHANGELOG.md
=====================================
@@ -16,6 +16,22 @@ The format is (loosely) based on [Keep a Changelog](http://keepachangelog.com/)
### Updated
+## [v4.4.0] - 2026-05-11
+
+### Added
+
+- support for --recursive option for `fast` command to validate static STAC catalogs. [#294](https://github.com/stac-utils/stac-validator/pull/294)
+- support for --api option for `fast` command to validate STAC API endpoints. [#294](https://github.com/stac-utils/stac-validator/pull/294)
+- support for --limit option for `fast` command to cap the number of STAC objects validated. [#294](https://github.com/stac-utils/stac-validator/pull/294)
+- Added `run_dict` method to `FastValidator` for direct in-memory dictionary validation without file/network loading. [#294](https://github.com/stac-utils/stac-validator/pull/294)
+
+
+## [v4.3.0] - 2026-05-08
+
+### Added
+
+- **STAC-Valid API & Containerization:** Added a FastAPI-based server for high-speed remote validation, an optimized `uv`-based Dockerfile, and a GitHub Actions workflow for automatic image publication to GHCR. [#293](https://github.com/stac-utils/stac-validator/pull/293)
+
## [v4.2.2] - 2026-04-29
### Added
@@ -444,7 +460,8 @@ The format is (loosely) based on [Keep a Changelog](http://keepachangelog.com/)
- With the newest version - 1.0.0-beta.2 - items will run through jsonchema validation before the PySTAC validation. The reason for this is that jsonschema will give more informative error messages. This should be addressed better in the future. This is not the case with the --recursive option as time can be a concern here with larger collections.
- Logging. Various additions were made here depending on the options selected. This was done to help assist people to update their STAC collections.
-[Unreleased]: https://github.com/sparkgeo/stac-validator/compare/v4.2.2..main
+[Unreleased]: https://github.com/sparkgeo/stac-validator/compare/v4.3.0..main
+[v4.3.0]: https://github.com/sparkgeo/stac-validator/compare/v4.2.2..v4.3.0
[v4.2.2]: https://github.com/sparkgeo/stac-validator/compare/v4.2.1..v4.2.2
[v4.2.1]: https://github.com/sparkgeo/stac-validator/compare/v4.2.0..v4.2.1
[v4.2.0]: https://github.com/sparkgeo/stac-validator/compare/v4.1.0..v4.2.0
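The link references at the bottom of CHANGELOG.md follow the Keep a Changelog pattern: each version compares the previous tag with itself, and `[Unreleased]` compares the newest tag with `main`. This hunk adds a `## [v4.4.0]` heading but no `[v4.4.0]` link reference, and `[Unreleased]` still starts at `v4.3.0`. A minimal sketch that regenerates the references from an ordered version list; `changelog_link_refs` is a hypothetical helper name.

```python
from typing import List


def changelog_link_refs(repo_url: str, versions: List[str], branch: str = "main") -> List[str]:
    """Build Keep a Changelog link references from versions ordered newest first.

    The oldest version only serves as a comparison base and gets no entry of its own.
    """
    if not versions:
        raise ValueError("need at least one version")
    refs = [f"[Unreleased]: {repo_url}/compare/{versions[0]}..{branch}"]
    # Pair each version with the one released just before it.
    for newer, older in zip(versions, versions[1:]):
        refs.append(f"[{newer}]: {repo_url}/compare/{older}..{newer}")
    return refs
```

Called with `["v4.4.0", "v4.3.0", "v4.2.2"]`, it yields an `[Unreleased]` entry starting at `v4.4.0` and a `[v4.4.0]` entry comparing `v4.3.0..v4.4.0`.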
=====================================
README.md
=====================================
@@ -10,7 +10,7 @@
[](https://github.com/stac-utils/stac-validator/graphs/contributors)
[](https://github.com/stac-utils/stac-validator/stargazers)
[](https://github.com/stac-utils/stac-validator/network/members)
- [](https://pypi.org/project/stac-validator/)
+ [](https://pypi.org/project/stac-valid/)
[](https://github.com/radiantearth/stac-spec/tree/v1.1.0)
@@ -46,6 +46,7 @@
- [Legacy Validation](#legacy-validation)
- [Batch Validation](#batch-validation)
- [Fast Validation](#fast-validation)
+ - [API Server (FastAPI)](#api-server-fastapi)
- [Python](#python)
- [Schema Cache Settings](#schema-cache-settings)
- [Performance Benchmarking](#performance-benchmarking)
@@ -310,6 +311,12 @@ Options:
-q, --quiet Suppress individual item logs.
-v, --verbose Show full validation logs for all items. By default, only
invalid items are shown.
+ -r, --recursive Recursively validate all child catalogs, collections,
+ and items.
+ -a, --api Validate a STAC API catalog recursively (follows data,
+ child, item, and items links).
+ --limit INTEGER RANGE Limit number of STAC objects to validate.
+ [x>=1]
--help Show this message and exit.
```
@@ -512,6 +519,8 @@ The `fast` command provides ultra-high-speed validation using `fastjsonschema` w
- **Multi-tier caching:** RAM → Disk → Network with automatic fallback
- **Local schema storage:** Schemas cached locally under `local_schemas/.schemas` directory for instant reuse
- **Automatic detection:** Detects STAC type (Item, Collection, Catalog, FeatureCollection) automatically
+- **Recursive traversal:** Supports `--recursive` for local catalog/collection graphs
+- **STAC API traversal:** Supports `--api` to follow STAC API data, child, item, and items links
- **Detailed metrics:** Shows setup time, execution time, and cache hit status for each item
- **Error grouping:** Groups validation errors by type and shows affected items
@@ -540,8 +549,17 @@ $ stac-validator fast item.json --quiet
# Show detailed output for all items (default shows first 5)
$ stac-validator fast collection.json --verbose
+# Validate only first 25 objects in a large FeatureCollection
+$ stac-validator fast collection.json --limit 25
+
+# Recursively validate a local catalog graph
+$ stac-validator fast catalog.json --recursive
+
+# Recursively validate a STAC API root endpoint
+$ stac-validator fast https://api.example.com --api
+
# Combine options
-$ stac-validator fast collection.json --verbose --quiet # Quiet takes precedence
+$ stac-validator fast collection.json --verbose --limit 50
```
**Example Output**
@@ -668,6 +686,50 @@ else:
| Development/testing | `fast` | Instant feedback, detailed metrics, minimal overhead |
| Complex validation rules | `validate` | Full control over validation options, recursive validation |
+#### API Server (FastAPI)
+
+The `fast` validation engine can be deployed as a high-performance REST API, ideal for validating STAC objects during ingestion or as part of a microservices architecture.
+
+**Running the Server (Local):**
+```bash
+# Requires fastapi and uvicorn
+pip install "stac-valid[server]"
+python server/server.py
+```
+
+**Running the Server (Docker):**
+```bash
+# Pull and run the official GitHub container image
+docker run -p 8000:8000 ghcr.io/staclabs/stac-validator:latest
+```
+
+**Validate via local script:**
+```bash
+python server/api_client_example.py sample_data/sentinel-cogs_0_100.json
+```
+
+**Validate via curl:**
+```bash
+curl -X POST http://localhost:8000/validate \
+ -H "Content-Type: application/json" \
+ -d @sample_data/sentinel-cogs_0_100.json
+```
+
+**Response Format:**
+The API returns a detailed JSON summary including performance metrics and error breakdowns:
+```json
+{
+ "path": "request_body",
+ "valid_stac": true,
+ "total_objects": 100,
+ "valid_objects": 100,
+ "invalid_objects": 0,
+ "setup_time_ms": 0.25,
+ "execution_time_ms": 25.4,
+ "errors": []
+}
+```
+
### Python
**Single File Validation**
@@ -823,6 +885,10 @@ import json
fv = FastValidator("large_collection.json", quiet=True)
fv.run()
+# Optionally cap validation to the first N objects
+fv_limited = FastValidator("large_collection.json", quiet=True, limit=100)
+fv_limited.run()
+
# Access validation results via the message attribute
print(json.dumps(fv.message, indent=2))
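For environments without `curl` or `requests`, the README's `POST /validate` call can be reproduced with the standard library. This is a sketch assuming the server listens on `http://localhost:8000` as in the README; `build_validate_request` and `send_validate_request` are hypothetical helper names, and only the second one touches the network.

```python
import json
import urllib.request
from typing import Any, Dict


def build_validate_request(
    payload: Dict[str, Any], base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/validate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_validate_request(
    req: urllib.request.Request, timeout: float = 30.0
) -> Dict[str, Any]:
    """Send the request and decode the JSON summary (requires a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting construction from sending keeps the request inspectable; `send_validate_request(build_validate_request(item))` is the one-line equivalent of the curl command.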
=====================================
cdk-deployment/requirements.txt
=====================================
@@ -1,4 +1,4 @@
-aws-cdk.core==1.101.0
+aws-cdk.core==1.204.0
aws-cdk.aws-lambda==1.204.0
aws-cdk.aws_apigateway==1.204.0
=====================================
debian/changelog
=====================================
@@ -1,3 +1,13 @@
+stac-validator (4.4.0-1) unstable; urgency=medium
+
+ * New upstream release.
+ * debian/control:
+ - Add dependency on python3-fastapi and python3-uvicorn.
+ * debian/patches:
+ - Update 0001-No-network.patch.
+
+ -- Antonio Valentino <antonio.valentino at tiscali.it> Mon, 01 Jun 2026 14:18:35 +0000
+
stac-validator (4.2.2-1) unstable; urgency=medium
* New upstream release.
=====================================
debian/control
=====================================
@@ -7,6 +7,7 @@ Build-Depends: debhelper-compat (= 13),
pybuild-plugin-pyproject,
python3-all,
python3-click,
+ python3-fastapi,
python3-fastjsonschema,
python3-jsonschema,
python3-pytest <!nocheck>,
@@ -16,6 +17,7 @@ Build-Depends: debhelper-compat (= 13),
python3-setuptools,
python3-stac-pydantic,
python3-tqdm,
+ python3-uvicorn,
python3-yaml
Standards-Version: 4.7.4
Testsuite: autopkgtest-pkg-pybuild
@@ -33,6 +35,8 @@ Architecture: all
Depends: ${python3:Depends},
${misc:Depends}
Recommends: python3-stac-pydantic
+Suggests: python3-fastapi,
+ python3-uvicorn
Description: ${source:Synopsis}
${source:Extended-Description}
=====================================
debian/patches/0001-No-network.patch
=====================================
@@ -12,7 +12,7 @@ Forwarded: not-needed
tests/test_custom.py | 7 +++++++
tests/test_default.py | 11 +++++++++++
tests/test_extensions.py | 12 ++++++++++++
- tests/test_fast_validator.py | 18 ++++++++++++++++++
+ tests/test_fast_validator.py | 22 ++++++++++++++++++++++
tests/test_header.py | 2 ++
tests/test_links.py | 4 ++++
tests/test_pydantic.py | 3 +++
@@ -22,13 +22,13 @@ Forwarded: not-needed
tests/test_validate_collections.py | 3 +++
tests/test_validate_dict.py | 4 ++++
tests/test_validate_item_collection.py | 7 +++++++
- 18 files changed, 127 insertions(+), 1 deletion(-)
+ 18 files changed, 131 insertions(+), 1 deletion(-)
diff --git a/pyproject.toml b/pyproject.toml
-index 409125c..7359c6d 100644
+index 128a49f..0a16d5a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
-@@ -65,4 +65,9 @@ stac-validator = "stac_validator.stac_validator:cli"
+@@ -73,4 +73,9 @@ stac-validator = "stac_validator.stac_validator:cli"
stac-valid = "stac_validator.stac_validator:cli"
[tool.setuptools.package-data]
@@ -507,7 +507,7 @@ index 2b03907..5612c05 100644
"""Test that verbose mode provides detailed error information in the expected format."""
stac_file = "tests/test_data/v100/bad-item.json"
diff --git a/tests/test_fast_validator.py b/tests/test_fast_validator.py
-index 919e8d1..3ee43d5 100644
+index 89236fa..21220c3 100644
--- a/tests/test_fast_validator.py
+++ b/tests/test_fast_validator.py
@@ -129,36 +129,42 @@ def invalid_feature_collection(tmp_path):
@@ -577,7 +577,39 @@ index 919e8d1..3ee43d5 100644
def test_non_verbose_mode(self, tmp_path, capsys):
"""Test non-verbose mode shows first 5 items and silences rest."""
fc_path = tmp_path / "large_fc.json"
-@@ -214,6 +223,7 @@ class TestFastValidatorOptions:
+@@ -210,6 +219,7 @@ class TestFastValidatorOptions:
+ assert "[1]" in captured.out
+ assert "silencing output" in captured.out
+
++ @pytest.mark.network
+ def test_limit_reduces_validated_objects(self, tmp_path):
+ """Test limit option validates only the first N objects."""
+ fc_path = tmp_path / "limited_fc.json"
+@@ -238,6 +248,7 @@ class TestFastValidatorOptions:
+ assert msg["valid_objects"] == 3
+ assert msg["invalid_objects"] == 0
+
++ @pytest.mark.network
+ def test_limit_above_total_does_not_change_count(self, valid_feature_collection):
+ """Test limit larger than object count validates all objects."""
+ fv = FastValidator(valid_feature_collection, quiet=True, limit=20)
+@@ -251,6 +262,7 @@ class TestFastValidatorOptions:
+ class TestFastValidatorRunDict:
+ """Test in-memory dictionary validation entrypoint."""
+
++ @pytest.mark.network
+ def test_run_dict_valid_item(self):
+ payload = {
+ "stac_version": "1.0.0",
+@@ -290,6 +302,7 @@ class TestFastValidatorRunDict:
+ assert fv.message[0]["invalid_objects"] == 1
+ assert len(fv.message[0]["errors"]) > 0
+
++ @pytest.mark.network
+ def test_run_dict_feature_collection_limit(self):
+ payload = {
+ "type": "FeatureCollection",
+@@ -886,6 +899,7 @@ class TestFastValidatorRefResolutionFallback:
class TestFastValidatorDetection:
"""Test STAC type detection."""
@@ -585,7 +617,7 @@ index 919e8d1..3ee43d5 100644
def test_detects_item(self, valid_item, capsys):
"""Test detection of STAC Item."""
fv = FastValidator(valid_item, quiet=False)
-@@ -221,6 +231,7 @@ class TestFastValidatorDetection:
+@@ -893,6 +907,7 @@ class TestFastValidatorDetection:
captured = capsys.readouterr()
assert "Item" in captured.out or "Feature" in captured.out
@@ -593,7 +625,7 @@ index 919e8d1..3ee43d5 100644
def test_detects_collection(self, valid_collection, capsys):
"""Test detection of STAC Collection."""
fv = FastValidator(valid_collection, quiet=False)
-@@ -228,6 +239,7 @@ class TestFastValidatorDetection:
+@@ -900,6 +915,7 @@ class TestFastValidatorDetection:
captured = capsys.readouterr()
assert "Collection" in captured.out
@@ -601,7 +633,7 @@ index 919e8d1..3ee43d5 100644
def test_detects_catalog(self, valid_catalog, capsys):
"""Test detection of STAC Catalog."""
fv = FastValidator(valid_catalog, quiet=False)
-@@ -235,6 +247,7 @@ class TestFastValidatorDetection:
+@@ -907,6 +923,7 @@ class TestFastValidatorDetection:
captured = capsys.readouterr()
assert "Catalog" in captured.out
@@ -609,7 +641,7 @@ index 919e8d1..3ee43d5 100644
def test_detects_feature_collection(self, valid_feature_collection, capsys):
"""Test detection of FeatureCollection."""
fv = FastValidator(valid_feature_collection, quiet=False)
-@@ -275,12 +288,14 @@ class TestFastValidatorErrorHandling:
+@@ -947,12 +964,14 @@ class TestFastValidatorErrorHandling:
class TestFastValidatorPerformance:
"""Test performance characteristics."""
@@ -624,7 +656,7 @@ index 919e8d1..3ee43d5 100644
def test_large_feature_collection(self, tmp_path):
"""Test validation of a large FeatureCollection."""
fc_path = tmp_path / "large_fc.json"
-@@ -305,6 +320,7 @@ class TestFastValidatorPerformance:
+@@ -977,6 +996,7 @@ class TestFastValidatorPerformance:
fv.run()
assert fv.valid is True
@@ -632,7 +664,7 @@ index 919e8d1..3ee43d5 100644
def test_message_attribute_structure(self, valid_item):
"""Test that the message attribute has the correct structure."""
fv = FastValidator(valid_item, quiet=True)
-@@ -340,6 +356,7 @@ class TestFastValidatorPerformance:
+@@ -1012,6 +1032,7 @@ class TestFastValidatorPerformance:
assert isinstance(msg["execution_time_ms"], float)
assert isinstance(msg["errors"], list)
@@ -640,7 +672,7 @@ index 919e8d1..3ee43d5 100644
def test_message_attribute_valid_items(self, valid_feature_collection):
"""Test message attribute for valid items."""
fv = FastValidator(valid_feature_collection, quiet=True)
-@@ -359,6 +376,7 @@ class TestFastValidatorPerformance:
+@@ -1031,6 +1052,7 @@ class TestFastValidatorPerformance:
assert len(msg["schemas_checked"]) > 0
assert "1.0.0" in msg["stac_versions"]
@@ -830,7 +862,7 @@ index eadf82c..569e608 100644
"""Test that extension schemas are cached across validations."""
original_cache_info = fetch_and_parse_schema.cache_info()
diff --git a/tests/test_sys_exit.py b/tests/test_sys_exit.py
-index 9f1e03d..c8d97c8 100644
+index 145a2c4..1b03218 100644
--- a/tests/test_sys_exit.py
+++ b/tests/test_sys_exit.py
@@ -3,6 +3,7 @@ import subprocess
=====================================
docs/requirements.txt
=====================================
@@ -1,4 +1,4 @@
sphinx>=7.4.7
sphinx_rtd_theme>=3.1.0
-myst-parser>=0.15.0
-sphinx-autodoc-typehints>=1.12.0
\ No newline at end of file
+sphinx-autodoc-typehints>=2.3.0
+myst-parser>=3.0.1
=====================================
pyproject.toml
=====================================
@@ -4,11 +4,15 @@ build-backend = "setuptools.build_meta"
[project]
name = "stac_valid"
-version = "4.2.2"
+version = "4.4.0"
description = "A package to validate STAC files"
authors = [
- {name = "James Banting"},
- {name = "Jonathan Healy", email = "jon at healy-hyperspatial.dev"}
+ {name = "Jonathan Healy", email = "jon at healy-hyperspatial.dev"},
+ {name = "James Banting"}
+]
+maintainers = [
+ {name = "Jonathan Healy", email = "jon at healy-hyperspatial.dev"},
+ {name = "Healy Hyperspatial"}
]
license = {text = "Apache-2.0"}
classifiers = [
@@ -48,6 +52,10 @@ dev = [
pydantic = [
"stac-pydantic>=3.3.0"
]
+server = [
+ "fastapi>=0.111.0",
+ "uvicorn>=0.30.0"
+]
[project.urls]
Homepage = "https://github.com/stac-utils/stac-validator"
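The new `server` extra makes `fastapi` and `uvicorn` optional, so code that imports `server/server.py` may want to check for them first. A minimal sketch using `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and it only checks top-level module names.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level modules from `names` that cannot be found."""
    # find_spec returns None (rather than raising) for a missing top-level module.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Modules provided by the `server` extra in pyproject.toml.
    missing = missing_modules(["fastapi", "uvicorn"])
    if missing:
        print("Missing server dependencies:", ", ".join(missing))
        print('Install them with: pip install "stac-valid[server]"')
    else:
        print("Server dependencies are available.")
```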
=====================================
server/Dockerfile
=====================================
@@ -0,0 +1,43 @@
+FROM python:3.11-slim
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+
+# Set environment variables
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV UV_COMPILE_BYTECODE=1
+ENV UV_LINK_MODE=copy
+
+WORKDIR /app
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+ build-essential \
+ && rm -rf /var/lib/apt/lists/*
+
+# Copy metadata files needed for installation
+COPY pyproject.toml ./
+
+# Copy the actual source code directory
+# (Make sure this name matches what is in [tool.setuptools] packages)
+COPY stac_validator/ ./stac_validator/
+
+# Install dependencies using uv
+RUN --mount=type=cache,target=/root/.cache/uv \
+ uv pip install --system . uvicorn fastapi requests
+
+# Copy the server directory specifically
+COPY server/ ./server/
+
+# Final install
+RUN --mount=type=cache,target=/root/.cache/uv \
+ uv pip install --system .
+
+# Create schema cache directory
+RUN mkdir -p local_schemas/.schemas && chmod -R 777 local_schemas
+
+EXPOSE 8000
+
+# Start the server using uvicorn
+# Note: Since we are in /app and copied the root,
+# 'server.server:app' points to 'server/server.py'
+CMD ["uvicorn", "server.server:app", "--host", "0.0.0.0", "--port", "8000"]
\ No newline at end of file
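The image exposes port 8000 and `server/server.py` defines a `GET /health` endpoint returning `{"status": "online", ...}`, but the Dockerfile above declares no `HEALTHCHECK`. A sketch of a probe such a check could call; `is_healthy` is a hypothetical helper, not part of the package.

```python
import json
import urllib.request


def is_healthy(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with {"status": "online", ...}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except Exception:
        # Refused connections, timeouts, HTTP errors and malformed JSON all
        # count as "unhealthy" rather than crashing the probe.
        return False
    return isinstance(payload, dict) and payload.get("status") == "online"
```

A container health check would wrap this in a small script that exits with status 0 when `is_healthy()` returns `True` and 1 otherwise, which is the convention Docker's `HEALTHCHECK` expects.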
=====================================
server/api_client_example.py
=====================================
@@ -0,0 +1,80 @@
+import json
+import sys
+
+import requests
+
+
+def test_validate_collection(
+ file_path: str, server_url: str = "http://localhost:8000/validate"
+):
+ """Sends a local STAC file to the validation server and prints the result."""
+
+ # 1. Load the ItemCollection
+ try:
+ with open(file_path, "r") as f:
+ stac_data = json.load(f)
+ except FileNotFoundError:
+ print(f"❌ Error: File {file_path} not found.")
+ return
+ except json.JSONDecodeError:
+ print(f"❌ Error: File {file_path} is not valid JSON.")
+ return
+
+ print(f"🚀 Sending '{file_path}' to {server_url}...")
+
+ # 2. POST to the server
+ try:
+ response = requests.post(server_url, json=stac_data)
+
+ # Check if the server returned an error (e.g., 500 or 422)
+ if response.status_code != 200:
+ print(f"❌ Server Error ({response.status_code}): {response.text}")
+ return
+
+ result = response.json()
+
+ # Double check the expected keys exist before trying to print them
+ if "valid_stac" not in result:
+ print(f"❌ Error: Server response format unexpected: {result}")
+ return
+
+ except requests.exceptions.RequestException as e:
+ print(f"❌ API Request failed: {e}")
+ return
+
+ # 3. Print the formatted result
+ print("\n" + "=" * 50)
+ print("📊 API VALIDATION RESULT")
+ print("=" * 50)
+ # Corrected key: valid_stac
+ status_text = "✅ VALID" if result.get("valid_stac") else "❌ INVALID"
+ print(f"Status : {status_text}")
+ print(f"Total Objects : {result.get('total_objects')}")
+ print(f"Valid Objects : {result.get('valid_objects')}")
+ print(f"Invalid Objects : {result.get('invalid_objects')}")
+ print(f"Execution Time : {result.get('execution_time_ms', 0):.2f} ms")
+ print("-" * 50)
+
+ # Corrected keys: errors, error_message, affected_items
+ if result.get("errors"):
+ print("🚨 ERRORS FOUND:")
+ for err in result["errors"]:
+ msg = err.get("error_message", "Unknown Error")
+ count = err.get("count", 0)
+ samples = ", ".join(err.get("affected_items", [])[:3])
+ print(f"- [{count} objects] {msg}")
+ print(f" Examples: {samples}")
+
+ print("=" * 50 + "\n")
+
+ print(
+ "result dict:", json.dumps(result, indent=2)
+ ) # Debug: Print the full result dict
+
+
+if __name__ == "__main__":
+ # Defaulting to your sample collection if no argument provided
+ target_file = (
+ sys.argv[1] if len(sys.argv) > 1 else "sample_data/sentinel-cogs_0_100.json"
+ )
+ test_validate_collection(target_file)
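`api_client_example.py` reads three keys from each entry of the response's `errors` list: `error_message`, `count` and `affected_items`. A sketch that turns that list into plain strings, for example for CI logs, assuming the response has the shape the client expects; `summarize_errors` is a hypothetical name.

```python
from typing import Any, Dict, List


def summarize_errors(result: Dict[str, Any], max_samples: int = 3) -> List[str]:
    """Turn the `errors` list of a /validate response into readable lines."""
    lines: List[str] = []
    for err in result.get("errors", []):
        message = err.get("error_message", "Unknown Error")
        count = err.get("count", 0)
        # Show only a few affected IDs so large collections stay readable.
        samples = err.get("affected_items", [])[:max_samples]
        line = f"[{count} objects] {message}"
        if samples:
            line += f" (e.g. {', '.join(samples)})"
        lines.append(line)
    return lines
```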
=====================================
server/server.py
=====================================
@@ -0,0 +1,67 @@
+import json
+import os
+import tempfile
+from typing import Any, Dict
+
+import uvicorn
+from fastapi import Body, FastAPI, HTTPException
+
+from stac_validator.fast_validator import FastValidator
+
+app = FastAPI(
+ title="STAC-Valid API",
+ description="High-performance STAC validation as a service using fastjsonschema.",
+ version="4.2.0",
+)
+
+
+@app.post("/validate")
+async def validate_stac(data: Dict[str, Any] = Body(...)):
+ """
+ Validates a STAC Item, Collection, or FeatureCollection provided in the request body.
+ Returns a detailed validation summary including performance metrics and error breakdowns.
+ """
+ # Create a temporary file because FastValidator currently reads from disk
+ tmp_path = None
+ try:
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp:
+ json.dump(data, tmp)
+ tmp_path = tmp.name
+
+ # Initialize and run the FastValidator in quiet mode
+ fv = FastValidator(stac_file=tmp_path, quiet=True, verbose=False)
+ fv.run()
+
+ # Check if the result message was populated.
+ # With the fix in FastValidator, fv.message[0] should contain the 'valid_stac' key.
+ if not fv.message or len(fv.message) == 0:
+ raise HTTPException(
+ status_code=500, detail="Validator failed to produce a result summary."
+ )
+
+ # Extract the validation dictionary
+ response = fv.message[0]
+
+ # Cleanup: Hide internal temp paths from API consumers
+ response["path"] = "request_body"
+
+ return response
+
+ except Exception as e:
+ # Catch unexpected errors and return a 500
+ raise HTTPException(status_code=500, detail=f"Validation crashed: {str(e)}")
+
+ finally:
+ # Ensure cleanup of the temporary file regardless of success or failure
+ if tmp_path and os.path.exists(tmp_path):
+ os.remove(tmp_path)
+
+
+@app.get("/health")
+async def health_check():
+ """Simple health check endpoint for monitoring."""
+ return {"status": "online", "engine": "fastjsonschema"}
+
+
+if __name__ == "__main__":
+ uvicorn.run(app, host="0.0.0.0", port=8000)
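`server.py` round-trips the request body through a temporary file because, as its comment says, `FastValidator` reads from disk. The pattern is sketched below with a hypothetical `validate_path` callable standing in for `FastValidator(stac_file=...).run()`. The v4.4.0 changelog adds a `run_dict` method for in-memory validation, which may make this round trip unnecessary, but this diff does not switch the server over to it.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T")


def validate_via_tempfile(data: Dict[str, Any], validate_path: Callable[[str], T]) -> T:
    """Write `data` to a temporary JSON file, run a path-based validator, then clean up."""
    tmp_path = None
    try:
        # delete=False so the file survives the `with` block and can be reopened by name.
        with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as tmp:
            json.dump(data, tmp)
            tmp_path = tmp.name
        return validate_path(tmp_path)
    finally:
        # Runs on success and on error, so no temp files leak.
        if tmp_path is not None and os.path.exists(tmp_path):
            os.remove(tmp_path)
```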
=====================================
stac_validator/fast_validator.py
=====================================
@@ -1,12 +1,19 @@
+import io
import json
import os
+import sys
import time
-import urllib.error
-import urllib.request
-from typing import Any, Dict, List
+from concurrent.futures import Future, ThreadPoolExecutor
+from contextlib import redirect_stderr, redirect_stdout
+from typing import Any, Dict, List, Optional, Set, Tuple
import click
import fastjsonschema # type: ignore
+import requests
+from requests.adapters import HTTPAdapter
+from urllib3.util.retry import Retry
+
+from .utilities import validate_with_ref_resolver
# --- Caches & Config ---
SCHEMA_CACHE: Dict[str, Any] = {}
@@ -19,6 +26,22 @@ LOCAL_SCHEMA_DIR = os.path.join(
".schemas",
)
+# Shared HTTP session with keep-alive connection pooling and retries for crawler workloads.
+HTTP_SESSION = requests.Session()
+_http_retries = Retry(
+ total=3,
+ backoff_factor=0.2,
+ status_forcelist=[500, 502, 503, 504],
+)
+_http_adapter = HTTPAdapter(
+ pool_connections=100,
+ pool_maxsize=100,
+ max_retries=_http_retries,
+)
+HTTP_SESSION.mount("http://", _http_adapter)
+HTTP_SESSION.mount("https://", _http_adapter)
+HTTP_SESSION.headers.update({"User-Agent": "stac-fast-cli/5.0"})
+
def get_local_path_for_uri(uri: str) -> str:
"""Creates a safe local filepath for a cached schema URL."""
@@ -48,11 +71,11 @@ def fetch_schema(uri: str) -> Dict[str, Any]:
# 3. Network Fetch
if not QUIET_MODE:
click.secho(f" [Network] Fetching: {uri}", fg="yellow", dim=True)
- req = urllib.request.Request(uri, headers={"User-Agent": "stac-fast-cli/5.0"})
try:
- with urllib.request.urlopen(req) as response:
- schema_dict = json.loads(response.read().decode("utf-8"))
- except urllib.error.URLError as e:
+ response = HTTP_SESSION.get(uri, timeout=10)
+ response.raise_for_status()
+ schema_dict = response.json()
+ except requests.RequestException as e:
raise RuntimeError(f"Could not resolve schema: {uri}. Reason: {e}")
# 4. Save to Disk Cache
@@ -110,7 +133,7 @@ def get_validator(stac_type: str, stac_version: str, extensions: List[str]):
import jsonschema
# Create a validator using the same custom logic
- def fallback_validator(data):
+ def fallback_validator(data: Dict[str, Any]) -> None:
# We need a resolver to handle the remote $refs
resolver = jsonschema.RefResolver(
base_uri="",
@@ -126,30 +149,164 @@ def get_validator(stac_type: str, stac_version: str, extensions: List[str]):
class FastValidator:
- def __init__(self, stac_file: str, quiet: bool = False, verbose: bool = False):
+ def __init__(
+ self,
+ stac_file: str,
+ quiet: bool = False,
+ verbose: bool = False,
+ limit: Optional[int] = None,
+ ):
global QUIET_MODE
self.stac_file = stac_file
self.quiet = quiet
self.valid = True
self.verbose = verbose
+ self.limit = limit
self.message: List[Dict[str, Any]] = []
QUIET_MODE = quiet
+ def _limit_reached(self, results: List[Dict]) -> bool:
+ return self.limit is not None and len(results) >= self.limit
+
+ def _get_base_schema_uri(self, stac_type: str, stac_version: str) -> str:
+ stac_type_lower = stac_type.lower()
+ if stac_type_lower in ["item", "feature"]:
+ return f"https://schemas.stacspec.org/v{stac_version}/item-spec/json-schema/item.json"
+ if stac_type_lower == "collection":
+ return f"https://schemas.stacspec.org/v{stac_version}/collection-spec/json-schema/collection.json"
+ if stac_type_lower == "catalog":
+ return f"https://schemas.stacspec.org/v{stac_version}/catalog-spec/json-schema/catalog.json"
+ raise ValueError(f"Unknown STAC type for validation: {stac_type}")
+
+ def _is_ref_resolution_error(self, err: Exception) -> bool:
+ err_text = str(err)
+ err_type = err.__class__.__name__
+ return (
+ "Unresolvable JSON pointer" in err_text
+ or "RefResolutionError" in err_type
+ or "Unresolvable" in err_type
+ )
+
+ def _validate_with_jsonschema_fallback(
+ self,
+ item: Dict[str, Any],
+ stac_type: str,
+ stac_version: str,
+ extensions: List[str],
+ ) -> None:
+ """Fallback validation path using the main jsonschema resolver utility."""
+ base_schema = self._get_base_schema_uri(stac_type, stac_version)
+ validate_with_ref_resolver(base_schema, item)
+ for ext_schema in extensions:
+ validate_with_ref_resolver(ext_schema, item)
+
+ def _load_json_resource(self, resource_path: str) -> Dict[str, Any]:
+ if resource_path.startswith("http"):
+ response = HTTP_SESSION.get(resource_path, timeout=15)
+ response.raise_for_status()
+ return response.json()
+
+ with open(resource_path, "r") as f:
+ return json.load(f)
+
+ def _get_parallel_fetch_workers(self, item_count: int) -> int:
+ return max(1, min(8, item_count))
+
+ def _load_collection_documents(
+ self, collection_urls: List[str]
+ ) -> List[Tuple[str, Optional[Dict[str, Any]], Optional[Exception]]]:
+ if len(collection_urls) <= 1:
+ results_single: List[
+ Tuple[str, Optional[Dict[str, Any]], Optional[Exception]]
+ ] = []
+ for collection_url in collection_urls:
+ try:
+ results_single.append(
+ (collection_url, self._load_json_resource(collection_url), None)
+ )
+ except Exception as exc:
+ results_single.append((collection_url, None, exc))
+ return results_single
+
+ with ThreadPoolExecutor(
+ max_workers=self._get_parallel_fetch_workers(len(collection_urls))
+ ) as executor:
+ futures: List[Future[Dict[str, Any]]] = [
+ executor.submit(self._load_json_resource, collection_url)
+ for collection_url in collection_urls
+ ]
+
+ results_parallel: List[
+ Tuple[str, Optional[Dict[str, Any]], Optional[Exception]]
+ ] = []
+ for collection_url, future in zip(collection_urls, futures):
+ try:
+ results_parallel.append((collection_url, future.result(), None))
+ except Exception as exc:
+ results_parallel.append((collection_url, None, exc))
+
+ return results_parallel
+
+ def _prefetch_api_collection_resources(
+ self, collection_url: str
+ ) -> Tuple[str, Optional[Dict[str, Dict[str, Any]]], Optional[Exception]]:
+ try:
+ collection_data = self._load_json_resource(collection_url)
+ except Exception as exc:
+ return collection_url, None, exc
+
+ resources = {collection_url: collection_data}
+ base_dir = collection_url.rsplit("/", 1)[0]
+
+ for link in collection_data.get("links", []):
+ if link.get("rel") != "items":
+ continue
+
+ href = link.get("href", "")
+ if not href:
+ continue
+
+ if href.startswith("http"):
+ items_path = href
+ else:
+ items_path = os.path.normpath(os.path.join(base_dir, href))
+
+ try:
+ resources[items_path] = self._load_json_resource(items_path)
+ except Exception:
+ pass
+
+ return collection_url, resources, None
+
+ def _prefetch_api_collection_resources_batch(
+ self, collection_urls: List[str]
+ ) -> List[Tuple[str, Optional[Dict[str, Dict[str, Any]]], Optional[Exception]]]:
+ if len(collection_urls) <= 1:
+ return [
+ self._prefetch_api_collection_resources(collection_url)
+ for collection_url in collection_urls
+ ]
+
+ with ThreadPoolExecutor(
+ max_workers=self._get_parallel_fetch_workers(len(collection_urls))
+ ) as executor:
+ futures: List[
+ Future[
+ Tuple[str, Optional[Dict[str, Dict[str, Any]]], Optional[Exception]]
+ ]
+ ] = [
+ executor.submit(self._prefetch_api_collection_resources, collection_url)
+ for collection_url in collection_urls
+ ]
+ return [future.result() for future in futures]
+
def run(self):
"""Universal high-speed STAC Validator (Items, Collections, Catalogs, FeatureCollections)"""
if not self.quiet:
click.secho(f"\n📂 Loading: {self.stac_file}", fg="blue", bold=True)
try:
- if self.stac_file.startswith("http"):
- req = urllib.request.Request(
- self.stac_file, headers={"User-Agent": "stac-fast-cli/5.0"}
- )
- with urllib.request.urlopen(req) as response:
- data = json.loads(response.read().decode("utf-8"))
- else:
- with open(self.stac_file, "r") as f:
- data = json.load(f)
+ data = self._load_json_resource(self.stac_file)
except Exception as e:
click.secho(f"❌ Error reading {self.stac_file}: {e}", fg="red", bold=True)
self.valid = False
@@ -195,13 +352,22 @@ class FastValidator:
return
# --- Metrics ---
+ available_objects = len(items_to_validate)
+ if self.limit is not None:
+ items_to_validate = items_to_validate[: self.limit]
+ if not self.quiet and available_objects > self.limit:
+ click.secho(
+ f"🔢 Limiting validation to first {self.limit} objects (out of {available_objects}).",
+ fg="yellow",
+ )
+
total_setup_ms = 0.0
total_exec_ms = 0.0
valid_count = 0
invalid_count = 0
error_registry: Dict[str, List[str]] = {}
- stac_versions_found: set = set()
- schemas_checked: set = set()
+ stac_versions_found: Set[str] = set()
+ schemas_checked: Set[str] = set()
for index, item in enumerate(items_to_validate):
# Determine specific STAC attributes for this object
@@ -218,14 +384,9 @@ class FastValidator:
)
# Build schema URI for this object type
- stac_type_lower = actual_type.lower()
- if stac_type_lower in ["item", "feature"]:
- base_schema = f"https://schemas.stacspec.org/v{stac_version}/item-spec/json-schema/item.json"
- elif stac_type_lower == "collection":
- base_schema = f"https://schemas.stacspec.org/v{stac_version}/collection-spec/json-schema/collection.json"
- elif stac_type_lower == "catalog":
- base_schema = f"https://schemas.stacspec.org/v{stac_version}/catalog-spec/json-schema/catalog.json"
- else:
+ try:
+ base_schema = self._get_base_schema_uri(actual_type, stac_version)
+ except ValueError:
base_schema = ""
if base_schema:
@@ -284,6 +445,38 @@ class FastValidator:
error_registry[error_msg].append(item_id)
status_text = click.style("❌ INVALID", fg="red")
+ except Exception as e:
+ t3 = time.perf_counter()
+ exec_time = (t3 - t2) * 1000
+ total_exec_ms += exec_time
+
+ if self._is_ref_resolution_error(e):
+ try:
+ self._validate_with_jsonschema_fallback(
+ item,
+ actual_type,
+ stac_version,
+ extensions,
+ )
+ valid_count += 1
+ status_text = click.style("✅ VALID", fg="green")
+ except Exception as fallback_err:
+ invalid_count += 1
+ self.valid = False
+ error_msg = str(fallback_err)
+ if error_msg not in error_registry:
+ error_registry[error_msg] = []
+ error_registry[error_msg].append(item_id)
+ status_text = click.style("❌ INVALID", fg="red")
+ else:
+ invalid_count += 1
+ self.valid = False
+ error_msg = str(e)
+ if error_msg not in error_registry:
+ error_registry[error_msg] = []
+ error_registry[error_msg].append(item_id)
+ status_text = click.style("❌ INVALID", fg="red")
+
if not self.quiet:
if self.verbose or index < 5 or (len(items_to_validate) < 20):
cache_icon = "⚡" if is_cached else "🐌"
@@ -357,3 +550,609 @@ class FastValidator:
]
click.echo("\n")
+
+ def run_dict(self, stac_dict: Dict[str, Any], source_name: str = "in-memory"):
+ """Validate a native Python dictionary directly without file/network loading."""
+ if not isinstance(stac_dict, dict):
+ self.valid = False
+ self.message = [
+ {
+ "path": source_name,
+ "valid_stac": False,
+ "error_message": "Input to run_dict must be a dictionary.",
+ }
+ ]
+ return
+
+ self.stac_file = source_name
+
+ data = dict(stac_dict)
+ obj_type = data.get("type", "")
+ items_to_validate: List[Dict[str, Any]] = []
+
+ if obj_type == "FeatureCollection":
+ features = data.get("features", [])
+ items_to_validate = features if isinstance(features, list) else []
+ elif obj_type in ["Feature", "Collection"]:
+ items_to_validate = [data]
+ elif obj_type == "Catalog" or ("id" in data and "description" in data):
+ data["type"] = "Catalog"
+ items_to_validate = [data]
+ else:
+ self.valid = False
+ if "type" in data:
+ error_msg = (
+ f"Unknown JSON type. Unsupported 'type' value: {obj_type!r}."
+ )
+ else:
+ error_msg = "Unknown JSON type. Missing 'type' field."
+
+ self.message = [
+ {
+ "path": source_name,
+ "valid_stac": False,
+ "error_message": error_msg,
+ }
+ ]
+ return
+
+ available_objects = len(items_to_validate)
+ if self.limit is not None:
+ items_to_validate = items_to_validate[: self.limit]
+
+ total_setup_ms = 0.0
+ total_exec_ms = 0.0
+ valid_count = 0
+ invalid_count = 0
+ error_registry: Dict[str, List[str]] = {}
+ stac_versions_found: Set[str] = set()
+ schemas_checked: Set[str] = set()
+
+ self.valid = True
+
+ for index, item in enumerate(items_to_validate):
+ item_id = item.get("id", f"unknown-{index}")
+ stac_version = item.get("stac_version", "1.0.0")
+ extensions = item.get("stac_extensions", [])
+
+ stac_versions_found.add(stac_version)
+
+ actual_type = (
+ "Item" if item.get("type") == "Feature" else item.get("type", "Catalog")
+ )
+
+ try:
+ base_schema = self._get_base_schema_uri(actual_type, stac_version)
+ except ValueError:
+ base_schema = ""
+
+ if base_schema:
+ schemas_checked.add(base_schema)
+
+ for ext in extensions:
+ schemas_checked.add(ext)
+
+ t0 = time.perf_counter()
+ try:
+ validator, _ = get_validator(actual_type, stac_version, extensions)
+ except Exception as e:
+ invalid_count += 1
+ self.valid = False
+ error_msg = str(e)
+ if error_msg not in error_registry:
+ error_registry[error_msg] = []
+ error_registry[error_msg].append(item_id)
+ continue
+ t1 = time.perf_counter()
+ total_setup_ms += (t1 - t0) * 1000
+
+ t2 = time.perf_counter()
+ try:
+ validator(item)
+ t3 = time.perf_counter()
+ total_exec_ms += (t3 - t2) * 1000
+ valid_count += 1
+ except fastjsonschema.JsonSchemaValueException as e:
+ t3 = time.perf_counter()
+ total_exec_ms += (t3 - t2) * 1000
+ invalid_count += 1
+ self.valid = False
+ error_msg = f"{e.name} {e.message.replace(e.name, '').strip()}"
+ if "disallowed definition" in error_msg and "collection" in error_msg:
+ error_msg = "STAC Spec Violation: Missing {'rel': 'collection'} in links array."
+ if error_msg not in error_registry:
+ error_registry[error_msg] = []
+ error_registry[error_msg].append(item_id)
+ except Exception as e:
+ t3 = time.perf_counter()
+ total_exec_ms += (t3 - t2) * 1000
+ if self._is_ref_resolution_error(e):
+ try:
+ self._validate_with_jsonschema_fallback(
+ item,
+ actual_type,
+ stac_version,
+ extensions,
+ )
+ valid_count += 1
+ except Exception as fallback_err:
+ invalid_count += 1
+ self.valid = False
+ error_msg = str(fallback_err)
+ if error_msg not in error_registry:
+ error_registry[error_msg] = []
+ error_registry[error_msg].append(item_id)
+ else:
+ invalid_count += 1
+ self.valid = False
+ error_msg = str(e)
+ if error_msg not in error_registry:
+ error_registry[error_msg] = []
+ error_registry[error_msg].append(item_id)
+
+ self.message = [
+ {
+ "path": source_name,
+ "valid_stac": self.valid,
+ "stac_versions": sorted(list(stac_versions_found)),
+ "schemas_checked": sorted(list(schemas_checked)),
+ "total_objects": len(items_to_validate),
+ "valid_objects": valid_count,
+ "invalid_objects": invalid_count,
+ "setup_time_ms": total_setup_ms,
+ "execution_time_ms": total_exec_ms,
+ "input_objects": available_objects,
+ "errors": [
+ {
+ "error_message": err_msg,
+ "affected_items": affected_ids,
+ "count": len(affected_ids),
+ }
+ for err_msg, affected_ids in error_registry.items()
+ ],
+ }
+ ]
+
+ def run_recursive(self):
+ """Recursively validate a local STAC catalog/collection and all its children."""
+ sys.setrecursionlimit(10000)
+ start_time = time.perf_counter()
+
+ # Load the root STAC object
+ try:
+ root_data = self._load_json_resource(self.stac_file)
+ root_path = (
+ self.stac_file
+ if self.stac_file.startswith("http")
+ else os.path.abspath(self.stac_file)
+ )
+ except Exception as e:
+ click.secho(f"❌ Error reading {self.stac_file}: {e}", fg="red", bold=True)
+ self.valid = False
+ return
+
+ # Recursively validate the root and all children
+ results = []
+ visited = set()
+ visited.add(root_path)
+ self._validate_recursive(root_data, root_path, results, visited, is_api=False)
+
+ if self.limit is not None and not self.quiet and len(results) >= self.limit:
+ click.secho(
+ f"🔢 Validation limit reached ({self.limit} objects).",
+ fg="yellow",
+ )
+
+ # Display results
+ click.echo("\n" + "=" * 55)
+ click.secho("📊 RECURSIVE VALIDATION SUMMARY", bold=True, fg="blue")
+ click.echo("=" * 55)
+
+ valid_count = sum(1 for r in results if r["valid_stac"])
+ invalid_count = len(results) - valid_count
+ elapsed_ms = (time.perf_counter() - start_time) * 1000
+
+ click.echo(f"Total Objects Validated: {len(results)}")
+ click.echo(f"Valid Objects: {valid_count}")
+ click.echo(f"Invalid Objects: {invalid_count}")
+ click.echo(f"Execution Time: {elapsed_ms:.2f} ms")
+
+ if invalid_count > 0:
+ click.echo("\n" + "=" * 55)
+ click.secho("🚨 INVALID OBJECTS", bold=True, fg="red")
+ click.echo("=" * 55)
+
+ # Group errors by message
+ error_groups = {}
+ for result in results:
+ if not result["valid_stac"]:
+ error_msg = result.get("error_message", "Unknown error")
+ if error_msg not in error_groups:
+ error_groups[error_msg] = []
+ # Store both path and ID for better identification
+ object_id = result.get("id", "unknown")
+ error_groups[error_msg].append(
+ {"path": result["path"], "id": object_id}
+ )
+
+ # Display grouped errors
+ for error_msg, items in error_groups.items():
+ click.echo(f"\n❌ {error_msg}")
+ click.echo(f" Affected Objects: {len(items)}")
+ # Show first 5 examples
+ for item in items[:5]:
+ item_id = item["id"] if item["id"] != "unknown" else ""
+ if item_id:
+ click.echo(f" - {item['path']} (ID: {item_id})")
+ else:
+ click.echo(f" - {item['path']}")
+ if len(items) > 5:
+ click.echo(f" ... and {len(items) - 5} more")
+
+ # Set overall validity
+ self.valid = all(r.get("valid_stac", False) for r in results)
+ self.message = results
+
+ def run_api(self):
+ """Recursively validate a STAC API catalog and all its collections/items."""
+ sys.setrecursionlimit(10000)
+ start_time = time.perf_counter()
+
+ if not self.quiet:
+ click.secho("🚀 Starting STAC API validation...", fg="blue", bold=True)
+ click.secho(
+ "⏳ Fetching API root and discovery links...", fg="cyan", dim=True
+ )
+
+ # Load the root STAC API object
+ try:
+ root_data = self._load_json_resource(self.stac_file)
+ root_path = (
+ self.stac_file
+ if self.stac_file.startswith("http")
+ else os.path.abspath(self.stac_file)
+ )
+ except Exception as e:
+ click.secho(f"❌ Error reading {self.stac_file}: {e}", fg="red", bold=True)
+ self.valid = False
+ return
+
+ # Recursively validate the root and all children (API mode)
+ results = []
+ visited = set()
+ visited.add(root_path)
+ self._progress_count = 0
+
+ if not self.quiet:
+ click.secho(
+ "🧠 Compiling/warming schemas (first objects may be slower)...",
+ fg="cyan",
+ dim=True,
+ )
+
+ self._validate_recursive(root_data, root_path, results, visited, is_api=True)
+
+ if self.limit is not None and not self.quiet and len(results) >= self.limit:
+ click.secho(
+ f"🔢 Validation limit reached ({self.limit} objects).",
+ fg="yellow",
+ )
+
+ # Display results
+ click.echo("\n" + "=" * 55)
+ click.secho("📊 STAC API VALIDATION SUMMARY", bold=True, fg="blue")
+ click.echo("=" * 55)
+
+ valid_count = sum(1 for r in results if r["valid_stac"])
+ invalid_count = len(results) - valid_count
+ elapsed_ms = (time.perf_counter() - start_time) * 1000
+
+ click.echo(f"Total Objects Validated: {len(results)}")
+ click.echo(f"Valid Objects: {valid_count}")
+ click.echo(f"Invalid Objects: {invalid_count}")
+ click.echo(f"Execution Time: {elapsed_ms:.2f} ms")
+
+ if invalid_count > 0:
+ click.echo("\n" + "=" * 55)
+ click.secho("🚨 INVALID OBJECTS", bold=True, fg="red")
+ click.echo("=" * 55)
+
+ # Group errors by message
+ error_groups = {}
+ for result in results:
+ if not result["valid_stac"]:
+ error_msg = result.get("error_message", "Unknown error")
+ if error_msg not in error_groups:
+ error_groups[error_msg] = []
+ # Store both path and ID for better identification
+ object_id = result.get("id", "unknown")
+ error_groups[error_msg].append(
+ {"path": result["path"], "id": object_id}
+ )
+
+ # Display grouped errors
+ for error_msg, items in error_groups.items():
+ click.echo(f"\n❌ {error_msg}")
+ click.echo(f" Affected Objects: {len(items)}")
+ # Show first 5 examples
+ for item in items[:5]:
+ item_id = item["id"] if item["id"] != "unknown" else ""
+ if item_id:
+ click.echo(f" - {item['path']} (ID: {item_id})")
+ else:
+ click.echo(f" - {item['path']}")
+ if len(items) > 5:
+ click.echo(f" ... and {len(items) - 5} more")
+
+ # Set overall validity
+ self.valid = all(r.get("valid_stac", False) for r in results)
+ self.message = results
+
+ def _validate_recursive(
+ self,
+ data: Dict[str, Any],
+ file_path: str,
+ results: List[Dict],
+ visited: Set[str],
+ is_api: bool = False,
+ collection_id: Optional[str] = None,
+ prefetched_resources: Optional[Dict[str, Dict[str, Any]]] = None,
+ ):
+ """Recursively validate a STAC object and its children.
+
+ Args:
+ data: The STAC object to validate
+ file_path: Path or URL to the object
+ results: List to accumulate validation results
+ visited: Set of already-visited paths to prevent circular references
+ is_api: If True, follow API-specific links (data, items, next); if False, follow catalog links (child, item)
+ collection_id: Optional collection ID for items from FeatureCollections
+ """
+ if self._limit_reached(results):
+ return
+
+ # Log progress in API mode
+ if is_api and not self.quiet:
+ self._progress_count += 1
+ object_id = data.get("id", "unknown")
+ object_type = data.get("type", "unknown")
+ if collection_id and object_type == "Feature":
+ click.secho(
+ f" [{self._progress_count}] Validating {object_type}: {object_id} (Collection: {collection_id})",
+ fg="cyan",
+ dim=True,
+ )
+ else:
+ click.secho(
+ f" [{self._progress_count}] Validating {object_type}: {object_id}",
+ fg="cyan",
+ dim=True,
+ )
+
+ # Determine STAC type - could be "Catalog", "Collection", or "Feature" (Item)
+ raw_type = data.get("type", "unknown")
+ if raw_type == "Feature":
+ stac_type = "item"
+ elif raw_type == "Collection":
+ stac_type = "collection"
+ elif raw_type == "Catalog":
+ stac_type = "catalog"
+ else:
+ stac_type = raw_type.lower() if raw_type else "unknown"
+
+ stac_version = data.get("stac_version", "unknown")
+
+ # Validate current object using get_validator (same as run() does)
+ # Skip validation for STAC API responses (they have conformsTo instead of stac_extensions)
+ is_stac_api = "conformsTo" in data
+
+ if is_stac_api:
+ # STAC API catalogs don't validate against STAC schemas, just mark as valid
+ is_valid = True
+ error_msg = None
+ else:
+ try:
+ extensions = data.get("stac_extensions", [])
+
+ # Mute noisy "[Fallback]" and "[Network]" prints from validation execution path
+ with redirect_stdout(io.StringIO()), redirect_stderr(io.StringIO()):
+ validator, _ = get_validator(stac_type, stac_version, extensions)
+ validator(data)
+
+ is_valid = True
+ error_msg = None
+ except fastjsonschema.JsonSchemaValueException as e:
+ is_valid = False
+ error_msg = f"{e.name} {e.message.replace(e.name, '').strip()}"
+ except Exception as e:
+ if self._is_ref_resolution_error(e):
+ try:
+ self._validate_with_jsonschema_fallback(
+ data,
+ stac_type,
+ stac_version,
+ extensions,
+ )
+ is_valid = True
+ error_msg = None
+ except Exception as fallback_err:
+ is_valid = False
+ error_msg = str(fallback_err)
+ else:
+ is_valid = False
+ error_msg = str(e)
+
+ # Create result for this object
+ # Extract ID if available
+ object_id = data.get("id", "unknown")
+
+ result = {
+ "path": file_path,
+ "id": object_id,
+ "valid_stac": is_valid,
+ "stac_type": stac_type,
+ "stac_version": stac_version,
+ }
+ if error_msg:
+ result["error_message"] = error_msg
+
+ results.append(result)
+
+ if self._limit_reached(results):
+ return
+
+ # Process child links
+ base_dir = (
+ os.path.dirname(file_path)
+ if not file_path.startswith("http")
+ else file_path.rsplit("/", 1)[0]
+ )
+ links = data.get("links", [])
+
+ for link in links:
+ if self._limit_reached(results):
+ break
+
+ rel = link.get("rel", "")
+ href = link.get("href", "")
+
+ # Determine if we should follow this link based on mode
+ should_follow = False
+ if is_api:
+ # API mode: follow "data" (collections), "child", "item", and "items" links
+ if rel in ["data", "child", "item", "items"] and href:
+ should_follow = True
+ else:
+ # Local mode: follow "child" and "item" links only
+ if rel in ["child", "item"] and href:
+ should_follow = True
+
+ if should_follow:
+ # Resolve relative path
+ if href.startswith("http"):
+ child_path = href
+ else:
+ child_path = os.path.normpath(os.path.join(base_dir, href))
+
+ if child_path in visited:
+ continue
+ visited.add(child_path)
+
+ # Load and validate child
+ try:
+ if prefetched_resources and child_path in prefetched_resources:
+ child_data = prefetched_resources[child_path]
+ else:
+ if is_api and not self.quiet and rel in ["data", "items"]:
+ label = "collections" if rel == "data" else "items"
+ click.secho(
+ f" Discovering {label}: {child_path}",
+ fg="cyan",
+ dim=True,
+ )
+ child_data = self._load_json_resource(child_path)
+
+ # If this is a collections list endpoint, extract individual collections
+ if rel == "data" and is_api and isinstance(child_data, dict):
+ collections = child_data.get("collections", [])
+ if collections:
+ # This is a collections list - process each collection
+ collection_urls = []
+ for collection in collections:
+ collection_id = collection.get("id")
+ if collection_id:
+ collection_urls.append(
+ f"{child_path.rstrip('/')}/{collection_id}"
+ )
+
+ # Avoid prefetching beyond remaining validation capacity.
+ if self.limit is not None:
+ remaining = max(1, self.limit - len(results))
+ collection_urls = collection_urls[:remaining]
+
+ for (
+ collection_url,
+ prefetched_collection_resources,
+ load_error,
+ ) in self._prefetch_api_collection_resources_batch(
+ collection_urls
+ ):
+ if self._limit_reached(results):
+ break
+
+ if load_error is not None:
+ results.append(
+ {
+ "path": collection_url,
+ "valid_stac": False,
+ "error_message": f"Failed to load: {str(load_error)}",
+ }
+ )
+ continue
+
+ visited.add(collection_url)
+ if prefetched_collection_resources is None:
+ continue
+ collection_data = prefetched_collection_resources[
+ collection_url
+ ]
+
+ self._validate_recursive(
+ collection_data,
+ collection_url,
+ results,
+ visited,
+ is_api,
+ prefetched_resources=prefetched_collection_resources,
+ )
+ else:
+ # Not a collections list, validate as normal
+ self._validate_recursive(
+ child_data, child_path, results, visited, is_api
+ )
+ # If this is an items endpoint (GeoJSON FeatureCollection), validate only Features
+ elif rel == "items" and is_api and isinstance(child_data, dict):
+ features = child_data.get("features")
+
+ # Extract collection ID from URL (e.g., /collections/{id}/items)
+ collection_id_from_items: Optional[str] = None
+ if "/collections/" in child_path:
+ parts = child_path.split("/collections/")
+ if len(parts) > 1:
+ collection_parts = parts[1].split("/items")
+ collection_id_from_items = (
+ collection_parts[0] if collection_parts else None
+ )
+
+ # Validate each feature item from the items page, not the FeatureCollection container.
+ if isinstance(features, list):
+ for feature in features:
+ if self._limit_reached(results):
+ break
+
+ item_id = feature.get("id", "unknown")
+ item_path = f"{child_path}#{item_id}"
+ self._validate_recursive(
+ feature,
+ item_path,
+ results,
+ visited,
+ is_api,
+ collection_id_from_items,
+ )
+ else:
+ # Recursively validate child
+ self._validate_recursive(
+ child_data, child_path, results, visited, is_api
+ )
+ except Exception as e:
+ if self._limit_reached(results):
+ break
+
+ results.append(
+ {
+ "path": child_path,
+ "valid_stac": False,
+ "error_message": f"Failed to load: {str(e)}",
+ }
+ )
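`_prefetch_api_collection_resources_batch` submits collection loads to a `ThreadPoolExecutor`, then collects results by iterating the futures list in submission order. That keeps the output order identical to the input order, and a collection that fails to load comes back as an exception in its tuple instead of aborting the batch. The sketch below shows the pattern on its own. `fetch_one`, `fetch_all_in_order`, the `max_workers=8` default and the fake loader are hypothetical stand-ins. The real code sizes its pool with `_get_parallel_fetch_workers` and loads documents through `_load_json_resource`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional, Tuple

# (url, document or None, exception or None), the same tuple shape as the diff above.
FetchResult = Tuple[str, Optional[Dict[str, Any]], Optional[Exception]]


def fetch_one(url: str, load: Callable[[str], Dict[str, Any]]) -> FetchResult:
    """Load a single document, returning the error instead of raising it."""
    try:
        return url, load(url), None
    except Exception as exc:  # keep per-URL failures local to that URL
        return url, None, exc


def fetch_all_in_order(
    urls: List[str],
    load: Callable[[str], Dict[str, Any]],
    max_workers: int = 8,  # hypothetical default, not taken from the source
) -> List[FetchResult]:
    """Fetch URLs concurrently but return results in input order."""
    if len(urls) <= 1:
        # A thread pool is not worth creating for zero or one URL.
        return [fetch_one(url, load) for url in urls]
    with ThreadPoolExecutor(max_workers=min(max_workers, len(urls))) as executor:
        futures = [executor.submit(fetch_one, url, load) for url in urls]
        # Iterating the futures list (not as_completed) preserves submission order.
        return [future.result() for future in futures]


def fake_load(url: str) -> Dict[str, Any]:
    """Hypothetical loader: fails for URLs ending in 'two'."""
    if url.endswith("two"):
        raise RuntimeError("boom")
    return {"id": url.rsplit("/", 1)[-1]}


if __name__ == "__main__":
    urls = [f"https://api.example.com/{name}" for name in ("one", "two", "three")]
    for url, doc, err in fetch_all_in_order(urls, fake_load):
        print(url, doc, err)
```

This is the same ordering guarantee that `test_load_collection_documents_keeps_order_and_errors` checks: the second URL fails, and the first and third still come back in their original positions.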
=====================================
stac_validator/stac_validator.py
=====================================
@@ -539,11 +539,49 @@ def batch(
is_flag=True,
help="Show full validation logs for all items. By default, a limited sample of item logs is shown.",
)
-def fast(stac_file: str, quiet: bool, verbose: bool):
+@click.option(
+ "--recursive",
+ "-r",
+ is_flag=True,
+ help="Recursively validate all child catalogs, collections, and items.",
+)
+@click.option(
+ "--api",
+ "-a",
+ is_flag=True,
+ help="Validate a STAC API catalog recursively (follows data, child, item, and items links).",
+)
+@click.option(
+ "--limit",
+ type=click.IntRange(min=1),
+ default=None,
+ help="Limit number of STAC objects to validate.",
+)
+def fast(
+ stac_file: str,
+ quiet: bool,
+ verbose: bool,
+ recursive: bool,
+ api: bool,
+ limit: Optional[int],
+):
"""High-speed validation using fastjsonschema and local caching."""
+ if api and not stac_file.startswith(("http://", "https://")):
+ click.secho(
+ "❌ Invalid STAC API URL. Include 'http://' or 'https://' (example: https://example.com/stac).",
+ fg="red",
+ bold=True,
+ )
+ sys.exit(1)
+
try:
- fv = FastValidator(stac_file, quiet=quiet, verbose=verbose)
- fv.run()
+ fv = FastValidator(stac_file, quiet=quiet, verbose=verbose, limit=limit)
+ if api:
+ fv.run_api()
+ elif recursive:
+ fv.run_recursive()
+ else:
+ fv.run()
sys.exit(0 if fv.valid else 1)
except RuntimeError as e:
click.secho(f"\n🚨 FATAL ERROR: {e}", fg="red", bold=True)
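The `--api` help text above lists the link relations that API mode follows (`data`, `child`, `item`, `items`). The `--recursive` mode follows only `child` and `item`. Below is a minimal sketch of that traversal over an in-memory mapping of documents. It assumes a depth-first walk with a shared `visited` set and an optional object limit, which loosely mirrors `_validate_recursive`. The names `resolve` and `walk` and the document mapping are hypothetical. The sketch leaves out the real code's special handling of `/collections` lists and `FeatureCollection` item pages. When the parent is a URL, it resolves relative hrefs with `urllib.parse.urljoin`. That choice is specific to this sketch: `os.path.normpath` would collapse the `//` after the URL scheme.

```python
import posixpath
from typing import Any, Dict, List, Optional, Set
from urllib.parse import urljoin

# Relations followed per mode, as described by the --api / --recursive help text.
API_RELS = {"data", "child", "item", "items"}
CATALOG_RELS = {"child", "item"}


def resolve(parent: str, href: str) -> str:
    """Resolve a link href relative to the document that contains it."""
    if href.startswith(("http://", "https://")):
        return href
    if parent.startswith(("http://", "https://")):
        return urljoin(parent, href)
    return posixpath.normpath(posixpath.join(posixpath.dirname(parent), href))


def walk(
    docs: Dict[str, Dict[str, Any]],
    root: str,
    is_api: bool = False,
    limit: Optional[int] = None,
) -> List[str]:
    """Depth-first walk from `root`, returning visited object ids in order."""
    seen: List[str] = []
    visited: Set[str] = {root}
    rels = API_RELS if is_api else CATALOG_RELS

    def reached() -> bool:
        return limit is not None and len(seen) >= limit

    def visit(path: str) -> None:
        if reached():
            return
        data = docs[path]
        seen.append(data.get("id", "unknown"))
        for link in data.get("links", []):
            if reached():
                break
            href = link.get("href", "")
            if link.get("rel") not in rels or not href:
                continue  # e.g. "self", "root", "parent" are never followed
            child = resolve(path, href)
            if child in visited:
                continue  # guards against cycles back to already-seen objects
            visited.add(child)
            visit(child)

    visit(root)
    return seen
```

Because the limit is checked before each object and each link, a catalog with one child and two items stops after the root and the child when `limit=2`. That matches the expectation in `test_recursive_mode_respects_limit`.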
=====================================
stac_validator/utilities.py
=====================================
@@ -3,7 +3,7 @@ import json
import logging
import os
import ssl
-from typing import Dict, Optional, Tuple
+from typing import Dict, Optional, Set, Tuple
from urllib.parse import urlparse
from urllib.request import Request, urlopen
@@ -233,7 +233,7 @@ def _fetch_and_parse_schema_cache_clear() -> None:
_schema_cache.cache_clear()
-_cached_schemas = set()
+_cached_schemas: Set[str] = set()
def _map_extension_url_to_local(url: str) -> str:
=====================================
tests/test_fast_validator.py
=====================================
@@ -210,6 +210,678 @@ class TestFastValidatorOptions:
assert "[1]" in captured.out
assert "silencing output" in captured.out
+ def test_limit_reduces_validated_objects(self, tmp_path):
+ """Test limit option validates only the first N objects."""
+ fc_path = tmp_path / "limited_fc.json"
+ fc_data = {
+ "type": "FeatureCollection",
+ "features": [
+ {
+ "stac_version": "1.0.0",
+ "type": "Feature",
+ "id": f"item-{i}",
+ "geometry": None,
+ "properties": {"datetime": "2023-01-01T00:00:00Z"},
+ "links": [{"rel": "self", "href": "http://example.com"}],
+ "assets": {},
+ }
+ for i in range(10)
+ ],
+ }
+ fc_path.write_text(json.dumps(fc_data))
+
+ fv = FastValidator(str(fc_path), quiet=True, limit=3)
+ fv.run()
+
+ msg = fv.message[0]
+ assert msg["total_objects"] == 3
+ assert msg["valid_objects"] == 3
+ assert msg["invalid_objects"] == 0
+
+ def test_limit_above_total_does_not_change_count(self, valid_feature_collection):
+ """Test limit larger than object count validates all objects."""
+ fv = FastValidator(valid_feature_collection, quiet=True, limit=20)
+ fv.run()
+
+ msg = fv.message[0]
+ assert msg["total_objects"] == 5
+ assert msg["valid_objects"] == 5
+
+
+class TestFastValidatorRunDict:
+ """Test in-memory dictionary validation entrypoint."""
+
+ def test_run_dict_valid_item(self):
+ payload = {
+ "stac_version": "1.0.0",
+ "type": "Feature",
+ "id": "test-item",
+ "geometry": None,
+ "properties": {"datetime": "2023-01-01T00:00:00Z"},
+ "links": [{"rel": "self", "href": "http://example.com"}],
+ "assets": {},
+ }
+
+ fv = FastValidator("", quiet=True)
+ fv.run_dict(payload)
+
+ assert fv.valid is True
+ assert fv.message[0]["path"] == "in-memory"
+ assert fv.message[0]["total_objects"] == 1
+ assert fv.message[0]["valid_objects"] == 1
+ assert fv.message[0]["invalid_objects"] == 0
+
+ def test_run_dict_invalid_item(self):
+ payload = {
+ "stac_version": "1.0.0",
+ "type": "Feature",
+ "geometry": None,
+ "properties": {"datetime": "2023-01-01T00:00:00Z"},
+ "links": [],
+ "assets": {},
+ }
+
+ fv = FastValidator("", quiet=True)
+ fv.run_dict(payload)
+
+ assert fv.valid is False
+ assert fv.message[0]["total_objects"] == 1
+ assert fv.message[0]["valid_objects"] == 0
+ assert fv.message[0]["invalid_objects"] == 1
+ assert len(fv.message[0]["errors"]) > 0
+
+ def test_run_dict_feature_collection_limit(self):
+ payload = {
+ "type": "FeatureCollection",
+ "features": [
+ {
+ "stac_version": "1.0.0",
+ "type": "Feature",
+ "id": f"item-{i}",
+ "geometry": None,
+ "properties": {"datetime": "2023-01-01T00:00:00Z"},
+ "links": [{"rel": "self", "href": "http://example.com"}],
+ "assets": {},
+ }
+ for i in range(5)
+ ],
+ }
+
+ fv = FastValidator("", quiet=True, limit=2)
+ fv.run_dict(payload)
+
+ assert fv.valid is True
+ assert fv.message[0]["input_objects"] == 5
+ assert fv.message[0]["total_objects"] == 2
+ assert fv.message[0]["valid_objects"] == 2
+
+
+class TestFastValidatorRecursiveAndApi:
+ """Test recursive and API traversal behavior."""
+
+ def test_load_collection_documents_keeps_order_and_errors(self, monkeypatch):
+ """Test parallel collection loading preserves URL order and captures per-URL errors."""
+
+ def _fake_load(self, resource_path):
+ if resource_path.endswith("two"):
+ raise RuntimeError("boom")
+ return {"id": resource_path.rsplit("/", 1)[-1]}
+
+ monkeypatch.setattr(FastValidator, "_load_json_resource", _fake_load)
+
+ fv = FastValidator("https://api.example.com", quiet=True)
+ loaded = fv._load_collection_documents(
+ [
+ "https://api.example.com/one",
+ "https://api.example.com/two",
+ "https://api.example.com/three",
+ ]
+ )
+
+ assert [entry[0] for entry in loaded] == [
+ "https://api.example.com/one",
+ "https://api.example.com/two",
+ "https://api.example.com/three",
+ ]
+ assert loaded[0][1] == {"id": "one"}
+ assert loaded[0][2] is None
+ assert loaded[1][1] is None
+ assert str(loaded[1][2]) == "boom"
+ assert loaded[2][1] == {"id": "three"}
+ assert loaded[2][2] is None
+
+ def test_prefetch_api_collection_resources_batch_prefetches_items(
+ self, monkeypatch
+ ):
+ """Test API collection prefetch preserves order and includes items pages."""
+
+ payloads = {
+ "https://api.example.com/one": {
+ "id": "one",
+ "links": [
+ {
+ "rel": "items",
+ "href": "https://api.example.com/one/items",
+ }
+ ],
+ },
+ "https://api.example.com/one/items": {
+ "type": "FeatureCollection",
+ "features": [],
+ },
+ "https://api.example.com/two": {"id": "two", "links": []},
+ }
+
+ def _fake_load(self, resource_path):
+ return payloads[resource_path]
+
+ monkeypatch.setattr(FastValidator, "_load_json_resource", _fake_load)
+
+ fv = FastValidator("https://api.example.com", quiet=True)
+ loaded = fv._prefetch_api_collection_resources_batch(
+ [
+ "https://api.example.com/one",
+ "https://api.example.com/two",
+ ]
+ )
+
+ assert [entry[0] for entry in loaded] == [
+ "https://api.example.com/one",
+ "https://api.example.com/two",
+ ]
+ assert loaded[0][1]["https://api.example.com/one"]["id"] == "one"
+ assert (
+ loaded[0][1]["https://api.example.com/one/items"]["type"]
+ == "FeatureCollection"
+ )
+ assert loaded[1][1]["https://api.example.com/two"]["id"] == "two"
+
+ def test_api_prefetch_truncates_to_remaining_limit(self, monkeypatch):
+ """Test API data-link prefetch list is trimmed to remaining validation capacity."""
+
+ def _ok_validator(data):
+ return None
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_ok_validator, True),
+ )
+
+ payloads = {
+ "https://api.example.com": {
+ "conformsTo": ["https://api.stacspec.org/v1.0.0/core"],
+ "id": "api-root",
+ "type": "Catalog",
+ "description": "api root",
+ "links": [
+ {"rel": "data", "href": "https://api.example.com/collections"}
+ ],
+ },
+ "https://api.example.com/collections": {
+ "collections": [
+ {"id": "c1"},
+ {"id": "c2"},
+ {"id": "c3"},
+ ]
+ },
+ }
+
+ def _fake_load(self, resource_path):
+ return payloads[resource_path]
+
+ captured = []
+
+ def _fake_prefetch(self, collection_urls):
+ captured.extend(collection_urls)
+ return []
+
+ monkeypatch.setattr(FastValidator, "_load_json_resource", _fake_load)
+ monkeypatch.setattr(
+ FastValidator,
+ "_prefetch_api_collection_resources_batch",
+ _fake_prefetch,
+ )
+
+ fv = FastValidator("https://api.example.com", quiet=True, limit=2)
+ fv.run_api()
+
+ # One slot is consumed by the root catalog, so only one collection should be prefetched.
+ assert captured == ["https://api.example.com/collections/c1"]
+
+ def test_recursive_mode_respects_limit(self, tmp_path, monkeypatch):
+ """Test recursive validation follows links and stops at limit."""
+
+ def _ok_validator(data):
+ return None
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_ok_validator, True),
+ )
+
+ root = {
+ "stac_version": "1.0.0",
+ "type": "Catalog",
+ "id": "root",
+ "description": "root catalog",
+ "links": [{"rel": "child", "href": "child.json"}],
+ }
+ child = {
+ "stac_version": "1.0.0",
+ "type": "Catalog",
+ "id": "child",
+ "description": "child catalog",
+ "links": [
+ {"rel": "item", "href": "item-1.json"},
+ {"rel": "item", "href": "item-2.json"},
+ ],
+ }
+ item_1 = {
+ "stac_version": "1.0.0",
+ "type": "Feature",
+ "id": "item-1",
+ "geometry": None,
+ "properties": {"datetime": "2023-01-01T00:00:00Z"},
+ "links": [{"rel": "self", "href": "http://example.com/item-1"}],
+ "assets": {},
+ }
+ item_2 = {
+ "stac_version": "1.0.0",
+ "type": "Feature",
+ "id": "item-2",
+ "geometry": None,
+ "properties": {"datetime": "2023-01-01T00:00:00Z"},
+ "links": [{"rel": "self", "href": "http://example.com/item-2"}],
+ "assets": {},
+ }
+
+ root_path = tmp_path / "catalog.json"
+ (tmp_path / "child.json").write_text(json.dumps(child))
+ (tmp_path / "item-1.json").write_text(json.dumps(item_1))
+ (tmp_path / "item-2.json").write_text(json.dumps(item_2))
+ root_path.write_text(json.dumps(root))
+
+ fv = FastValidator(str(root_path), quiet=True, limit=2)
+ fv.run_recursive()
+
+ assert fv.valid is True
+ assert len(fv.message) == 2
+ assert fv.message[0]["id"] == "root"
+ assert fv.message[1]["id"] == "child"
+
+ def test_recursive_mode_summary_includes_execution_time(
+ self, tmp_path, monkeypatch, capsys
+ ):
+ """Test recursive mode keeps recursive summary format and includes execution time."""
+
+ def _ok_validator(data):
+ return None
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_ok_validator, True),
+ )
+
+ root = {
+ "stac_version": "1.0.0",
+ "type": "Catalog",
+ "id": "root",
+ "description": "root catalog",
+ "links": [{"rel": "child", "href": "child.json"}],
+ }
+ child = {
+ "stac_version": "1.0.0",
+ "type": "Catalog",
+ "id": "child",
+ "description": "child catalog",
+ "links": [],
+ }
+
+ root_path = tmp_path / "catalog.json"
+ (tmp_path / "child.json").write_text(json.dumps(child))
+ root_path.write_text(json.dumps(root))
+
+ fv = FastValidator(str(root_path), quiet=False, verbose=True)
+ fv.run_recursive()
+
+ captured = capsys.readouterr()
+ assert "RECURSIVE VALIDATION SUMMARY" in captured.out
+ assert "Execution Time" in captured.out
+
+ def test_api_mode_respects_limit(self, monkeypatch):
+ """Test API validation follows API links and stops at limit."""
+
+ def _ok_validator(data):
+ return None
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_ok_validator, True),
+ )
+
+ payloads = {
+ "https://api.example.com": {
+ "conformsTo": ["https://api.stacspec.org/v1.0.0/core"],
+ "id": "api-root",
+ "type": "Catalog",
+ "description": "api root",
+ "links": [
+ {"rel": "data", "href": "https://api.example.com/collections"}
+ ],
+ },
+ "https://api.example.com/collections": {
+ "collections": [{"id": "demo-collection"}],
+ },
+ "https://api.example.com/collections/demo-collection": {
+ "stac_version": "1.0.0",
+ "type": "Collection",
+ "id": "demo-collection",
+ "description": "demo",
+ "license": "MIT",
+ "extent": {
+ "spatial": {"bbox": [[-180, -90, 180, 90]]},
+ "temporal": {"interval": [["2023-01-01T00:00:00Z", None]]},
+ },
+ "links": [
+ {
+ "rel": "items",
+ "href": "https://api.example.com/collections/demo-collection/items",
+ }
+ ],
+ },
+ "https://api.example.com/collections/demo-collection/items": {
+ "type": "FeatureCollection",
+ "features": [
+ {
+ "stac_version": "1.0.0",
+ "type": "Feature",
+ "id": "item-1",
+ "geometry": None,
+ "properties": {"datetime": "2023-01-01T00:00:00Z"},
+ "links": [
+ {
+ "rel": "self",
+ "href": "https://api.example.com/items/item-1",
+ }
+ ],
+ "assets": {},
+ }
+ ],
+ },
+ }
+
+ class _Response:
+ def __init__(self, data):
+ self._data = data
+
+ def raise_for_status(self):
+ return None
+
+ def json(self):
+ return self._data
+
+ def _fake_get(url, timeout=15):
+ if url not in payloads:
+ raise RuntimeError(f"Unexpected URL: {url}")
+ return _Response(payloads[url])
+
+ monkeypatch.setattr("stac_validator.fast_validator.HTTP_SESSION.get", _fake_get)
+
+ fv = FastValidator("https://api.example.com", quiet=True, limit=2)
+ fv.run_api()
+
+ assert fv.valid is True
+ assert len(fv.message) == 2
+ assert fv.message[0]["id"] == "api-root"
+ assert fv.message[1]["id"] == "demo-collection"
+
+ def test_api_mode_summary_includes_execution_time(self, monkeypatch, capsys):
+ """Test API mode keeps API summary format and includes execution time."""
+
+ def _ok_validator(data):
+ return None
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_ok_validator, True),
+ )
+
+ payloads = {
+ "https://api.example.com": {
+ "conformsTo": ["https://api.stacspec.org/v1.0.0/core"],
+ "id": "api-root",
+ "type": "Catalog",
+ "description": "api root",
+ "links": [
+ {"rel": "data", "href": "https://api.example.com/collections"}
+ ],
+ },
+ "https://api.example.com/collections": {
+ "collections": [{"id": "demo-collection"}],
+ },
+ "https://api.example.com/collections/demo-collection": {
+ "stac_version": "1.0.0",
+ "type": "Collection",
+ "id": "demo-collection",
+ "description": "demo",
+ "license": "MIT",
+ "extent": {
+ "spatial": {"bbox": [[-180, -90, 180, 90]]},
+ "temporal": {"interval": [["2023-01-01T00:00:00Z", None]]},
+ },
+ "links": [],
+ },
+ }
+
+ class _Response:
+ def __init__(self, data):
+ self._data = data
+
+ def raise_for_status(self):
+ return None
+
+ def json(self):
+ return self._data
+
+ def _fake_get(url, timeout=15):
+ if url not in payloads:
+ raise RuntimeError(f"Unexpected URL: {url}")
+ return _Response(payloads[url])
+
+ monkeypatch.setattr("stac_validator.fast_validator.HTTP_SESSION.get", _fake_get)
+
+ fv = FastValidator("https://api.example.com", quiet=False, verbose=True)
+ fv.run_api()
+
+ captured = capsys.readouterr()
+ assert "[1] Validating Catalog: api-root" in captured.out
+ assert "STAC API VALIDATION SUMMARY" in captured.out
+ assert "Execution Time" in captured.out
+
+ def test_api_mode_does_not_validate_items_featurecollection(self, monkeypatch):
+ """Test API mode validates item features, not the /items FeatureCollection object."""
+
+ def _ok_validator(data):
+ return None
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_ok_validator, True),
+ )
+
+ payloads = {
+ "https://api.example.com": {
+ "conformsTo": ["https://api.stacspec.org/v1.0.0/core"],
+ "id": "api-root",
+ "type": "Catalog",
+ "description": "api root",
+ "links": [
+ {"rel": "data", "href": "https://api.example.com/collections"}
+ ],
+ },
+ "https://api.example.com/collections": {
+ "collections": [{"id": "demo-collection"}],
+ },
+ "https://api.example.com/collections/demo-collection": {
+ "stac_version": "1.0.0",
+ "type": "Collection",
+ "id": "demo-collection",
+ "description": "demo",
+ "license": "MIT",
+ "extent": {
+ "spatial": {"bbox": [[-180, -90, 180, 90]]},
+ "temporal": {"interval": [["2023-01-01T00:00:00Z", None]]},
+ },
+ "links": [
+ {
+ "rel": "items",
+ "href": "https://api.example.com/collections/demo-collection/items",
+ }
+ ],
+ },
+ "https://api.example.com/collections/demo-collection/items": {
+ "type": "FeatureCollection",
+ "features": [],
+ "links": [],
+ },
+ }
+
+ class _Response:
+ def __init__(self, data):
+ self._data = data
+
+ def raise_for_status(self):
+ return None
+
+ def json(self):
+ return self._data
+
+ def _fake_get(url, timeout=15):
+ if url not in payloads:
+ raise RuntimeError(f"Unexpected URL: {url}")
+ return _Response(payloads[url])
+
+ monkeypatch.setattr("stac_validator.fast_validator.HTTP_SESSION.get", _fake_get)
+
+ fv = FastValidator("https://api.example.com", quiet=True)
+ fv.run_api()
+
+ assert fv.valid is True
+ assert len(fv.message) == 2
+ paths = {entry["path"] for entry in fv.message}
+ assert "https://api.example.com/collections/demo-collection/items" not in paths
+
+
+class TestFastValidatorRefResolutionFallback:
+ """Test fallback behavior when fast path hits ref-resolution errors."""
+
+ def test_run_falls_back_to_jsonschema_on_ref_error(self, valid_item, monkeypatch):
+ """run() should retry via jsonschema resolver on ref-resolution errors."""
+
+ class FakeRefError(Exception):
+ pass
+
+ def _raise_ref_error(_data):
+ raise FakeRefError("Unresolvable JSON pointer: 'definitions/link'")
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_raise_ref_error, True),
+ )
+
+ fallback_calls = []
+
+ def _fallback(schema_path, content):
+ fallback_calls.append(schema_path)
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.validate_with_ref_resolver",
+ _fallback,
+ )
+
+ fv = FastValidator(valid_item, quiet=True)
+ fv.run()
+
+ assert fv.valid is True
+ assert len(fallback_calls) == 1
+ assert "item-spec/json-schema/item.json" in fallback_calls[0]
+ assert fv.message[0]["invalid_objects"] == 0
+
+ def test_run_api_falls_back_to_jsonschema_on_ref_error(self, monkeypatch):
+ """run_api() should retry via jsonschema resolver on ref-resolution errors."""
+
+ class FakeRefError(Exception):
+ pass
+
+ def _raise_ref_error(_data):
+ raise FakeRefError("Unresolvable JSON pointer: 'definitions/asset'")
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.get_validator",
+ lambda *args, **kwargs: (_raise_ref_error, True),
+ )
+
+ fallback_calls = []
+
+ def _fallback(schema_path, content):
+ fallback_calls.append(schema_path)
+
+ monkeypatch.setattr(
+ "stac_validator.fast_validator.validate_with_ref_resolver",
+ _fallback,
+ )
+
+ payloads = {
+ "https://api.example.com": {
+ "conformsTo": ["https://api.stacspec.org/v1.0.0/core"],
+ "id": "api-root",
+ "type": "Catalog",
+ "description": "api root",
+ "links": [
+ {"rel": "data", "href": "https://api.example.com/collections"}
+ ],
+ },
+ "https://api.example.com/collections": {
+ "collections": [{"id": "demo-collection"}],
+ },
+ "https://api.example.com/collections/demo-collection": {
+ "stac_version": "1.0.0",
+ "type": "Collection",
+ "id": "demo-collection",
+ "description": "demo",
+ "license": "MIT",
+ "extent": {
+ "spatial": {"bbox": [[-180, -90, 180, 90]]},
+ "temporal": {"interval": [["2023-01-01T00:00:00Z", None]]},
+ },
+ "links": [],
+ },
+ }
+
+ class _Response:
+ def __init__(self, data):
+ self._data = data
+
+ def raise_for_status(self):
+ return None
+
+ def json(self):
+ return self._data
+
+ def _fake_get(url, timeout=15):
+ if url not in payloads:
+ raise RuntimeError(f"Unexpected URL: {url}")
+ return _Response(payloads[url])
+
+ monkeypatch.setattr("stac_validator.fast_validator.HTTP_SESSION.get", _fake_get)
+
+ fv = FastValidator("https://api.example.com", quiet=True)
+ fv.run_api()
+
+ assert fv.valid is True
+ assert len(fallback_calls) == 1
+ assert "collection-spec/json-schema/collection.json" in fallback_calls[0]
+
class TestFastValidatorDetection:
"""Test STAC type detection."""
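The API-mode tests above define a traversal contract:

- start at the landing page;
- follow `data` and `items` links;
- expand the `/collections` listing into per-collection URLs;
- validate each Item inside an `/items` FeatureCollection, but not the FeatureCollection itself;
- stop once `limit` objects have been validated.

What follows is a minimal sketch of a breadth-first crawl that satisfies that contract. It assumes a `fetch(url) -> dict` callable and a `validate(obj)` callable that raises on invalid input. `crawl_api` and both callables are hypothetical names, not stac_validator's actual API or implementation.

```python
# Hypothetical sketch of a limit-aware STAC API crawl, modelled on the
# behaviour asserted by the tests; not stac_validator's implementation.
from collections import deque
from typing import Callable, Dict, List, Optional


def crawl_api(
    root_url: str,
    fetch: Callable[[str], dict],
    validate: Callable[[dict], None],
    limit: Optional[int] = None,
) -> List[Dict[str, object]]:
    """Breadth-first walk from an API landing page, validating STAC objects."""
    results: List[Dict[str, object]] = []
    queue: deque = deque([root_url])
    seen = set()

    def record(obj: dict, path: str) -> bool:
        # Validate one object and store the outcome; return True once the
        # limit has been reached so the caller can stop the crawl.
        try:
            validate(obj)
            ok = True
        except Exception:
            ok = False
        results.append({"id": obj.get("id", ""), "path": path, "valid": ok})
        return limit is not None and len(results) >= limit

    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        doc = fetch(url)

        if isinstance(doc.get("collections"), list):
            # A /collections listing is not a STAC object: queue each
            # collection's own endpoint (assumption: <listing URL>/<id>).
            base = url.rstrip("/")
            queue.extend(f"{base}/{c['id']}" for c in doc["collections"])
            continue

        if doc.get("type") == "FeatureCollection":
            # An /items page is only a container: validate its Items instead.
            for feature in doc.get("features", []):
                if record(feature, f"{url}#{feature.get('id', '')}"):
                    return results
            continue

        if record(doc, url):
            return results
        # Assumption: only these two link relations are followed.
        for link in doc.get("links", []):
            if link.get("rel") in ("data", "items"):
                queue.append(link["href"])
    return results
```

Passing `fetch` in as a parameter, instead of calling an HTTP client directly, keeps the crawl testable offline. The tests get the same effect by monkeypatching `HTTP_SESSION.get` with a dictionary of canned payloads. A dictionary's `__getitem__` can serve as `fetch` here in the same way.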
=====================================
tests/test_sys_exit.py
=====================================
@@ -52,3 +52,19 @@ def test_cli_schema_cache_size_option():
],
check=True,
)
+
+
+def test_fast_api_requires_url_scheme():
+ result = subprocess.run(
+ [
+ "stac-validator",
+ "fast",
+ "--api",
+ "stac.opensearch.dataspace.copernicus.eu/v1",
+ ],
+ capture_output=True,
+ text=True,
+ )
+
+ assert result.returncode == 1
+ assert "Invalid STAC API URL" in (result.stdout + result.stderr)
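`test_fast_api_requires_url_scheme` expects `stac-validator fast --api` to exit with status 1 and to print `Invalid STAC API URL` when it is given a host and path with no scheme. A pre-flight check along the following lines could produce that behaviour. `api_url_error` and `run_cli` are hypothetical names. The message wording beyond the asserted substring is also an assumption.

```python
# Hypothetical sketch: reject scheme-less API URLs before any network request.
# Function names and message wording are illustrative, not stac_validator's.
import sys
from typing import List, Optional
from urllib.parse import urlparse


def api_url_error(url: str) -> Optional[str]:
    """Return an error message if `url` is not an absolute http(s) URL, else None."""
    parsed = urlparse(url)
    # A scheme-less "host/path" string parses with an empty scheme and netloc,
    # so both must be present for the URL to be usable.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Invalid STAC API URL: {url!r}. Include the scheme, e.g. 'https://...'."
    return None


def run_cli(argv: List[str]) -> int:
    """Tiny stand-in for a CLI entry point; returns the process exit code."""
    if not argv:
        print("usage: check-api URL", file=sys.stderr)
        return 2
    error = api_url_error(argv[0])
    if error:
        print(error, file=sys.stderr)
        return 1  # the exit status asserted by the test
    print(f"OK: {argv[0]}")
    return 0
```

`urllib.parse.urlparse` puts a scheme-less `host/path` string entirely into `path`, which leaves both `scheme` and `netloc` empty. That is why the check requires both of them.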
View it on GitLab: https://salsa.debian.org/debian-gis-team/stac-validator/-/compare/6cf80b475c0aebe6451b571e57a97aeffae9bfae...86a07aeb7e5df8eab01f1af5e89e85e89d72b046
--
You're receiving this email because of your account on salsa.debian.org. Manage all notifications: https://salsa.debian.org/-/profile/notifications | Help: https://salsa.debian.org/help
More information about the Pkg-grass-devel mailing list