[Git][debian-gis-team/pooch][upstream] New upstream version 1.7.0
Antonio Valentino (@antonio.valentino)
gitlab@salsa.debian.org
Sun Jun 11 15:50:34 BST 2023
Antonio Valentino pushed to branch upstream at Debian GIS Project / pooch
Commits:
178bd166 by Antonio Valentino at 2023-06-10T15:33:53+00:00
New upstream version 1.7.0
- - - - -
28 changed files:
- .github/workflows/docs.yml
- .github/workflows/pypi.yml
- .github/workflows/test.yml
- AUTHORS.md
- + CITATION.cff
- CITATION.rst
- + README.md
- − README.rst
- doc/changes.rst
- doc/conf.py
- doc/downloaders.rst
- doc/install.rst
- doc/protocols.rst
- doc/versions.rst
- env/requirements-docs.txt
- environment.yml
- pooch/core.py
- pooch/downloaders.py
- pooch/hashes.py
- pooch/processors.py
- + pooch/tests/data/registry-spaces.txt
- pooch/tests/test_core.py
- pooch/tests/test_downloaders.py
- pooch/tests/test_processors.py
- pooch/tests/test_utils.py
- pooch/tests/utils.py
- pooch/utils.py
- setup.cfg
Changes:
=====================================
.github/workflows/docs.yml
=====================================
@@ -30,7 +30,6 @@ jobs:
runs-on: ubuntu-latest
env:
REQUIREMENTS: env/requirements-build.txt env/requirements-docs.txt
- PYTHON: 3.9
steps:
# Cancel any previous run of the test job
@@ -61,7 +60,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v2
with:
- python-version: ${{ env.PYTHON }}
+ python-version: "3.10"
- name: Collect requirements
run: |
=====================================
.github/workflows/pypi.yml
=====================================
@@ -47,7 +47,7 @@ jobs:
- name: Setup Python
uses: actions/setup-python@v2
with:
- python-version: "3.9"
+ python-version: "3.10"
- name: Install requirements
run: |
=====================================
.github/workflows/test.yml
=====================================
@@ -20,7 +20,7 @@ on:
schedule:
# Run every Monday at 12:00 UTC
# * is a special character in YAML so you have to quote this string
- - cron: '00 12 * * 1'
+ - cron: "00 12 * * 1"
# Use bash by default in all jobs
defaults:
@@ -28,7 +28,6 @@ defaults:
shell: bash
jobs:
-
#############################################################################
# Run tests and upload to codecov
test:
@@ -44,7 +43,7 @@ jobs:
- macos
- windows
python:
- - "3.6"
+ - "3.7"
- "3.10"
dependencies:
- latest
@@ -148,9 +147,9 @@ jobs:
run: coverage xml
- name: Upload coverage to Codecov
- uses: codecov/codecov-action@v1
+ uses: codecov/codecov-action@v3
with:
- file: ./coverage.xml
+ files: ./coverage.xml
env_vars: OS,PYTHON,DEPENDENCIES
# Don't mark the job as failed if the upload fails for some reason.
# It does sometimes but shouldn't be the reason for running
=====================================
AUTHORS.md
=====================================
@@ -10,8 +10,10 @@ order by last name) and are considered "The Pooch Developers":
* [Mark Harfouche](https://github.com/hmaarrfk) - Ramona Optics Inc. - [0000-0002-4657-4603](https://orcid.org/0000-0002-4657-4603)
* [Danilo Horta](https://github.com/horta) - EMBL-EBI, UK
* [Hugo van Kemenade](https://github.com/hugovk) - Independent (Non-affiliated) (ORCID: [0000-0001-5715-8632](https://www.orcid.org/0000-0001-5715-8632))
+* [Dominic Kempf](https://github.com/dokempf) - Scientific Software Center, Heidelberg University, Germany (ORCID: [0000-0002-6140-2332](https://www.orcid.org/0000-0002-6140-2332))
* [Kacper Kowalik](https://github.com/Xarthisius) - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, USA (ORCID: [0000-0003-1709-3744](https://www.orcid.org/0000-0003-1709-3744))
* [John Leeman](https://github.com/jrleeman)
+* [Björn Ludwig](https://github.com/BjoernLudwigPTB) - Physikalisch-Technische Bundesanstalt, Germany (ORCID: [0000-0002-5910-9137](https://www.orcid.org/0000-0002-5910-9137))
* [Daniel McCloy](https://github.com/drammock) - University of Washington, USA (ORCID: [0000-0002-7572-3241](https://orcid.org/0000-0002-7572-3241))
* [Rémi Rampin](https://github.com/remram44) - New York University, USA (ORCID: [0000-0002-0524-2282](https://www.orcid.org/0000-0002-0524-2282))
* [Clément Robert](https://github.com/neutrinoceros) - Institut de Planétologie et d'Astrophysique de Grenoble, France (ORCID: [0000-0001-8629-7068](https://orcid.org/0000-0001-8629-7068))
=====================================
CITATION.cff
=====================================
@@ -0,0 +1,53 @@
+cff-version: 1.2.0
+title: 'Pooch: A friend to fetch your data files'
+message: >-
+ If you use this software, please cite it using the
+ information in this file.
+type: software
+url: 'https://www.fatiando.org/pooch/'
+repository-code: 'https://github.com/fatiando/pooch'
+repository-artifact: 'https://pypi.org/project/pooch/'
+license: BSD-3-Clause
+preferred-citation:
+ type: article
+ title: 'Pooch: A friend to fetch your data files'
+ journal: Journal of Open Source Software
+ year: 2020
+ doi: 10.21105/joss.01943
+ volume: 5
+ issue: 45
+ start: 1943
+ license: CC-BY-4.0
+ authors:
+ - given-names: Leonardo
+ family-names: Uieda
+ affiliation: University of Liverpool
+ orcid: 'https://orcid.org/0000-0001-6123-9515'
+ - given-names: Santiago Rubén
+ family-names: Soler
+ affiliation: Universidad Nacional de San Juan
+ orcid: 'https://orcid.org/0000-0001-9202-5317'
+ - given-names: Rémi
+ family-names: Rampin
+ affiliation: New York University
+ orcid: 'https://orcid.org/0000-0002-0524-2282'
+ - given-names: Hugo
+ name-particle: van
+ family-names: Kemenade
+ orcid: 'https://orcid.org/0000-0001-5715-8632'
+ - given-names: Matthew
+ family-names: Turk
+ affiliation: School of Information Sciences
+ orcid: 'https://orcid.org/0000-0002-5294-0198'
+ - given-names: Daniel
+ family-names: Shapero
+ affiliation: University of Washington
+ orcid: 'https://orcid.org/0000-0002-3651-0649'
+ - given-names: Anderson
+ family-names: Banihirwe
+ affiliation: National Center for Atmospheric Research
+ orcid: 'https://orcid.org/0000-0001-6583-571X'
+ - given-names: John
+ family-names: Leeman
+ affiliation: Leeman Geophysical
+ orcid: 'https://orcid.org/0000-0002-3624-1821'
=====================================
CITATION.rst
=====================================
@@ -14,5 +14,20 @@ If you used Pooch in your research, please consider citing our paper:
This is an open-access publication. The paper and the associated software
review can be freely accessed at: https://doi.org/10.21105/joss.01943
-If you need a Bibtex entry for the paper, grab it here:
-https://www.doi2bib.org/bib/10.21105/joss.01943
+Here is a BibTeX entry to make things easier if you're using LaTeX:
+
+.. code:: bibtex
+
+ @article{uieda2020,
+ title = {{Pooch}: {A} friend to fetch your data files},
+ author = {Leonardo Uieda and Santiago Soler and R{\'{e}}mi Rampin and Hugo van Kemenade and Matthew Turk and Daniel Shapero and Anderson Banihirwe and John Leeman},
+ year = {2020},
+ doi = {10.21105/joss.01943},
+ url = {https://doi.org/10.21105/joss.01943},
+ month = jan,
+ publisher = {The Open Journal},
+ volume = {5},
+ number = {45},
+ pages = {1943},
+ journal = {Journal of Open Source Software}
+ }
=====================================
README.md
=====================================
@@ -0,0 +1,185 @@
+<img src="https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png" alt="Pooch: A friend to fetch your data files">
+
+<p align="center">
+<a href="https://www.fatiando.org/pooch"><strong>Documentation</strong> (latest)</a> •
+<a href="https://www.fatiando.org/pooch/dev"><strong>Documentation</strong> (main branch)</a> •
+<a href="https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md"><strong>Contributing</strong></a> •
+<a href="https://www.fatiando.org/contact/"><strong>Contact</strong></a>
+</p>
+
+<p align="center">
+Part of the <a href="https://www.fatiando.org"><strong>Fatiando a Terra</strong></a> project
+</p>
+
+<p align="center">
+<a href="https://pypi.python.org/pypi/pooch"><img src="http://img.shields.io/pypi/v/pooch.svg?style=flat-square" alt="Latest version on PyPI"></a>
+<a href="https://github.com/conda-forge/pooch-feedstock"><img src="https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square" alt="Latest version on conda-forge"></a>
+<a href="https://codecov.io/gh/fatiando/pooch"><img src="https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square" alt="Test coverage status"></a>
+<a href="https://pypi.python.org/pypi/pooch"><img src="https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square" alt="Compatible Python versions."></a>
+<a href="https://doi.org/10.21105/joss.01943"><img src="https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue?style=flat-square" alt="DOI used to cite Pooch"></a>
+</p>
+
+## About
+
+*Does your Python package include sample datasets?
+Are you shipping them with the code?
+Are they getting too big?*
+
+**Pooch** is here to help! It will manage a data *registry* by downloading your
+data files from a server only when needed and storing them locally in a data
+*cache* (a folder on your computer).
+
+Here are Pooch's main features:
+
+* Pure Python and minimal dependencies.
+* Download a file only if necessary (it's not in the data cache or needs to be
+ updated).
+* Verify download integrity through SHA256 hashes (also used to check if a file
+ needs to be updated).
+* Designed to be extended: plug in custom download (FTP, scp, etc) and
+ post-processing (unzip, decompress, rename) functions.
+* Includes utilities to unzip/decompress the data upon download to save loading
+ time.
+* Can handle basic HTTP authentication (for servers that require a login) and
+ printing download progress bars.
+* Easily set up an environment variable to overwrite the data cache location.
+
+*Are you a scientist or researcher? Pooch can help you too!*
+
+* Automatically download your data files so you don't have to keep them in your
+ GitHub repository.
+* Make sure everyone running the code has the same version of the data files
+ (enforced through the SHA256 hashes).
+
+## Example
+
+For a **scientist downloading a data file** for analysis:
+
+```python
+import pooch
+import pandas as pd
+
+# Download a file and save it locally, returning the path to it.
+# Running this again will not cause a download. Pooch will check the hash
+# (checksum) of the downloaded file against the given value to make sure
+# it's the right file (not corrupted or outdated).
+fname_bathymetry = pooch.retrieve(
+ url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
+ known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
+)
+
+# Pooch can also download based on a DOI from certain providers.
+fname_gravity = pooch.retrieve(
+ url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
+ known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
+)
+
+# Load the data with Pandas
+data_bathymetry = pd.read_csv(fname_bathymetry)
+data_gravity = pd.read_csv(fname_gravity)
+```
+
+For **package developers** including sample data in their projects:
+
+```python
+"""
+Module mypackage/datasets.py
+"""
+import pkg_resources
+import pandas
+import pooch
+
+# Get the version string from your project. You have one of these, right?
+from . import version
+
+# Create a new friend to manage your sample data storage
+GOODBOY = pooch.create(
+ # Folder where the data will be stored. For a sensible default, use the
+ # default cache folder for your OS.
+ path=pooch.os_cache("mypackage"),
+ # Base URL of the remote data store. Will call .format on this string
+ # to insert the version (see below).
+ base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
+ # Pooches are versioned so that you can use multiple versions of a
+ # package simultaneously. Use PEP440 compliant version number. The
+ # version will be appended to the path.
+ version=version,
+ # If a version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
+ # version and replace the version with this string.
+ version_dev="main",
+ # An environment variable that overwrites the path.
+ env="MYPACKAGE_DATA_DIR",
+ # The cache file registry. A dictionary with all files managed by this
+ # pooch. Keys are the file names (relative to *base_url*) and values
+ # are their respective SHA256 hashes. Files will be downloaded
+ # automatically when needed (see fetch_gravity_data).
+ registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
+)
+# You can also load the registry from a file. Each line contains a file
+# name and its SHA256 hash separated by a space. This makes it easier to
+# manage large numbers of data files. The registry file should be packaged
+# and distributed with your software.
+GOODBOY.load_registry(
+ pkg_resources.resource_stream("mypackage", "registry.txt")
+)
+
+# Define functions that your users can call to get back the data in memory
+def fetch_gravity_data():
+ """
+ Load some sample gravity data to use in your docs.
+ """
+ # Fetch the path to a file in the local storage. If it's not there,
+ # we'll download it.
+ fname = GOODBOY.fetch("gravity-data.csv")
+ # Load it with numpy/pandas/etc
+ data = pandas.read_csv(fname)
+ return data
+```
+
+## Projects using Pooch
+
+* [SciPy](https://github.com/scipy/scipy)
+* [scikit-image](https://github.com/scikit-image/scikit-image)
+* [MetPy](https://github.com/Unidata/MetPy)
+* [icepack](https://github.com/icepack/icepack)
+* [histolab](https://github.com/histolab/histolab)
+* [seaborn-image](https://github.com/SarthakJariwala/seaborn-image)
+* [Ensaio](https://github.com/fatiando/ensaio)
+* [Open AR-Sandbox](https://github.com/cgre-aachen/open_AR_Sandbox)
+* [climlab](https://github.com/climlab/climlab)
+* [napari](https://github.com/napari/napari)
+* [mne-python](https://github.com/mne-tools/mne-python)
+
+*If you're using Pooch, send us a pull request adding your project to the list.*
+
+## Getting involved
+
+🗨️ **Contact us:**
+Find out more about how to reach us at
+[fatiando.org/contact](https://www.fatiando.org/contact/).
+
+👩🏾💻 **Contributing to project development:**
+Please read our
+[Contributing Guide](https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md)
+to see how you can help and give feedback.
+
+🧑🏾🤝🧑🏼 **Code of conduct:**
+This project is released with a
+[Code of Conduct](https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md).
+By participating in this project you agree to abide by its terms.
+
+> **Imposter syndrome disclaimer:**
+> We want your help. **No, really.** There may be a little voice inside your
+> head that is telling you that you're not ready, that you aren't skilled
+> enough to contribute. We assure you that the little voice in your head is
+> wrong. Most importantly, **there are many valuable ways to contribute besides
+> writing code**.
+>
+> *This disclaimer was adapted from the*
+> [MetPy project](https://github.com/Unidata/MetPy).
+
+## License
+
+This is free software: you can redistribute it and/or modify it under the terms
+of the **BSD 3-clause License**. A copy of this license is provided in
+[`LICENSE.txt`](https://github.com/fatiando/pooch/blob/main/LICENSE.txt).
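The hash-based caching that the new README describes (download only when the file is missing or its checksum changed) can be sketched without pooch itself. This is an illustrative approximation of the idea, not pooch's actual implementation; `file_sha256` and `needs_download` are hypothetical helpers:

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Compute the SHA256 hex digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, known_hash):
    """True if the file is missing or its hash doesn't match the registry."""
    path = Path(path)
    if not path.exists():
        return True
    return file_sha256(path) != known_hash
```

A fetch step would then only hit the network when `needs_download` returns `True`, which is the behavior the README's `pooch.retrieve` example relies on.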
=====================================
README.rst deleted
=====================================
@@ -1,230 +0,0 @@
-.. image:: https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png
- :alt: Pooch
-
-`Documentation <https://www.fatiando.org/pooch>`__ |
-`Documentation (dev version) <https://www.fatiando.org/pooch/dev>`__ |
-Part of the `Fatiando a Terra <https://www.fatiando.org>`__ project
-
-.. image:: https://img.shields.io/pypi/v/pooch.svg?style=flat-square
- :alt: Latest version on PyPI
- :target: https://pypi.org/project/pooch/
-.. image:: https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square
- :alt: Latest version on conda-forge
- :target: https://github.com/conda-forge/pooch-feedstock
-.. image:: https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square
- :alt: Test coverage status
- :target: https://codecov.io/gh/fatiando/pooch
-.. image:: https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square
- :alt: Compatible Python versions.
- :target: https://pypi.org/project/pooch/
-.. image:: https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue.svg?style=flat-square
- :alt: Digital Object Identifier for the JOSS paper
- :target: https://doi.org/10.21105/joss.01943
-
-
-About
------
-
-*Does your Python package include sample datasets? Are you shipping them with the code?
-Are they getting too big?*
-
-Pooch is here to help! It will manage a data *registry* by downloading your data files
-from a server only when needed and storing them locally in a data *cache* (a folder on
-your computer).
-
-Here are Pooch's main features:
-
-* Pure Python and minimal dependencies.
-* Download a file only if necessary (it's not in the data cache or needs to be updated).
-* Verify download integrity through SHA256 hashes (also used to check if a file needs to
- be updated).
-* Designed to be extended: plug in custom download (FTP, scp, etc) and post-processing
- (unzip, decompress, rename) functions.
-* Includes utilities to unzip/decompress the data upon download to save loading time.
-* Can handle basic HTTP authentication (for servers that require a login) and printing
- download progress bars.
-* Easily set up an environment variable to overwrite the data cache location.
-
-*Are you a scientist or researcher? Pooch can help you too!*
-
-* Automatically download your data files so you don't have to keep them in your GitHub
- repository.
-* Make sure everyone running the code has the same version of the data files (enforced
- through the SHA256 hashes).
-
-
-Example
--------
-
-For a **scientist downloading a data file** for analysis:
-
-.. code:: python
-
- import pooch
- import pandas as pd
-
-
- # Download a file and save it locally, returning the path to it.
- # Running this again will not cause a download. Pooch will check the hash
- # (checksum) of the downloaded file against the given value to make sure
- # it's the right file (not corrupted or outdated).
- fname_bathymetry = pooch.retrieve(
- url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
- known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
- )
-
- # Pooch can also download based on a DOI from certain providers.
- fname_gravity = pooch.retrieve(
- url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
- known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
- )
-
- # Load the data with Pandas
- data_bathymetry = pd.read_csv(fname_bathymetry)
- data_gravity = pd.read_csv(fname_gravity)
-
-
-
-For **package developers** including sample data in their projects:
-
-.. code:: python
-
- """
- Module mypackage/datasets.py
- """
- import pkg_resources
- import pandas
- import pooch
-
- # Get the version string from your project. You have one of these, right?
- from . import version
-
-
- # Create a new friend to manage your sample data storage
- GOODBOY = pooch.create(
- # Folder where the data will be stored. For a sensible default, use the
- # default cache folder for your OS.
- path=pooch.os_cache("mypackage"),
- # Base URL of the remote data store. Will call .format on this string
- # to insert the version (see below).
- base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
- # Pooches are versioned so that you can use multiple versions of a
- # package simultaneously. Use PEP440 compliant version number. The
- # version will be appended to the path.
- version=version,
- # If a version as a "+XX.XXXXX" suffix, we'll assume that this is a dev
- # version and replace the version with this string.
- version_dev="main",
- # An environment variable that overwrites the path.
- env="MYPACKAGE_DATA_DIR",
- # The cache file registry. A dictionary with all files managed by this
- # pooch. Keys are the file names (relative to *base_url*) and values
- # are their respective SHA256 hashes. Files will be downloaded
- # automatically when needed (see fetch_gravity_data).
- registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
- )
- # You can also load the registry from a file. Each line contains a file
- # name and it's sha256 hash separated by a space. This makes it easier to
- # manage large numbers of data files. The registry file should be packaged
- # and distributed with your software.
- GOODBOY.load_registry(
- pkg_resources.resource_stream("mypackage", "registry.txt")
- )
-
-
- # Define functions that your users can call to get back the data in memory
- def fetch_gravity_data():
- """
- Load some sample gravity data to use in your docs.
- """
- # Fetch the path to a file in the local storage. If it's not there,
- # we'll download it.
- fname = GOODBOY.fetch("gravity-data.csv")
- # Load it with numpy/pandas/etc
- data = pandas.read_csv(fname)
- return data
-
-
-Projects using Pooch
---------------------
-
-* `scikit-image <https://github.com/scikit-image/scikit-image>`__
-* `MetPy <https://github.com/Unidata/MetPy>`__
-* `icepack <https://github.com/icepack/icepack>`__
-* `histolab <https://github.com/histolab/histolab>`__
-* `seaborn-image <https://github.com/SarthakJariwala/seaborn-image>`__
-* `Ensaio <https://github.com/fatiando/ensaio>`__
-
-*If you're using Pooch, send us a pull request adding your project to the list.*
-
-
-Contacting Us
--------------
-
-Find out more about how to reach us at
-`fatiando.org/contact <https://www.fatiando.org/contact/>`__
-
-
-Citing Pooch
-------------
-
-This is research software **made by scientists** (see
-`AUTHORS.md <https://github.com/fatiando/pooch/blob/main/AUTHORS.md>`__). Citations
-help us justify the effort that goes into building and maintaining this project. If you
-used Pooch for your research, please consider citing us.
-
-See our `CITATION.rst file <https://github.com/fatiando/pooch/blob/main/CITATION.rst>`__
-to find out more.
-
-
-Contributing
-------------
-
-Code of conduct
-+++++++++++++++
-
-Please note that this project is released with a
-`Code of Conduct <https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md>`__.
-By participating in this project you agree to abide by its terms.
-
-Contributing Guidelines
-+++++++++++++++++++++++
-
-Please read our
-`Contributing Guide <https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md>`__
-to see how you can help and give feedback.
-
-Imposter syndrome disclaimer
-++++++++++++++++++++++++++++
-
-**We want your help.** No, really.
-
-There may be a little voice inside your head that is telling you that you're
-not ready to be an open source contributor; that your skills aren't nearly good
-enough to contribute.
-What could you possibly offer?
-
-We assure you that the little voice in your head is wrong.
-
-**Being a contributor doesn't just mean writing code**.
-Equally important contributions include:
-writing or proof-reading documentation, suggesting or implementing tests, or
-even giving feedback about the project (including giving feedback about the
-contribution process).
-If you're coming to the project with fresh eyes, you might see the errors and
-assumptions that seasoned contributors have glossed over.
-If you can write any code at all, you can contribute code to open source.
-We are constantly trying out new skills, making mistakes, and learning from
-those mistakes.
-That's how we all improve and we are happy to help others learn.
-
-*This disclaimer was adapted from the*
-`MetPy project <https://github.com/Unidata/MetPy>`__.
-
-
-License
--------
-
-This is free software: you can redistribute it and/or modify it under the terms
-of the `BSD 3-clause License <https://github.com/fatiando/pooch/blob/main/LICENSE.txt>`__.
-A copy of this license is provided with distributions of the software.
=====================================
doc/changes.rst
=====================================
@@ -3,6 +3,64 @@
Changelog
=========
+Version 1.7.0
+-------------
+
+*Released on: 2023/02/27*
+
+doi:`10.5281/zenodo.7678844 <https://doi.org/10.5281/zenodo.7678844>`__
+
+Bug fixes:
+
+* Make archive extraction always take members into account (`#316 <https://github.com/fatiando/pooch/pull/316>`__)
+* Figshare downloaders fetch the correct version, instead of always the latest one. (`#343 <https://github.com/fatiando/pooch/pull/343>`__)
+
+New features:
+
+* Allow spaces in filenames in registry files (`#315 <https://github.com/fatiando/pooch/pull/315>`__)
+* Refactor ``Pooch.is_available`` to use downloaders (`#322 <https://github.com/fatiando/pooch/pull/322>`__)
+* Add support for downloading files from Dataverse DOIs (`#318 <https://github.com/fatiando/pooch/pull/318>`__)
+* Add a new ``Pooch.load_registry_from_doi`` method that populates the Pooch registry using DOI-based data repositories (`#325 <https://github.com/fatiando/pooch/pull/325>`__)
+* Support urls for Zenodo repositories created through the GitHub integration service, which include slashes in the filename of the main zip files (`#340 <https://github.com/fatiando/pooch/pull/340>`__)
+* Automatically add a trailing slash to ``base_url`` on ``pooch.create`` (`#344 <https://github.com/fatiando/pooch/pull/344>`__)
+
+Maintenance:
+
+* Drop support for Python 3.6 (`#299 <https://github.com/fatiando/pooch/pull/299>`__)
+* Port from deprecated ``appdirs`` to ``platformdirs`` (`#339 <https://github.com/fatiando/pooch/pull/339>`__)
+* Update version of Codecov's Action to v3 (`#345 <https://github.com/fatiando/pooch/pull/345>`__)
+
+Documentation:
+
+* Update sphinx, theme, and sphinx-panels (`#300 <https://github.com/fatiando/pooch/pull/300>`__)
+* Add CITATION.cff for the JOSS article (`#308 <https://github.com/fatiando/pooch/pull/308>`__)
+* Use Markdown for the README (`#311 <https://github.com/fatiando/pooch/pull/311>`__)
+* Improve docstring of ``known_hash`` in ``retrieve`` function (`#333 <https://github.com/fatiando/pooch/pull/333>`__)
+* Replace link to Pooch's citation with a BibTeX code snippet (`#335 <https://github.com/fatiando/pooch/pull/335>`__)
+
+Projects that started using Pooch:
+
+* Open AR-Sandbox (`#305 <https://github.com/fatiando/pooch/pull/305>`__)
+* ``climlab`` (`#312 <https://github.com/fatiando/pooch/pull/312>`__)
+* SciPy (`#320 <https://github.com/fatiando/pooch/pull/320>`__)
+* ``napari`` (`#321 <https://github.com/fatiando/pooch/pull/321>`__)
+* ``mne-python`` (`#323 <https://github.com/fatiando/pooch/pull/323>`__)
+
+This release contains contributions from:
+
+* Alex Fikl
+* Anirudh Dagar
+* Björn Ludwig
+* Brian Rose
+* Dominic Kempf
+* Florian Wellmann
+* Gabriel Fu
+* Kyle I S Harrington
+* Leonardo Uieda
+* myd7349
+* Rowan Cockett
+* Santiago Soler
+
Version 1.6.0
-------------
=====================================
doc/conf.py
=====================================
@@ -47,7 +47,7 @@ panels_css_variables = {
intersphinx_mapping = {
"python": ("https://docs.python.org/3/", None),
"pandas": ("http://pandas.pydata.org/pandas-docs/stable/", None),
- "requests": ("https://3.python-requests.org/", None),
+ "requests": ("https://requests.readthedocs.io/en/latest/", None),
}
# Autosummary pages will be generated by sphinx-autogen instead of sphinx-build
=====================================
doc/downloaders.rst
=====================================
@@ -20,8 +20,10 @@ Downloaders are Python *callable objects* (like functions or classes with a
'''
Download a file from the given URL to the given local file.
- The function **must** take as arguments (in order):
+ The function **must** take the following arguments (in order).
+ Parameters
+ ----------
url : str
The URL to the file you want to download.
output_file : str or file-like object
@@ -78,3 +80,47 @@ redirected from the original download URL:
fname = GOODBOY.fetch("some-data.csv", downloader=redirect_downloader)
data = pandas.read_csv(fname)
return data
+
+
+Availability checks
+-------------------
+
+**Optionally**, downloaders can take a ``check_only`` keyword argument
+(defaulting to ``False``) that makes them only check whether a given file is
+available for download **without** downloading it.
+This makes a downloader compatible with :meth:`pooch.Pooch.is_available`.
+
+In this case, the downloader should return a boolean:
+
+.. code:: python
+
+ def mydownloader(url, output_file, pooch, check_only=False):
+ '''
+ Download a file from the given URL to the given local file.
+
+ The function **must** take the following arguments (in order).
+
+ Parameters
+ ----------
+ url : str
+ The URL to the file you want to download.
+ output_file : str or file-like object
+ Path (and file name) to which the file will be downloaded.
+ pooch : pooch.Pooch
+ The instance of the Pooch class that is calling this function.
+ check_only : bool
+ If True, will only check if a file exists on the server,
+ **without downloading the file**. Will return ``True`` if the file
+ exists and ``False`` otherwise.
+
+ Returns
+ -------
+ None or availability
+ If ``check_only==True``, returns a boolean indicating if the file
+ is available on the server. Otherwise, returns ``None``.
+ '''
+ ...
+
+If a downloader does not implement an availability check (i.e., doesn't take
+``check_only`` as a keyword argument), then :meth:`pooch.Pooch.is_available`
+will raise a ``NotImplementedError``.
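As a minimal sketch of the ``check_only`` protocol documented above, here is a toy downloader that treats the URL as a local filesystem path so the example stays self-contained. Real downloaders would issue HTTP GET/HEAD requests instead; `local_downloader` is purely illustrative:

```python
import os
import shutil


def local_downloader(url, output_file, pooch, check_only=False):
    """Toy downloader following the check_only protocol above.

    Treats *url* as a plain local path. A real downloader would use
    an HTTP HEAD request for the availability check and a GET for the
    actual download.
    """
    if check_only:
        # Availability check: report whether the "remote" file exists
        # without copying anything, and return a boolean.
        return os.path.exists(url)
    shutil.copyfile(url, output_file)
    # Per the protocol, a normal download returns None.
    return None
```

Because it accepts ``check_only``, a downloader shaped like this would be compatible with ``Pooch.is_available`` as described above.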
=====================================
doc/install.rst
=====================================
@@ -39,7 +39,7 @@ There are different ways to install Pooch:
Which Python?
-------------
-You'll need **Python >= 3.6**. See :ref:`python-versions` if you
+You'll need **Python >= 3.7**. See :ref:`python-versions` if you
require support for older versions.
.. _dependencies:
@@ -53,7 +53,7 @@ manually.
Required:
-* `appdirs <https://github.com/ActiveState/appdirs>`__
+* `platformdirs <https://github.com/platformdirs/platformdirs>`__
* `packaging <https://github.com/pypa/packaging>`__
* `requests <https://docs.python-requests.org/>`__
=====================================
doc/protocols.rst
=====================================
@@ -103,3 +103,24 @@ figshare dataset:
``doi:10.6084/m9.figshare.c.4362224.v1``. Attempting to download files
from a figshare collection will raise an error.
+ See `issue #274 <https://github.com/fatiando/pooch/issues/274>`__ for details.
+
+Since these types of repositories store information about the files they
+contain, we can avoid manually typing out the registry of file names and
+their hashes.
+Instead, we can use the :meth:`pooch.Pooch.load_registry_from_doi` method to
+automatically populate the registry:
+
+.. code-block:: python
+
+ POOCH = pooch.create(
+ path=pooch.os_cache("plumbus"),
+ # Use the figshare DOI
+ base_url="doi:10.6084/m9.figshare.14763051.v1/",
+ registry=None,
+ )
+
+ # Automatically populate the registry
+ POOCH.load_registry_from_doi()
+
+ # Fetch one of the files in the repository
+ fname = POOCH.fetch("tiny-data.txt")
=====================================
doc/versions.rst
=====================================
@@ -7,6 +7,7 @@ Use the links below to access documentation for specific versions
* `Latest release <https://www.fatiando.org/pooch/latest>`__
* `Development <https://www.fatiando.org/pooch/dev>`__
(reflects the current development branch on GitHub)
+* `v1.7.0 <https://www.fatiando.org/pooch/v1.7.0>`__
* `v1.6.0 <https://www.fatiando.org/pooch/v1.6.0>`__
* `v1.5.2 <https://www.fatiando.org/pooch/v1.5.2>`__
* `v1.5.1 <https://www.fatiando.org/pooch/v1.5.1>`__
=====================================
env/requirements-docs.txt
=====================================
@@ -1,4 +1,4 @@
# Documentation requirements
-sphinx==3.5.*
-sphinx-book-theme==0.0.41
-sphinx-panels==0.5.*
+sphinx==4.4.*
+sphinx-book-theme==0.2.*
+sphinx-panels==0.6.*
=====================================
environment.yml
=====================================
@@ -3,12 +3,12 @@ channels:
- conda-forge
- defaults
dependencies:
- - python==3.9
+ - python==3.10
- pip
# Run
- requests
- packaging
- - appdirs
+ - platformdirs
# Build
- build
# Test
@@ -17,9 +17,9 @@ dependencies:
- pytest-localftpserver
- coverage
# Documentation
- - sphinx==3.5.*
- - sphinx-book-theme==0.0.41
- - sphinx-panels==0.5.*
+ - sphinx==4.4.*
+ - sphinx-book-theme==0.2.*
+ - sphinx-panels==0.6.*
# Style
- pathspec
- black>=20.8b1
=====================================
pooch/core.py
=====================================
@@ -11,8 +11,8 @@ import os
import time
import contextlib
from pathlib import Path
+import shlex
import shutil
-import ftplib
import requests
import requests.exceptions
@@ -20,7 +20,6 @@ import requests.exceptions
from .hashes import hash_matches, file_hash
from .utils import (
check_version,
- parse_url,
get_logger,
make_local_storage,
cache_location,
@@ -28,7 +27,7 @@ from .utils import (
os_cache,
unique_file_name,
)
-from .downloaders import choose_downloader
+from .downloaders import DOIDownloader, choose_downloader, doi_to_repository
def retrieve(
@@ -77,7 +76,7 @@ def retrieve(
url : str
The URL to the file that is to be downloaded. Ideally, the URL should
end in a file name.
- known_hash : str
+ known_hash : str or None
A known hash (checksum) of the file. Will be used to verify the
download or check if an existing file needs to be updated. By default,
will assume it's a SHA256 hash. To specify a different hashing method,
@@ -296,8 +295,8 @@ def create(
Base URL for the remote data source. All requests will be made relative
to this URL. The string should have a ``{version}`` formatting mark in
it. We will call ``.format(version=version)`` on this string. If the
- URL is a directory path, it must end in a ``'/'`` because we will not
- include it.
+ URL does not end in a ``'/'``, a trailing ``'/'`` will be added
+ automatically.
version : str or None
The version string for your project. Should be PEP440 compatible. If
None is given, will not attempt to format *base_url* and no subfolder
@@ -424,6 +423,8 @@ def create(
path = cache_location(path, env, version)
if isinstance(allow_updates, str):
allow_updates = os.environ.get(allow_updates, "true").lower() != "false"
+ # add trailing "/"
+ base_url = base_url.rstrip("/") + "/"
pup = Pooch(
path=path,
base_url=base_url,
@@ -656,7 +657,7 @@ class Pooch:
if line.startswith("#"):
continue
- elements = line.split()
+ elements = shlex.split(line)
if not len(elements) in [0, 2, 3]:
raise OSError(
f"Invalid entry in Pooch registry file '{fname}': "
@@ -671,7 +672,37 @@ class Pooch:
self.urls[file_name] = file_url
self.registry[file_name] = file_checksum.lower()
- def is_available(self, fname):
+ def load_registry_from_doi(self):
+ """
+ Populate the registry using the data repository API
+
+ Fill the registry with all the files available in the data repository,
+ along with their hashes. It will make a request to the data repository
+ API to retrieve this information. No file is downloaded during this
+ process.
+
+ .. important::
+
+ This method is intended to be used only when the ``base_url`` is
+ a DOI.
+ """
+
+ # Ensure that this is indeed a DOI-based pooch
+ downloader = choose_downloader(self.base_url)
+ if not isinstance(downloader, DOIDownloader):
+ raise ValueError(
+ f"Invalid base_url '{self.base_url}': "
+ + "Pooch.load_registry_from_doi is only implemented for DOIs"
+ )
+
+ # Create a repository instance
+ doi = self.base_url.replace("doi:", "")
+ repository = doi_to_repository(doi)
+
+ # Call registry population for this repository
+ return repository.populate_registry(self)
+
+ def is_available(self, fname, downloader=None):
"""
Check availability of a remote file without downloading it.
@@ -682,7 +713,11 @@ class Pooch:
----------
fname : str
The file name (relative to the *base_url* of the remote data
- storage) to fetch from the local storage.
+ storage).
+ downloader : None or callable
+ If not None, then a function (or callable object) that will be
+ called to check the availability of the file on the server. See
+ :ref:`downloaders` for details.
Returns
-------
@@ -691,20 +726,16 @@ class Pooch:
"""
self._assert_file_in_registry(fname)
- source = self.get_url(fname)
- parsed_url = parse_url(source)
- if parsed_url["protocol"] == "ftp":
- directory, file_name = os.path.split(parsed_url["path"])
- ftp = ftplib.FTP()
- ftp.connect(host=parsed_url["netloc"])
- try:
- ftp.login()
- available = file_name in ftp.nlst(directory)
- finally:
- ftp.close()
- else:
- response = requests.head(source, allow_redirects=True)
- available = bool(response.status_code == 200)
+ url = self.get_url(fname)
+ if downloader is None:
+ downloader = choose_downloader(url)
+ try:
+ available = downloader(url, None, self, check_only=True)
+ except TypeError as error:
+ error_msg = (
+ f"Downloader '{str(downloader)}' does not support availability checks."
+ )
+ raise NotImplementedError(error_msg) from error
return available
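The reworked `is_available` no longer special-cases FTP itself: it delegates the check to the downloader via a new `check_only` keyword, and converts the `TypeError` raised by downloaders with the old three-argument signature into a `NotImplementedError`. A minimal sketch of that dispatch outside the diff context (function names here are illustrative, not pooch's API):

```python
# Sketch of the check_only dispatch used by Pooch.is_available.
# A "modern" downloader accepts check_only; a "legacy" one raises
# TypeError, which is converted into NotImplementedError.

def modern_downloader(url, output_file, pooch, check_only=False):
    if check_only:
        # Pretend a HEAD request confirmed the file exists.
        return True
    return None  # a real downloader would write to output_file here

def legacy_downloader(url, output_file, pooch):
    return None

def is_available(url, downloader):
    try:
        return downloader(url, None, None, check_only=True)
    except TypeError as error:
        raise NotImplementedError(
            f"Downloader '{downloader}' does not support availability checks."
        ) from error
```
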
=====================================
pooch/downloaders.py
=====================================
@@ -7,9 +7,11 @@
"""
The classes that actually handle the downloads.
"""
+import os
import sys
import ftplib
+import warnings
import requests
from .utils import parse_url
@@ -164,7 +166,7 @@ class HTTPDownloader: # pylint: disable=too-few-public-methods
if self.progressbar is True and tqdm is None:
raise ValueError("Missing package 'tqdm' required for progress bars.")
- def __call__(self, url, output_file, pooch):
+ def __call__(self, url, output_file, pooch, check_only=False):
"""
Download the given URL over HTTP to the given output file.
@@ -178,8 +180,23 @@ class HTTPDownloader: # pylint: disable=too-few-public-methods
Path (and file name) to which the file will be downloaded.
pooch : :class:`~pooch.Pooch`
The instance of :class:`~pooch.Pooch` that is calling this method.
+ check_only : bool
+ If True, will only check if a file exists on the server
+ **without downloading the file**. Will return ``True`` if the file
+ exists and ``False`` otherwise.
+
+ Returns
+ -------
+ availability : bool or None
+ If ``check_only==True``, returns a boolean indicating if the file
+ is available on the server. Otherwise, returns ``None``.
"""
+ if check_only:
+ response = requests.head(url, allow_redirects=True)
+ available = bool(response.status_code == 200)
+ return available
+
kwargs = self.kwargs.copy()
kwargs.setdefault("stream", True)
ispath = not hasattr(output_file, "write")
@@ -226,6 +243,7 @@ class HTTPDownloader: # pylint: disable=too-few-public-methods
finally:
if ispath:
output_file.close()
+ return None
class FTPDownloader: # pylint: disable=too-few-public-methods
@@ -275,7 +293,6 @@ class FTPDownloader: # pylint: disable=too-few-public-methods
progressbar=False,
chunk_size=1024,
):
-
self.port = port
self.username = username
self.password = password
@@ -286,7 +303,7 @@ class FTPDownloader: # pylint: disable=too-few-public-methods
if self.progressbar is True and tqdm is None:
raise ValueError("Missing package 'tqdm' required for progress bars.")
- def __call__(self, url, output_file, pooch):
+ def __call__(self, url, output_file, pooch, check_only=False):
"""
Download the given URL over FTP to the given output file.
@@ -298,11 +315,31 @@ class FTPDownloader: # pylint: disable=too-few-public-methods
Path (and file name) to which the file will be downloaded.
pooch : :class:`~pooch.Pooch`
The instance of :class:`~pooch.Pooch` that is calling this method.
- """
+ check_only : bool
+ If True, will only check if a file exists on the server
+ **without downloading the file**. Will return ``True`` if the file
+ exists and ``False`` otherwise.
+
+ Returns
+ -------
+ availability : bool or None
+ If ``check_only==True``, returns a boolean indicating if the file
+ is available on the server. Otherwise, returns ``None``.
+ """
parsed_url = parse_url(url)
ftp = ftplib.FTP(timeout=self.timeout)
ftp.connect(host=parsed_url["netloc"], port=self.port)
+
+ if check_only:
+ directory, file_name = os.path.split(parsed_url["path"])
+ try:
+ ftp.login(user=self.username, passwd=self.password, acct=self.account)
+ available = file_name in ftp.nlst(directory)
+ finally:
+ ftp.close()
+ return available
+
ispath = not hasattr(output_file, "write")
if ispath:
output_file = open(output_file, "w+b")
@@ -313,10 +350,9 @@ class FTPDownloader: # pylint: disable=too-few-public-methods
# Make sure the file is set to binary mode, otherwise we can't
# get the file size. See: https://stackoverflow.com/a/22093848
ftp.voidcmd("TYPE I")
- size = int(ftp.size(parsed_url["path"]))
use_ascii = bool(sys.platform == "win32")
progress = tqdm(
- total=size,
+ total=int(ftp.size(parsed_url["path"])),
ncols=79,
ascii=use_ascii,
unit="B",
@@ -337,6 +373,7 @@ class FTPDownloader: # pylint: disable=too-few-public-methods
ftp.quit()
if ispath:
output_file.close()
+ return None
class SFTPDownloader: # pylint: disable=too-few-public-methods
@@ -477,6 +514,7 @@ class DOIDownloader: # pylint: disable=too-few-public-methods
* `figshare <https://www.figshare.com>`__
* `Zenodo <https://www.zenodo.org>`__
+ * `DataVerse <https://dataverse.org/>`__ instances
.. attention::
@@ -555,26 +593,18 @@ class DOIDownloader: # pylint: disable=too-few-public-methods
The instance of :class:`~pooch.Pooch` that is calling this method.
"""
- converters = {
- "figshare.com": figshare_download_url,
- "zenodo.org": zenodo_download_url,
- }
+
parsed_url = parse_url(url)
- doi = parsed_url["netloc"]
- archive_url = doi_to_url(doi)
- repository = parse_url(archive_url)["netloc"]
- if repository not in converters:
- raise ValueError(
- f"Invalid data repository '{repository}'. Must be one of "
- f"{list(converters.keys())}. "
- "To request or contribute support for this repository, "
- "please open an issue at https://github.com/fatiando/pooch/issues"
- )
- download_url = converters[repository](
- archive_url=archive_url,
- file_name=parsed_url["path"].split("/")[-1],
- doi=doi,
- )
+ data_repository = doi_to_repository(parsed_url["netloc"])
+
+ # Resolve the URL
+ file_name = parsed_url["path"]
+ # remove the leading slash in the path
+ if file_name[0] == "/":
+ file_name = file_name[1:]
+ download_url = data_repository.download_url(file_name)
+
+ # Instantiate the downloader object
downloader = HTTPDownloader(
progressbar=self.progressbar, chunk_size=self.chunk_size, **self.kwargs
)
@@ -606,66 +636,430 @@ def doi_to_url(doi):
return url
-def zenodo_download_url(archive_url, file_name, doi):
+def doi_to_repository(doi):
"""
- Use the API to get the download URL for a file given the archive URL.
+ Instantiate a data repository instance from a given DOI.
+
+ This function implements the chain of responsibility dispatch
+ to the correct data repository class.
Parameters
----------
- archive_url : str
- URL of the dataset in the repository.
- file_name : str
- The name of the file in the archive that will be downloaded.
doi : str
The DOI of the archive.
Returns
-------
- download_url : str
- The HTTP URL that can be used to download the file.
-
+ data_repository : DataRepository
+ The data repository object
"""
- article_id = archive_url.split("/")[-1]
- # With the ID, we can get a list of files and their download links
- article = requests.get(f"https://zenodo.org/api/records/{article_id}").json()
- files = {item["key"]: item for item in article["files"]}
- if file_name not in files:
+
+ # This should go away in a separate issue: DOI handling should
+ # not rely on the (non-)existence of trailing slashes. The issue
+ # is documented in https://github.com/fatiando/pooch/issues/324
+ if doi[-1] == "/":
+ doi = doi[:-1]
+
+ repositories = [
+ FigshareRepository,
+ ZenodoRepository,
+ DataverseRepository,
+ ]
+
+ # Extract the DOI and the repository information
+ archive_url = doi_to_url(doi)
+
+ # Try the converters one by one until one of them returned a URL
+ data_repository = None
+ for repo in repositories:
+ if data_repository is None:
+ data_repository = repo.initialize(
+ archive_url=archive_url,
+ doi=doi,
+ )
+
+ if data_repository is None:
+ repository = parse_url(archive_url)["netloc"]
raise ValueError(
- f"File '{file_name}' not found in data archive {archive_url} (doi:{doi})."
+ f"Invalid data repository '{repository}'. "
+ "To request or contribute support for this repository, "
+ "please open an issue at https://github.com/fatiando/pooch/issues"
)
- download_url = files[file_name]["links"]["self"]
- return download_url
+ return data_repository
-def figshare_download_url(archive_url, file_name, doi):
- """
- Use the API to get the download URL for a file given the archive URL.
- Parameters
- ----------
- archive_url : str
- URL of the dataset in the repository.
- file_name : str
- The name of the file in the archive that will be downloaded.
- doi : str
- The DOI of the archive.
+class DataRepository: # pylint: disable=too-few-public-methods, missing-class-docstring
+ @classmethod
+ def initialize(cls, doi, archive_url): # pylint: disable=unused-argument
+ """
+ Initialize the data repository if the given URL points to a
+ corresponding repository.
- Returns
- -------
- download_url : str
- The HTTP URL that can be used to download the file.
+ Initializes a data repository object. This is done as part of
+ a chain of responsibility. If the class cannot handle the given
+ repository URL, it returns `None`. Otherwise a `DataRepository`
+ instance is returned.
+
+ Parameters
+ ----------
+ doi : str
+ The DOI that identifies the repository
+ archive_url : str
+ The resolved URL for the DOI
+ """
+
+ return None # pragma: no cover
+
+ def download_url(self, file_name):
+ """
+ Use the repository API to get the download URL for a file given
+ the archive URL.
+
+ Parameters
+ ----------
+ file_name : str
+ The name of the file in the archive that will be downloaded.
+
+ Returns
+ -------
+ download_url : str
+ The HTTP URL that can be used to download the file.
+ """
+
+ raise NotImplementedError # pragma: no cover
+
+ def populate_registry(self, pooch):
+ """
+ Populate the registry using the data repository's API
+
+ Parameters
+ ----------
+ pooch : Pooch
+ The pooch instance that the registry will be added to.
+ """
+
+ raise NotImplementedError # pragma: no cover
+
+
+class ZenodoRepository(DataRepository): # pylint: disable=missing-class-docstring
+ def __init__(self, doi, archive_url):
+ self.archive_url = archive_url
+ self.doi = doi
+ self._api_response = None
+
+ @classmethod
+ def initialize(cls, doi, archive_url):
+ """
+ Initialize the data repository if the given URL points to a
+ corresponding repository.
+
+ Initializes a data repository object. This is done as part of
+ a chain of responsibility. If the class cannot handle the given
+ repository URL, it returns `None`. Otherwise a `DataRepository`
+ instance is returned.
+
+ Parameters
+ ----------
+ doi : str
+ The DOI that identifies the repository
+ archive_url : str
+ The resolved URL for the DOI
+ """
+
+ # Check whether this is a Zenodo URL
+ parsed_archive_url = parse_url(archive_url)
+ if parsed_archive_url["netloc"] != "zenodo.org":
+ return None
+
+ return cls(doi, archive_url)
+
+ @property
+ def api_response(self):
+ """Cached API response from Zenodo"""
+
+ if self._api_response is None:
+ article_id = self.archive_url.split("/")[-1]
+ self._api_response = requests.get(
+ f"https://zenodo.org/api/records/{article_id}"
+ ).json()
+
+ return self._api_response
+
+ def download_url(self, file_name):
+ """
+ Use the repository API to get the download URL for a file given
+ the archive URL.
+
+ Parameters
+ ----------
+ file_name : str
+ The name of the file in the archive that will be downloaded.
+
+ Returns
+ -------
+ download_url : str
+ The HTTP URL that can be used to download the file.
+ """
+
+ files = {item["key"]: item for item in self.api_response["files"]}
+ if file_name not in files:
+ raise ValueError(
+ f"File '{file_name}' not found in data archive {self.archive_url} (doi:{self.doi})."
+ )
+ download_url = files[file_name]["links"]["self"]
+ return download_url
+
+ def populate_registry(self, pooch):
+ """
+ Populate the registry using the data repository's API
+
+ Parameters
+ ----------
+ pooch : Pooch
+ The pooch instance that the registry will be added to.
+ """
+
+ for filedata in self.api_response["files"]:
+ pooch.registry[filedata["key"]] = filedata["checksum"]
+
+
+class FigshareRepository(DataRepository): # pylint: disable=missing-class-docstring
+ def __init__(self, doi, archive_url):
+ self.archive_url = archive_url
+ self.doi = doi
+ self._api_response = None
+
+ @classmethod
+ def initialize(cls, doi, archive_url):
+ """
+ Initialize the data repository if the given URL points to a
+ corresponding repository.
+
+ Initializes a data repository object. This is done as part of
+ a chain of responsibility. If the class cannot handle the given
+ repository URL, it returns `None`. Otherwise a `DataRepository`
+ instance is returned.
+
+ Parameters
+ ----------
+ doi : str
+ The DOI that identifies the repository
+ archive_url : str
+ The resolved URL for the DOI
+ """
+
+ # Check whether this is a Figshare URL
+ parsed_archive_url = parse_url(archive_url)
+ if parsed_archive_url["netloc"] != "figshare.com":
+ return None
+
+ return cls(doi, archive_url)
+
+ def _parse_version_from_doi(self):
+ """
+ Parse version from the doi
+
+ Return None if version is not available in the doi.
+ """
+ # Get suffix of the doi
+ _, suffix = self.doi.split("/")
+ # Split the suffix by dots and keep the last part
+ last_part = suffix.split(".")[-1]
+ # Parse the version from the last part
+ if last_part[0] != "v":
+ return None
+ version = int(last_part[1:])
+ return version
+
+ @property
+ def api_response(self):
+ """Cached API response from Figshare"""
+
+ if self._api_response is None:
+ # Use the figshare API to find the article ID from the DOI
+ article = requests.get(
+ f"https://api.figshare.com/v2/articles?doi={self.doi}"
+ ).json()[0]
+ article_id = article["id"]
+ # Parse desired version from the doi
+ version = self._parse_version_from_doi()
+ # With the ID and version, we can get a list of files and their
+ # download links
+ if version is None:
+ # Figshare returns the latest version available when no version
+ # is specified through the DOI.
+ warnings.warn(
+ f"The Figshare DOI '{self.doi}' doesn't specify which version of "
+ "the repository should be used. "
+ "Figshare will point to the latest version available.",
+ UserWarning,
+ )
+ # Define API url using only the article id
+ # (figshare will resolve the latest version)
+ api_url = f"https://api.figshare.com/v2/articles/{article_id}"
+ else:
+ # Define API url using article id and the desired version
+ # Get list of files using article id and the version
+ api_url = (
+ "https://api.figshare.com/v2/articles/"
+ f"{article_id}/versions/{version}"
+ )
+ # Make the request and return the files in the figshare repository
+ response = requests.get(api_url)
+ response.raise_for_status()
+ self._api_response = response.json()["files"]
+
+ return self._api_response
+
+ def download_url(self, file_name):
+ """
+ Use the repository API to get the download URL for a file given
+ the archive URL.
+
+ Parameters
+ ----------
+ file_name : str
+ The name of the file in the archive that will be downloaded.
+
+ Returns
+ -------
+ download_url : str
+ The HTTP URL that can be used to download the file.
+ """
+
+ files = {item["name"]: item for item in self.api_response}
+ if file_name not in files:
+ raise ValueError(
+ f"File '{file_name}' not found in data archive {self.archive_url} (doi:{self.doi})."
+ )
+ download_url = files[file_name]["download_url"]
+ return download_url
+
+ def populate_registry(self, pooch):
+ """
+ Populate the registry using the data repository's API
+
+ Parameters
+ ----------
+ pooch : Pooch
+ The pooch instance that the registry will be added to.
+ """
+
+ for filedata in self.api_response:
+ pooch.registry[filedata["name"]] = f"md5:{filedata['computed_md5']}"
+
+
+class DataverseRepository(DataRepository): # pylint: disable=missing-class-docstring
+ def __init__(self, doi, archive_url):
+ self.archive_url = archive_url
+ self.doi = doi
+ self._api_response = None
+
+ @classmethod
+ def initialize(cls, doi, archive_url):
+ """
+ Initialize the data repository if the given URL points to a
+ corresponding repository.
+
+ Initializes a data repository object. This is done as part of
+ a chain of responsibility. If the class cannot handle the given
+ repository URL, it returns `None`. Otherwise a `DataRepository`
+ instance is returned.
+
+ Parameters
+ ----------
+ doi : str
+ The DOI that identifies the repository
+ archive_url : str
+ The resolved URL for the DOI
+ """
+
+ # Access the DOI as if this was a DataVerse instance
+ response = cls._get_api_response(doi, archive_url)
+
+ # If we failed, this is probably not a DataVerse instance
+ if 400 <= response.status_code < 600:
+ return None
+
+ # Initialize the repository and overwrite the api response
+ repository = cls(doi, archive_url)
+ repository.api_response = response
+ return repository
+
+ @classmethod
+ def _get_api_response(cls, doi, archive_url):
+ """
+ Perform the actual API request
+
+ This has been separated into a separate ``classmethod``, as it can be
+ used prior and after the initialization.
+ """
+ parsed = parse_url(archive_url)
+ response = requests.get(
+ f"{parsed['protocol']}://{parsed['netloc']}/api/datasets/"
+ f":persistentId?persistentId=doi:{doi}"
+ )
+ return response
+
+ @property
+ def api_response(self):
+ """Cached API response from a DataVerse instance"""
+
+ if self._api_response is None:
+ self._api_response = self._get_api_response(
+ self.doi, self.archive_url
+ ) # pragma: no cover
+
+ return self._api_response
+
+ @api_response.setter
+ def api_response(self, response):
+ """Update the cached API response"""
+
+ self._api_response = response
+
+ def download_url(self, file_name):
+ """
+ Use the repository API to get the download URL for a file given
+ the archive URL.
+
+ Parameters
+ ----------
+ file_name : str
+ The name of the file in the archive that will be downloaded.
+
+ Returns
+ -------
+ download_url : str
+ The HTTP URL that can be used to download the file.
+ """
+
+ parsed = parse_url(self.archive_url)
+
+ # Iterate over the given files until we find one of the requested name
+ for filedata in self.api_response.json()["data"]["latestVersion"]["files"]:
+ if file_name == filedata["dataFile"]["filename"]:
+ return (
+ f"{parsed['protocol']}://{parsed['netloc']}/api/access/datafile/"
+ f":persistentId?persistentId={filedata['dataFile']['persistentId']}"
+ )
- """
- # Use the figshare API to find the article ID from the DOI
- article = requests.get(f"https://api.figshare.com/v2/articles?doi={doi}").json()[0]
- article_id = article["id"]
- # With the ID, we can get a list of files and their download links
- response = requests.get(f"https://api.figshare.com/v2/articles/{article_id}/files")
- response.raise_for_status()
- files = {item["name"]: item for item in response.json()}
- if file_name not in files:
raise ValueError(
- f"File '{file_name}' not found in data archive {archive_url} (doi:{doi})."
+ f"File '{file_name}' not found in data archive {self.archive_url} (doi:{self.doi})."
)
- download_url = files[file_name]["download_url"]
- return download_url
+
+ def populate_registry(self, pooch):
+ """
+ Populate the registry using the data repository's API
+
+ Parameters
+ ----------
+ pooch : Pooch
+ The pooch instance that the registry will be added to.
+ """
+
+ for filedata in self.api_response.json()["data"]["latestVersion"]["files"]:
+ pooch.registry[
+ filedata["dataFile"]["filename"]
+ ] = f"md5:{filedata['dataFile']['md5']}"
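The per-repository converter functions are replaced by `DataRepository` classes selected through a chain of responsibility: each candidate's `initialize` classmethod returns an instance when it recognizes the resolved DOI URL and `None` otherwise, and `doi_to_repository` takes the first non-`None` result. A condensed sketch of that dispatch (class and host names are illustrative stand-ins for the real classes above):

```python
# Condensed chain-of-responsibility dispatch, mirroring doi_to_repository:
# try each repository class in order; the first one whose initialize()
# returns a non-None instance handles the DOI.

class Repo:
    host = None

    @classmethod
    def initialize(cls, netloc):
        # Return an instance only if this class handles the given host.
        return cls() if netloc == cls.host else None

class Zenodo(Repo):
    host = "zenodo.org"

class Figshare(Repo):
    host = "figshare.com"

def dispatch(netloc, candidates=(Figshare, Zenodo)):
    for candidate in candidates:
        repo = candidate.initialize(netloc)
        if repo is not None:
            return repo
    raise ValueError(f"Invalid data repository '{netloc}'.")
```
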
=====================================
pooch/hashes.py
=====================================
@@ -208,11 +208,9 @@ def make_registry(directory, output, recursive=True):
pattern = "*"
files = sorted(
- [
- str(path.relative_to(directory))
- for path in directory.glob(pattern)
- if path.is_file()
- ]
+ str(path.relative_to(directory))
+ for path in directory.glob(pattern)
+ if path.is_file()
)
hashes = [file_hash(str(directory / fname)) for fname in files]
=====================================
pooch/processors.py
=====================================
@@ -79,17 +79,36 @@ class ExtractorProcessor: # pylint: disable=too-few-public-methods
else:
archive_dir = fname.rsplit(os.path.sep, maxsplit=1)[0]
self.extract_dir = os.path.join(archive_dir, self.extract_dir)
- if action in ("update", "download") or not os.path.exists(self.extract_dir):
+ if (
+ (action in ("update", "download"))
+ or (not os.path.exists(self.extract_dir))
+ or (
+ (self.members is not None)
+ and (
+ not all(
+ os.path.exists(os.path.join(self.extract_dir, m))
+ for m in self.members
+ )
+ )
+ )
+ ):
# Make sure that the folder with the extracted files exists
os.makedirs(self.extract_dir, exist_ok=True)
self._extract_file(fname, self.extract_dir)
+
# Get a list of all file names (including subdirectories) in our folder
- # of unzipped files.
- fnames = [
- os.path.join(path, fname)
- for path, _, files in os.walk(self.extract_dir)
- for fname in files
- ]
+ # of unzipped files, filtered by the given members list
+ fnames = []
+ for path, _, files in os.walk(self.extract_dir):
+ for filename in files:
+ relpath = os.path.normpath(
+ os.path.join(os.path.relpath(path, self.extract_dir), filename)
+ )
+ if self.members is None or any(
+ relpath.startswith(os.path.normpath(m)) for m in self.members
+ ):
+ fnames.append(os.path.join(path, filename))
+
return fnames
def _extract_file(self, fname, extract_dir):
@@ -153,7 +172,9 @@ class Unzip(ExtractorProcessor): # pylint: disable=too-few-public-methods
# Based on:
# https://stackoverflow.com/questions/8008829/extract-only-a-single-directory-from-tar
subdir_members = [
- name for name in zip_file.namelist() if name.startswith(member)
+ name
+ for name in zip_file.namelist()
+ if os.path.normpath(name).startswith(os.path.normpath(member))
]
# Extract the data file from within the archive
zip_file.extractall(members=subdir_members, path=extract_dir)
@@ -216,7 +237,9 @@ class Untar(ExtractorProcessor): # pylint: disable=too-few-public-methods
subdir_members = [
info
for info in tar_file.getmembers()
- if info.name.startswith(member)
+ if os.path.normpath(info.name).startswith(
+ os.path.normpath(member)
+ )
]
# Extract the data file from within the archive
tar_file.extractall(members=subdir_members, path=extract_dir)
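The `members` filtering above compares normalized paths on both sides, so that entries like `subdir/` and `subdir` match the same archive contents. A small sketch of the filter under the same normalization (the helper name is hypothetical, not part of pooch's API):

```python
import os

# Hypothetical helper mirroring the members filter: keep a file when its
# path relative to the extraction folder starts with one of the requested
# members, after normalizing both sides (so "data/" and "data" agree).

def keep_member(relpath, members):
    if members is None:
        return True  # no filter requested: keep everything
    normalized = os.path.normpath(relpath)
    return any(normalized.startswith(os.path.normpath(m)) for m in members)
```
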
=====================================
pooch/tests/data/registry-spaces.txt
=====================================
@@ -0,0 +1,2 @@
+"file with spaces.txt" baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d
+other\ with\ spaces.txt baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d
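The two entries in this test file exercise both quoting styles that `shlex.split` understands, which is why `Pooch.load_registry` now uses it instead of `str.split`. A quick sketch of the difference:

```python
import shlex

# Both quoting styles resolve to the same two-element (name, hash) pair,
# while a naive str.split would wrongly break the file name apart.
quoted = '"file with spaces.txt" baee0894'
escaped = r"other\ with\ spaces.txt baee0894"

assert shlex.split(quoted) == ["file with spaces.txt", "baee0894"]
assert shlex.split(escaped) == ["other with spaces.txt", "baee0894"]
assert len(quoted.split()) == 4  # naive split: name fragments + hash
```
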
=====================================
pooch/tests/test_core.py
=====================================
@@ -21,12 +21,15 @@ from ..hashes import file_hash, hash_matches
# Import the core module so that we can monkeypatch some functions
from .. import core
-from ..downloaders import HTTPDownloader
+from ..downloaders import HTTPDownloader, FTPDownloader
from .utils import (
pooch_test_url,
+ data_over_ftp,
pooch_test_figshare_url,
pooch_test_zenodo_url,
+ pooch_test_zenodo_with_slash_url,
+ pooch_test_dataverse_url,
pooch_test_registry,
check_tiny_data,
check_large_data,
@@ -39,6 +42,8 @@ REGISTRY = pooch_test_registry()
BASEURL = pooch_test_url()
FIGSHAREURL = pooch_test_figshare_url()
ZENODOURL = pooch_test_zenodo_url()
+ZENODOURL_W_SLASH = pooch_test_zenodo_with_slash_url()
+DATAVERSEURL = pooch_test_dataverse_url()
REGISTRY_CORRUPTED = {
# The same data file but I changed the hash manually to a wrong one
"tiny-data.txt": "098h0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d"
@@ -134,7 +139,9 @@ def test_pooch_local(data_dir_mirror):
@pytest.mark.network
@pytest.mark.parametrize(
- "url", [BASEURL, FIGSHAREURL, ZENODOURL], ids=["https", "figshare", "zenodo"]
+ "url",
+ [BASEURL, FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+ ids=["https", "figshare", "zenodo", "dataverse"],
)
def test_pooch_custom_url(url):
"Have pooch download the file from URL that is not base_url"
@@ -158,7 +165,9 @@ def test_pooch_custom_url(url):
@pytest.mark.network
@pytest.mark.parametrize(
- "url", [BASEURL, FIGSHAREURL, ZENODOURL], ids=["https", "figshare", "zenodo"]
+ "url",
+ [BASEURL, FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+ ids=["https", "figshare", "zenodo", "dataverse"],
)
def test_pooch_download(url):
"Setup a pooch that has no local data and needs to download"
@@ -373,6 +382,15 @@ def test_pooch_update_disallowed_environment():
os.environ.pop(variable_name)
+def test_pooch_create_base_url_no_trailing_slash():
+ """
+ Test if pooch.create appends a trailing slash to the base url if missing
+ """
+ base_url = "https://mybase.url"
+ pup = create(base_url=base_url, registry=None, path=DATA_DIR)
+ assert pup.base_url == base_url + "/"
+
+
@pytest.mark.network
def test_pooch_corrupted(data_dir_mirror):
"Raise an exception if the file hash doesn't match the registry"
@@ -457,6 +475,14 @@ def test_pooch_load_registry_invalid_line():
pup.load_registry(os.path.join(DATA_DIR, "registry-invalid.txt"))
+def test_pooch_load_registry_with_spaces():
+ "Should check that spaces in filenames are allowed in registry files"
+ pup = Pooch(path="", base_url="")
+ pup.load_registry(os.path.join(DATA_DIR, "registry-spaces.txt"))
+ assert "file with spaces.txt" in pup.registry
+ assert "other with spaces.txt" in pup.registry
+
+
@pytest.mark.network
def test_check_availability():
"Should correctly check availability of existing and non existing files"
@@ -473,21 +499,38 @@ def test_check_availability():
assert not pup.is_available("not-a-real-data-file.txt")
-@pytest.mark.network
-def test_check_availability_on_ftp():
+def test_check_availability_on_ftp(ftpserver):
"Should correctly check availability of existing and non existing files"
- # Check available remote file on FTP server
- pup = Pooch(
- path=DATA_DIR,
- base_url="ftp://data-out.unavco.org/pub/products/velocity/rel_201712/",
- registry={
- "pbo.final_igs08.20171202.vel": "md5:0b75d4049dedd0e179615f4b5e956156",
- "doesnot_exist.zip": "jdjdjdjdflld",
- },
- )
- assert pup.is_available("pbo.final_igs08.20171202.vel")
- # Check non available remote file
- assert not pup.is_available("doesnot_exist.zip")
+ with data_over_ftp(ftpserver, "tiny-data.txt") as url:
+ # Check available remote file on FTP server
+ pup = Pooch(
+ path=DATA_DIR,
+ base_url=url.replace("tiny-data.txt", ""),
+ registry={
+ "tiny-data.txt": "baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d",
+ "doesnot_exist.zip": "jdjdjdjdflld",
+ },
+ )
+ downloader = FTPDownloader(port=ftpserver.server_port)
+ assert pup.is_available("tiny-data.txt", downloader=downloader)
+ # Check non available remote file
+ assert not pup.is_available("doesnot_exist.zip", downloader=downloader)
+
+
+def test_check_availability_invalid_downloader():
+ "Should raise an exception if the downloader doesn't support this"
+
+ def downloader(url, output, pooch): # pylint: disable=unused-argument
+ "A downloader that doesn't support check_only"
+ return None
+
+ pup = Pooch(path=DATA_DIR, base_url=BASEURL, registry=REGISTRY)
+ # First check that everything works without the custom downloader
+ assert pup.is_available("tiny-data.txt")
+ # Now use the bad one
+ with pytest.raises(NotImplementedError) as error:
+ pup.is_available("tiny-data.txt", downloader=downloader)
+ assert "does not support availability checks" in str(error)
@pytest.mark.network
@@ -582,3 +625,56 @@ def test_stream_download(fname):
stream_download(url, destination, known_hash, downloader, pooch=None)
assert destination.exists()
check_tiny_data(str(destination))
+
+
+@pytest.mark.parametrize(
+ "url",
+ [FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+ ids=["figshare", "zenodo", "dataverse"],
+)
+def test_load_registry_from_doi(url):
+ """Check that the registry is correctly populated from the API"""
+ with TemporaryDirectory() as local_store:
+ path = os.path.abspath(local_store)
+ pup = Pooch(path=path, base_url=url)
+ pup.load_registry_from_doi()
+
+ # Check the existence of all files in the registry
+ assert len(pup.registry) == 2
+ assert "tiny-data.txt" in pup.registry
+ assert "store.zip" in pup.registry
+
+ # Ensure that all files have correct checksums by fetching them
+ for filename in pup.registry:
+ pup.fetch(filename)
+
+
+def test_load_registry_from_doi_zenodo_with_slash():
+ """
+ Check that the registry is correctly populated from the Zenodo API when
+ the filename contains a slash
+ """
+ url = ZENODOURL_W_SLASH
+ with TemporaryDirectory() as local_store:
+ path = os.path.abspath(local_store)
+ pup = Pooch(path=path, base_url=url)
+ pup.load_registry_from_doi()
+
+ # Check the existence of all files in the registry
+ assert len(pup.registry) == 1
+ assert "santisoler/pooch-test-data-v1.zip" in pup.registry
+
+ # Ensure that all files have correct checksums by fetching them
+ for filename in pup.registry:
+ pup.fetch(filename)
+
+
+def test_wrong_load_registry_from_doi():
+ """Check that non-DOI URLs produce an error"""
+
+ pup = Pooch(path="", base_url=BASEURL)
+
+ with pytest.raises(ValueError) as exc:
+ pup.load_registry_from_doi()
+
+ assert "only implemented for DOIs" in str(exc.value)
=====================================
pooch/tests/test_downloaders.py
=====================================
@@ -29,10 +29,12 @@ from ..downloaders import (
SFTPDownloader,
DOIDownloader,
choose_downloader,
- figshare_download_url,
- zenodo_download_url,
+ FigshareRepository,
+ ZenodoRepository,
+ DataverseRepository,
doi_to_url,
)
+from ..processors import Unzip
from .utils import (
pooch_test_url,
check_large_data,
@@ -40,12 +42,16 @@ from .utils import (
data_over_ftp,
pooch_test_figshare_url,
pooch_test_zenodo_url,
+ pooch_test_zenodo_with_slash_url,
+ pooch_test_dataverse_url,
)
BASEURL = pooch_test_url()
FIGSHAREURL = pooch_test_figshare_url()
ZENODOURL = pooch_test_zenodo_url()
+ZENODOURL_W_SLASH = pooch_test_zenodo_with_slash_url()
+DATAVERSEURL = pooch_test_dataverse_url()
@pytest.mark.skipif(tqdm is None, reason="requires tqdm")
@@ -97,22 +103,28 @@ def test_doi_url_not_found():
@pytest.mark.parametrize(
- "converter,doi",
+ "repository,doi",
[
- (figshare_download_url, "10.6084/m9.figshare.14763051.v1"),
- (zenodo_download_url, "10.5281/zenodo.4924875"),
+ (FigshareRepository, "10.6084/m9.figshare.14763051.v1"),
+ (ZenodoRepository, "10.5281/zenodo.4924875"),
+ (DataverseRepository, "10.11588/data/TKCFEF"),
],
- ids=["figshare", "zenodo"],
+ ids=["figshare", "zenodo", "dataverse"],
)
-def test_figshare_url_file_not_found(converter, doi):
+def test_figshare_url_file_not_found(repository, doi):
"Should fail if the file is not found in the archive"
with pytest.raises(ValueError) as exc:
url = doi_to_url(doi)
- converter(archive_url=url, file_name="bla.txt", doi=doi)
+ repo = repository.initialize(doi, url)
+ repo.download_url(file_name="bla.txt")
assert "File 'bla.txt' not found" in str(exc.value)
- at pytest.mark.parametrize("url", [FIGSHAREURL, ZENODOURL], ids=["figshare", "zenodo"])
+ at pytest.mark.parametrize(
+ "url",
+ [FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+ ids=["figshare", "zenodo", "dataverse"],
+)
def test_doi_downloader(url):
"Test the DOI downloader"
# Use the test data we have on the repository
@@ -123,6 +135,75 @@ def test_doi_downloader(url):
check_tiny_data(outfile)
+ at pytest.mark.network
+def test_zenodo_downloader_with_slash_in_fname():
+ """
+ Test the Zenodo downloader when the path contains a forward slash
+
+ Related to issue #336
+ """
+ # Use the test data we have on the repository
+ with TemporaryDirectory() as local_store:
+ base_url = ZENODOURL_W_SLASH + "santisoler/pooch-test-data-v1.zip"
+ downloader = DOIDownloader()
+ outfile = os.path.join(local_store, "test-data.zip")
+ downloader(base_url, outfile, None)
+ # unpack the downloaded zip file so we can check the integrity of
+ # tiny-data.txt
+ fnames = Unzip()(outfile, action="download", pooch=None)
+ (fname,) = [f for f in fnames if "tiny-data.txt" in f]
+ check_tiny_data(fname)
+
+
+ at pytest.mark.network
+def test_figshare_unspecified_version():
+ """
+ Test if passing a Figshare url without a version warns about it, but still
+ downloads it.
+ """
+ url = FIGSHAREURL
+ # Remove the last bit of the DOI, where the version is specified
+ url = url[: url.rindex(".")] + "/"
+ # Create expected warning message
+ doi = url[4:-1]
+ warning_msg = f"The Figshare DOI '{doi}' doesn't specify which version of "
+ with TemporaryDirectory() as local_store:
+ downloader = DOIDownloader()
+ outfile = os.path.join(local_store, "tiny-data.txt")
+ with pytest.warns(UserWarning, match=warning_msg):
+ downloader(url + "tiny-data.txt", outfile, None)
+
+
+ at pytest.mark.network
+ at pytest.mark.parametrize(
+ "version, missing, present",
+ [
+ (
+ 1,
+ "LC08_L2SP_218074_20190114_20200829_02_T1-cropped.tar.gz",
+ "cropped-before.tar.gz",
+ ),
+ (
+ 2,
+ "cropped-before.tar.gz",
+ "LC08_L2SP_218074_20190114_20200829_02_T1-cropped.tar.gz",
+ ),
+ ],
+)
+def test_figshare_data_repository_versions(version, missing, present):
+ """
+ Test if setting the version in Figshare DOI works as expected
+ """
+ # Use a Figshare repo as example (we won't download files from it since
+ # they are too big)
+ doi = f"10.6084/m9.figshare.21665630.v{version}"
+ url = f"https://doi.org/{doi}/"
+ figshare = FigshareRepository(doi, url)
+ filenames = [item["name"] for item in figshare.api_response]
+ assert present in filenames
+ assert missing not in filenames
+
+
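The version-handling tests above hinge on the `.vN` suffix of Figshare DOIs: `10.6084/m9.figshare.21665630.v1` and `.v2` select different file sets, while a DOI without the suffix triggers the "unspecified version" warning. A small, hypothetical helper (not part of Pooch's API) shows the suffix convention being exercised:

```python
import re


def figshare_doi_version(doi):
    """Hypothetical helper: extract the version number from a Figshare DOI
    such as '10.6084/m9.figshare.21665630.v2'; return None when the DOI
    does not pin a version (the case that triggers Pooch's warning)."""
    match = re.search(r"\.v(\d+)$", doi)
    return int(match.group(1)) if match else None
```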
@pytest.mark.network
def test_ftp_downloader(ftpserver):
"Test ftp downloader"
=====================================
pooch/tests/test_processors.py
=====================================
@@ -169,6 +169,57 @@ def test_unpacking(processor_class, extension, target_path, archive, members):
check_tiny_data(fname)
+ at pytest.mark.network
+ at pytest.mark.parametrize(
+ "processor_class,extension",
+ [(Unzip, ".zip"), (Untar, ".tar.gz")],
+)
+def test_multiple_unpacking(processor_class, extension):
+ "Test that multiple subsequent calls to a processor yield correct results"
+
+ with TemporaryDirectory() as local_store:
+ pup = Pooch(path=Path(local_store), base_url=BASEURL, registry=REGISTRY)
+
+ # Do a first fetch with the one member only
+ processor1 = processor_class(members=["store/tiny-data.txt"])
+ filenames1 = pup.fetch("store" + extension, processor=processor1)
+ assert len(filenames1) == 1
+ check_tiny_data(filenames1[0])
+
+ # Do a second fetch with the other member
+ processor2 = processor_class(
+ members=["store/tiny-data.txt", "store/subdir/tiny-data.txt"]
+ )
+ filenames2 = pup.fetch("store" + extension, processor=processor2)
+ assert len(filenames2) == 2
+ check_tiny_data(filenames2[0])
+ check_tiny_data(filenames2[1])
+
+ # Do a third fetch, again with one member and assert
+ # that only this member was returned
+ filenames3 = pup.fetch("store" + extension, processor=processor1)
+ assert len(filenames3) == 1
+ check_tiny_data(filenames3[0])
+
+
+ at pytest.mark.network
+ at pytest.mark.parametrize(
+ "processor_class,extension",
+ [(Unzip, ".zip"), (Untar, ".tar.gz")],
+)
+def test_unpack_members_with_leading_dot(processor_class, extension):
+ "Test that unpack members can also be specified with a leading ./"
+
+ with TemporaryDirectory() as local_store:
+ pup = Pooch(path=Path(local_store), base_url=BASEURL, registry=REGISTRY)
+
+ # Do a first fetch with the one member only
+ processor1 = processor_class(members=["./store/tiny-data.txt"])
+ filenames1 = pup.fetch("store" + extension, processor=processor1)
+ assert len(filenames1) == 1
+ check_tiny_data(filenames1[0])
+
+
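The leading-`./` test above checks that archive members match regardless of a `./` prefix. One way to get that equivalence (a sketch, not Pooch's actual processor code) is to compare `PurePosixPath` objects, which drop single-dot components on construction:

```python
from pathlib import PurePosixPath


def member_requested(member, requested_members):
    """Sketch: does an archive member match any requested member?
    PurePosixPath drops "./" components, so "./store/f.txt" and
    "store/f.txt" compare equal; a requested directory also matches
    everything below it."""
    target = PurePosixPath(member)
    for requested in requested_members:
        wanted = PurePosixPath(requested)
        if target == wanted or wanted in target.parents:
            return True
    return False
```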
def _check_logs(log_file, expected_lines):
"""
Assert that the lines in the log match the expected ones.
=====================================
pooch/tests/test_utils.py
=====================================
@@ -141,8 +141,16 @@ def test_local_storage_newfile_permissionerror(monkeypatch):
"path": "/dike.json",
},
),
+ (
+ r"doi:10.5281/zenodo.7632643/santisoler/pooch-test-data-v1.zip",
+ {
+ "protocol": "doi",
+ "netloc": "10.5281/zenodo.7632643",
+ "path": "/santisoler/pooch-test-data-v1.zip",
+ },
+ ),
],
- ids=["http", "ftp", "doi"],
+ ids=["http", "ftp", "doi", "zenodo-doi-with-slash"],
)
def test_parse_url(url, output):
"Parse URL into 3 components"
=====================================
pooch/tests/utils.py
=====================================
@@ -98,6 +98,37 @@ def pooch_test_zenodo_url():
return url
+def pooch_test_zenodo_with_slash_url():
+ """
+ Get base URL for test data in Zenodo, where the file name contains a slash
+
+ The URL contains the DOI for the Zenodo dataset that has a slash in the
+ filename (created with the GitHub-Zenodo integration service), using the
+ appropriate version for this version of Pooch.
+
+ Returns
+ -------
+ url
+ The URL for pooch's test data.
+
+ """
+ url = "doi:10.5281/zenodo.7632643/"
+ return url
+
+
+def pooch_test_dataverse_url():
+ """
+ Get the base URL for the test data stored on a Dataverse instance.
+
+ Returns
+ -------
+ url
+ The URL for pooch's test data.
+ """
+ url = "doi:10.11588/data/TKCFEF/"
+ return url
+
+
def pooch_test_registry():
"""
Get a registry for the test data used in Pooch itself.
=====================================
pooch/utils.py
=====================================
@@ -16,7 +16,7 @@ from urllib.parse import urlsplit
from contextlib import contextmanager
import warnings
-import appdirs
+import platformdirs
from packaging.version import Version
@@ -74,10 +74,10 @@ def os_cache(project):
r"""
Default cache location based on the operating system.
- The folder locations are defined by the ``appdirs`` package
+ The folder locations are defined by the ``platformdirs`` package
using the ``user_cache_dir`` function.
Usually, the locations will be following (see the
- `appdirs documentation <https://github.com/ActiveState/appdirs>`__):
+ `platformdirs documentation <https://platformdirs.readthedocs.io>`__):
* Mac: ``~/Library/Caches/<AppName>``
* Unix: ``~/.cache/<AppName>`` or the value of the ``XDG_CACHE_HOME``
@@ -96,7 +96,7 @@ def os_cache(project):
not expanded.
"""
- return Path(appdirs.user_cache_dir(project))
+ return Path(platformdirs.user_cache_dir(project))
def check_version(version, fallback="master"):
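The `appdirs` → `platformdirs` swap above keeps the same `user_cache_dir` behavior. A stdlib-only approximation of what that call resolves to (an illustration only; the real package covers many more platforms and edge cases) looks like:

```python
import os
import sys
from pathlib import Path


def user_cache_dir_sketch(project):
    """Rough stdlib-only approximation of platformdirs.user_cache_dir
    (a sketch; use the real platformdirs package in production)."""
    if sys.platform == "darwin":
        # Mac: ~/Library/Caches/<AppName>
        return Path.home() / "Library" / "Caches" / project
    if os.name == "nt":
        # Windows: %LOCALAPPDATA%\<AppName>\Cache (simplified)
        local = os.environ.get("LOCALAPPDATA", str(Path.home() / "AppData" / "Local"))
        return Path(local) / project / "Cache"
    # Unix: $XDG_CACHE_HOME/<AppName> or ~/.cache/<AppName>
    base = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
    return Path(base) / project
```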
@@ -158,7 +158,12 @@ def parse_url(url):
* doi:10.6084/m9.figshare.923450.v1/test.nc
The DOI is a special case. The protocol will be "doi", the netloc will be
- the DOI, and the path is what comes after the second "/".
+ the DOI, and the path is what comes after the last "/".
+ The only exception is Zenodo DOIs: the protocol will be "doi", the netloc
+ will be composed of the "prefix/suffix", and the path is what comes after
+ the second "/". This supports the special case of Zenodo DOIs whose
+ path contains forward slashes "/", as created by the GitHub-Zenodo
+ integration service.
Parameters
----------
@@ -179,8 +184,12 @@ def parse_url(url):
if url.startswith("doi:"):
protocol = "doi"
parts = url[4:].split("/")
- netloc = "/".join(parts[:2])
- path = "/" + "/".join(parts[2:])
+ if "zenodo" in parts[1].lower():
+ netloc = "/".join(parts[:2])
+ path = "/" + "/".join(parts[2:])
+ else:
+ netloc = "/".join(parts[:-1])
+ path = "/" + parts[-1]
else:
parsed_url = urlsplit(url)
protocol = parsed_url.scheme or "file"
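The branch added above can be exercised on its own. This stand-alone sketch mirrors the new `parse_url` logic (same algorithm, different function name) so the Zenodo slash case is easy to verify against the `test_parse_url` expectations:

```python
from urllib.parse import urlsplit


def parse_url_sketch(url):
    """Stand-alone mirror of pooch.utils.parse_url's DOI handling,
    including the Zenodo slash-in-filename special case."""
    if url.startswith("doi:"):
        protocol = "doi"
        parts = url[4:].split("/")
        if "zenodo" in parts[1].lower():
            # Zenodo DOIs: netloc is "prefix/suffix"; everything after the
            # second "/" is the path (file names may contain slashes)
            netloc = "/".join(parts[:2])
            path = "/" + "/".join(parts[2:])
        else:
            # Generic DOIs: the file name is the last component only
            netloc = "/".join(parts[:-1])
            path = "/" + parts[-1]
    else:
        parsed = urlsplit(url)
        protocol = parsed.scheme or "file"
        netloc = parsed.netloc
        path = parsed.path
    return {"protocol": protocol, "netloc": netloc, "path": path}
```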
=====================================
setup.cfg
=====================================
@@ -2,8 +2,8 @@
name = pooch
fullname = Pooch
description = "Pooch manages your Python library's sample data files: it automatically downloads and stores them in a local directory, with support for versioning and corruption checks."
-long_description = file: README.rst
-long_description_content_type = text/x-rst
+long_description = file: README.md
+long_description_content_type = text/markdown
author = The Pooch Developers
author_email = fatiandoaterra at protonmail.com
maintainer = "Leonardo Uieda"
@@ -23,7 +23,6 @@ classifiers =
Topic :: Scientific/Engineering
Topic :: Software Development :: Libraries
Programming Language :: Python :: 3 :: Only
- Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
@@ -39,10 +38,10 @@ project_urls =
zip_safe = True
include_package_data = True
packages = find:
-python_requires = >=3.6
+python_requires = >=3.7
setup_requires =
install_requires =
- appdirs>=1.3.0
+ platformdirs>=2.5.0
packaging>=20.0
requests>=2.19.0
View it on GitLab: https://salsa.debian.org/debian-gis-team/pooch/-/commit/178bd166be85589bfd1a90d518c5136b3ac184d3