[Git][debian-gis-team/pooch][upstream] New upstream version 1.7.0

Antonio Valentino (@antonio.valentino) gitlab at salsa.debian.org
Sun Jun 11 15:50:34 BST 2023



Antonio Valentino pushed to branch upstream at Debian GIS Project / pooch


Commits:
178bd166 by Antonio Valentino at 2023-06-10T15:33:53+00:00
New upstream version 1.7.0
- - - - -


28 changed files:

- .github/workflows/docs.yml
- .github/workflows/pypi.yml
- .github/workflows/test.yml
- AUTHORS.md
- + CITATION.cff
- CITATION.rst
- + README.md
- − README.rst
- doc/changes.rst
- doc/conf.py
- doc/downloaders.rst
- doc/install.rst
- doc/protocols.rst
- doc/versions.rst
- env/requirements-docs.txt
- environment.yml
- pooch/core.py
- pooch/downloaders.py
- pooch/hashes.py
- pooch/processors.py
- + pooch/tests/data/registry-spaces.txt
- pooch/tests/test_core.py
- pooch/tests/test_downloaders.py
- pooch/tests/test_processors.py
- pooch/tests/test_utils.py
- pooch/tests/utils.py
- pooch/utils.py
- setup.cfg


Changes:

=====================================
.github/workflows/docs.yml
=====================================
@@ -30,7 +30,6 @@ jobs:
     runs-on: ubuntu-latest
     env:
       REQUIREMENTS: env/requirements-build.txt env/requirements-docs.txt
-      PYTHON: 3.9
 
     steps:
       # Cancel any previous run of the test job
@@ -61,7 +60,7 @@ jobs:
       - name: Setup Python
        uses: actions/setup-python@v2
         with:
-          python-version: ${{ env.PYTHON }}
+          python-version: "3.10"
 
       - name: Collect requirements
         run: |


=====================================
.github/workflows/pypi.yml
=====================================
@@ -47,7 +47,7 @@ jobs:
       - name: Setup Python
        uses: actions/setup-python@v2
         with:
-          python-version: "3.9"
+          python-version: "3.10"
 
       - name: Install requirements
         run: |


=====================================
.github/workflows/test.yml
=====================================
@@ -20,7 +20,7 @@ on:
   schedule:
     # Run every Monday at 12:00 UTC
     # * is a special character in YAML so you have to quote this string
-    - cron:  '00 12 * * 1'
+    - cron: "00 12 * * 1"
 
 # Use bash by default in all jobs
 defaults:
@@ -28,7 +28,6 @@ defaults:
     shell: bash
 
 jobs:
-
   #############################################################################
   # Run tests and upload to codecov
   test:
@@ -44,7 +43,7 @@ jobs:
           - macos
           - windows
         python:
-          - "3.6"
+          - "3.7"
           - "3.10"
         dependencies:
           - latest
@@ -148,9 +147,9 @@ jobs:
         run: coverage xml
 
       - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v1
+        uses: codecov/codecov-action@v3
         with:
-          file: ./coverage.xml
+          files: ./coverage.xml
           env_vars: OS,PYTHON,DEPENDENCIES
           # Don't mark the job as failed if the upload fails for some reason.
           # It does sometimes but shouldn't be the reason for running


=====================================
AUTHORS.md
=====================================
@@ -10,8 +10,10 @@ order by last name) and are considered "The Pooch Developers":
 * [Mark Harfouche](https://github.com/hmaarrfk) - Ramona Optics Inc. - [0000-0002-4657-4603](https://orcid.org/0000-0002-4657-4603)
 * [Danilo Horta](https://github.com/horta) - EMBL-EBI, UK
 * [Hugo van Kemenade](https://github.com/hugovk) - Independent (Non-affiliated) (ORCID: [0000-0001-5715-8632](https://www.orcid.org/0000-0001-5715-8632))
+* [Dominic Kempf](https://github.com/dokempf) - Scientific Software Center, Heidelberg University, Germany (ORCID: [0000-0002-6140-2332](https://www.orcid.org/0000-0002-6140-2332))
 * [Kacper Kowalik](https://github.com/Xarthisius) - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, USA (ORCID: [0000-0003-1709-3744](https://www.orcid.org/0000-0003-1709-3744))
 * [John Leeman](https://github.com/jrleeman)
+* [Björn Ludwig](https://github.com/BjoernLudwigPTB) - Physikalisch-Technische Bundesanstalt, Germany (ORCID: [0000-0002-5910-9137](https://www.orcid.org/0000-0002-5910-9137))
 * [Daniel McCloy](https://github.com/drammock) - University of Washington, USA (ORCID: [0000-0002-7572-3241](https://orcid.org/0000-0002-7572-3241))
 * [Rémi Rampin](https://github.com/remram44) - New York University, USA (ORCID: [0000-0002-0524-2282](https://www.orcid.org/0000-0002-0524-2282))
 * [Clément Robert](https://github.com/neutrinoceros) - Institut de Planétologie et d'Astrophysique de Grenoble, France (ORCID: [0000-0001-8629-7068](https://orcid.org/0000-0001-8629-7068))


=====================================
CITATION.cff
=====================================
@@ -0,0 +1,53 @@
+cff-version: 1.2.0
+title: 'Pooch: A friend to fetch your data files'
+message: >-
+  If you use this software, please cite it using the
+  information in this file.
+type: software
+url: 'https://www.fatiando.org/pooch/'
+repository-code: 'https://github.com/fatiando/pooch'
+repository-artifact: 'https://pypi.org/project/pooch/'
+license: BSD-3-Clause
+preferred-citation:
+  type: article
+  title: 'Pooch: A friend to fetch your data files'
+  journal: Journal of Open Source Software
+  year: 2020
+  doi: 10.21105/joss.01943
+  volume: 5
+  issue: 45
+  start: 1943
+  license: CC-BY-4.0
+  authors:
+    - given-names: Leonardo
+      family-names: Uieda
+      affiliation: University of Liverpool
+      orcid: 'https://orcid.org/0000-0001-6123-9515'
+    - given-names: Santiago Rubén
+      family-names: Soler
+      affiliation: Universidad Nacional de San Juan
+      orcid: 'https://orcid.org/0000-0001-9202-5317'
+    - given-names: Rémi
+      family-names: Rampin
+      affiliation: New York University
+      orcid: 'https://orcid.org/0000-0002-0524-2282'
+    - given-names: Hugo
+      name-particle: van
+      family-names: Kemenade
+      orcid: 'https://orcid.org/0000-0001-5715-8632'
+    - given-names: Matthew
+      family-names: Turk
+      affiliation: School of Information Sciences
+      orcid: 'https://orcid.org/0000-0002-5294-0198'
+    - given-names: Daniel
+      family-names: Shapero
+      affiliation: University of Washington
+      orcid: 'https://orcid.org/0000-0002-3651-0649'
+    - given-names: Anderson
+      family-names: Banihirwe
+      affiliation: National Center for Atmospheric Research
+      orcid: 'https://orcid.org/0000-0001-6583-571X'
+    - given-names: John
+      family-names: Leeman
+      affiliation: Leeman Geophysical
+      orcid: 'https://orcid.org/0000-0002-3624-1821'


=====================================
CITATION.rst
=====================================
@@ -14,5 +14,20 @@ If you used Pooch in your research, please consider citing our paper:
 This is an open-access publication. The paper and the associated software
 review can be freely accessed at: https://doi.org/10.21105/joss.01943
 
-If you need a Bibtex entry for the paper, grab it here:
-https://www.doi2bib.org/bib/10.21105/joss.01943
+Here is a BibTeX entry to make things easier if you’re using LaTeX:
+
+.. code:: bibtex
+
+    @article{uieda2020,
+      title = {{Pooch}: {A} friend to fetch your data files},
+      author = {Leonardo Uieda and Santiago Soler and R{\'{e}}mi Rampin and Hugo van Kemenade and Matthew Turk and Daniel Shapero and Anderson Banihirwe and John Leeman},
+      year = {2020},
+      doi = {10.21105/joss.01943},
+      url = {https://doi.org/10.21105/joss.01943},
+      month = jan,
+      publisher = {The Open Journal},
+      volume = {5},
+      number = {45},
+      pages = {1943},
+      journal = {Journal of Open Source Software}
+    }


=====================================
README.md
=====================================
@@ -0,0 +1,185 @@
+<img src="https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png" alt="Pooch: A friend to fetch your data files">
+
+<p align="center">
+<a href="https://www.fatiando.org/pooch"><strong>Documentation</strong> (latest)</a> •
+<a href="https://www.fatiando.org/pooch/dev"><strong>Documentation</strong> (main branch)</a> •
+<a href="https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md"><strong>Contributing</strong></a> •
+<a href="https://www.fatiando.org/contact/"><strong>Contact</strong></a>
+</p>
+
+<p align="center">
+Part of the <a href="https://www.fatiando.org"><strong>Fatiando a Terra</strong></a> project
+</p>
+
+<p align="center">
+<a href="https://pypi.python.org/pypi/pooch"><img src="http://img.shields.io/pypi/v/pooch.svg?style=flat-square" alt="Latest version on PyPI"></a>
+<a href="https://github.com/conda-forge/pooch-feedstock"><img src="https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square" alt="Latest version on conda-forge"></a>
+<a href="https://codecov.io/gh/fatiando/pooch"><img src="https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square" alt="Test coverage status"></a>
+<a href="https://pypi.python.org/pypi/pooch"><img src="https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square" alt="Compatible Python versions."></a>
+<a href="https://doi.org/10.21105/joss.01943"><img src="https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue?style=flat-square" alt="DOI used to cite Pooch"></a>
+</p>
+
+## About
+
+*Does your Python package include sample datasets?
+Are you shipping them with the code?
+Are they getting too big?*
+
+**Pooch** is here to help! It will manage a data *registry* by downloading your
+data files from a server only when needed and storing them locally in a data
+*cache* (a folder on your computer).
+
+Here are Pooch's main features:
+
+* Pure Python and minimal dependencies.
+* Download a file only if necessary (it's not in the data cache or needs to be
+  updated).
+* Verify download integrity through SHA256 hashes (also used to check if a file
+  needs to be updated).
+* Designed to be extended: plug in custom download (FTP, scp, etc) and
+  post-processing (unzip, decompress, rename) functions.
+* Includes utilities to unzip/decompress the data upon download to save loading
+  time.
+* Can handle basic HTTP authentication (for servers that require a login) and
+  printing download progress bars.
+* Easily set up an environment variable to overwrite the data cache location.
+
+*Are you a scientist or researcher? Pooch can help you too!*
+
+* Automatically download your data files so you don't have to keep them in your
+  GitHub repository.
+* Make sure everyone running the code has the same version of the data files
+  (enforced through the SHA256 hashes).
+
+## Example
+
+For a **scientist downloading a data file** for analysis:
+
+```python
+import pooch
+import pandas as pd
+
+# Download a file and save it locally, returning the path to it.
+# Running this again will not cause a download. Pooch will check the hash
+# (checksum) of the downloaded file against the given value to make sure
+# it's the right file (not corrupted or outdated).
+fname_bathymetry = pooch.retrieve(
+    url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
+    known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
+)
+
+# Pooch can also download based on a DOI from certain providers.
+fname_gravity = pooch.retrieve(
+    url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
+    known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
+)
+
+# Load the data with Pandas
+data_bathymetry = pd.read_csv(fname_bathymetry)
+data_gravity = pd.read_csv(fname_gravity)
+```
+
+For **package developers** including sample data in their projects:
+
+```python
+"""
+Module mypackage/datasets.py
+"""
+import pkg_resources
+import pandas
+import pooch
+
+# Get the version string from your project. You have one of these, right?
+from . import version
+
+# Create a new friend to manage your sample data storage
+GOODBOY = pooch.create(
+    # Folder where the data will be stored. For a sensible default, use the
+    # default cache folder for your OS.
+    path=pooch.os_cache("mypackage"),
+    # Base URL of the remote data store. Will call .format on this string
+    # to insert the version (see below).
+    base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
+    # Pooches are versioned so that you can use multiple versions of a
+    # package simultaneously. Use a PEP440-compliant version number. The
+    # version will be appended to the path.
+    version=version,
+    # If the version has a "+XX.XXXXX" suffix, we'll assume that this is a dev
+    # version and replace the version with this string.
+    version_dev="main",
+    # An environment variable that overwrites the path.
+    env="MYPACKAGE_DATA_DIR",
+    # The cache file registry. A dictionary with all files managed by this
+    # pooch. Keys are the file names (relative to *base_url*) and values
+    # are their respective SHA256 hashes. Files will be downloaded
+    # automatically when needed (see fetch_gravity_data).
+    registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
+)
+# You can also load the registry from a file. Each line contains a file
+# name and its SHA256 hash separated by a space. This makes it easier to
+# manage large numbers of data files. The registry file should be packaged
+# and distributed with your software.
+GOODBOY.load_registry(
+    pkg_resources.resource_stream("mypackage", "registry.txt")
+)
+
+# Define functions that your users can call to get back the data in memory
+def fetch_gravity_data():
+    """
+    Load some sample gravity data to use in your docs.
+    """
+    # Fetch the path to a file in the local storage. If it's not there,
+    # we'll download it.
+    fname = GOODBOY.fetch("gravity-data.csv")
+    # Load it with numpy/pandas/etc
+    data = pandas.read_csv(fname)
+    return data
+```
+
+## Projects using Pooch
+
+* [SciPy](https://github.com/scipy/scipy)
+* [scikit-image](https://github.com/scikit-image/scikit-image)
+* [MetPy](https://github.com/Unidata/MetPy)
+* [icepack](https://github.com/icepack/icepack)
+* [histolab](https://github.com/histolab/histolab)
+* [seaborn-image](https://github.com/SarthakJariwala/seaborn-image)
+* [Ensaio](https://github.com/fatiando/ensaio)
+* [Open AR-Sandbox](https://github.com/cgre-aachen/open_AR_Sandbox)
+* [climlab](https://github.com/climlab/climlab)
+* [napari](https://github.com/napari/napari)
+* [mne-python](https://github.com/mne-tools/mne-python)
+
+*If you're using Pooch, send us a pull request adding your project to the list.*
+
+## Getting involved
+
+🗨️ **Contact us:**
+Find out more about how to reach us at
+[fatiando.org/contact](https://www.fatiando.org/contact/).
+
+👩🏾‍💻 **Contributing to project development:**
+Please read our
+[Contributing Guide](https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md)
+to see how you can help and give feedback.
+
+🧑🏾‍🤝‍🧑🏼 **Code of conduct:**
+This project is released with a
+[Code of Conduct](https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md).
+By participating in this project you agree to abide by its terms.
+
+> **Imposter syndrome disclaimer:**
+> We want your help. **No, really.** There may be a little voice inside your
+> head that is telling you that you're not ready, that you aren't skilled
+> enough to contribute. We assure you that the little voice in your head is
+> wrong. Most importantly, **there are many valuable ways to contribute besides
+> writing code**.
+>
+> *This disclaimer was adapted from the*
+> [MetPy project](https://github.com/Unidata/MetPy).
+
+## License
+
+This is free software: you can redistribute it and/or modify it under the terms
+of the **BSD 3-clause License**. A copy of this license is provided in
+[`LICENSE.txt`](https://github.com/fatiando/pooch/blob/main/LICENSE.txt).
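
A note on the registry format used in the README above: each entry pairs a
file name with its SHA256 hash. A minimal, stdlib-only sketch of producing
one registry line in that format is below (``registry_line`` is a
hypothetical helper, not part of Pooch; Pooch's own docs describe a
``pooch.file_hash`` utility for the same job):

```python
import hashlib
from pathlib import Path


def registry_line(path):
    """
    Return a "filename sha256hash" line in the format Pooch registry
    files use (file name and hash separated by a single space).
    """
    # Hash the full file contents with SHA256, as Pooch does by default.
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return f"{Path(path).name} {digest}"
```

Lines like these can be appended to a ``registry.txt`` that is then passed to
``Pooch.load_registry``, as in the README example.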


=====================================
README.rst deleted
=====================================
@@ -1,230 +0,0 @@
-.. image:: https://github.com/fatiando/pooch/raw/main/doc/_static/readme-banner.png
-    :alt: Pooch
-
-`Documentation <https://www.fatiando.org/pooch>`__ |
-`Documentation (dev version) <https://www.fatiando.org/pooch/dev>`__ |
-Part of the `Fatiando a Terra <https://www.fatiando.org>`__ project
-
-.. image:: https://img.shields.io/pypi/v/pooch.svg?style=flat-square
-    :alt: Latest version on PyPI
-    :target: https://pypi.org/project/pooch/
-.. image:: https://img.shields.io/conda/vn/conda-forge/pooch.svg?style=flat-square
-    :alt: Latest version on conda-forge
-    :target: https://github.com/conda-forge/pooch-feedstock
-.. image:: https://img.shields.io/codecov/c/github/fatiando/pooch/main.svg?style=flat-square
-    :alt: Test coverage status
-    :target: https://codecov.io/gh/fatiando/pooch
-.. image:: https://img.shields.io/pypi/pyversions/pooch.svg?style=flat-square
-    :alt: Compatible Python versions.
-    :target: https://pypi.org/project/pooch/
-.. image:: https://img.shields.io/badge/doi-10.21105%2Fjoss.01943-blue.svg?style=flat-square
-    :alt: Digital Object Identifier for the JOSS paper
-    :target: https://doi.org/10.21105/joss.01943
-
-
-About
------
-
-*Does your Python package include sample datasets? Are you shipping them with the code?
-Are they getting too big?*
-
-Pooch is here to help! It will manage a data *registry* by downloading your data files
-from a server only when needed and storing them locally in a data *cache* (a folder on
-your computer).
-
-Here are Pooch's main features:
-
-* Pure Python and minimal dependencies.
-* Download a file only if necessary (it's not in the data cache or needs to be updated).
-* Verify download integrity through SHA256 hashes (also used to check if a file needs to
-  be updated).
-* Designed to be extended: plug in custom download (FTP, scp, etc) and post-processing
-  (unzip, decompress, rename) functions.
-* Includes utilities to unzip/decompress the data upon download to save loading time.
-* Can handle basic HTTP authentication (for servers that require a login) and printing
-  download progress bars.
-* Easily set up an environment variable to overwrite the data cache location.
-
-*Are you a scientist or researcher? Pooch can help you too!*
-
-* Automatically download your data files so you don't have to keep them in your GitHub
-  repository.
-* Make sure everyone running the code has the same version of the data files (enforced
-  through the SHA256 hashes).
-
-
-Example
--------
-
-For a **scientist downloading a data file** for analysis:
-
-.. code:: python
-
-    import pooch
-    import pandas as pd
-
-
-    # Download a file and save it locally, returning the path to it.
-    # Running this again will not cause a download. Pooch will check the hash
-    # (checksum) of the downloaded file against the given value to make sure
-    # it's the right file (not corrupted or outdated).
-    fname_bathymetry = pooch.retrieve(
-        url="https://github.com/fatiando-data/caribbean-bathymetry/releases/download/v1/caribbean-bathymetry.csv.xz",
-        known_hash="md5:a7332aa6e69c77d49d7fb54b764caa82",
-    )
-
-    # Pooch can also download based on a DOI from certain providers.
-    fname_gravity = pooch.retrieve(
-        url="doi:10.5281/zenodo.5882430/southern-africa-gravity.csv.xz",
-        known_hash="md5:1dee324a14e647855366d6eb01a1ef35",
-    )
-
-    # Load the data with Pandas
-    data_bathymetry = pd.read_csv(fname_bathymetry)
-    data_gravity = pd.read_csv(fname_gravity)
-
-
-
-For **package developers** including sample data in their projects:
-
-.. code:: python
-
-    """
-    Module mypackage/datasets.py
-    """
-    import pkg_resources
-    import pandas
-    import pooch
-
-    # Get the version string from your project. You have one of these, right?
-    from . import version
-
-
-    # Create a new friend to manage your sample data storage
-    GOODBOY = pooch.create(
-        # Folder where the data will be stored. For a sensible default, use the
-        # default cache folder for your OS.
-        path=pooch.os_cache("mypackage"),
-        # Base URL of the remote data store. Will call .format on this string
-        # to insert the version (see below).
-        base_url="https://github.com/myproject/mypackage/raw/{version}/data/",
-        # Pooches are versioned so that you can use multiple versions of a
-        # package simultaneously. Use PEP440 compliant version number. The
-        # version will be appended to the path.
-        version=version,
-        # If a version as a "+XX.XXXXX" suffix, we'll assume that this is a dev
-        # version and replace the version with this string.
-        version_dev="main",
-        # An environment variable that overwrites the path.
-        env="MYPACKAGE_DATA_DIR",
-        # The cache file registry. A dictionary with all files managed by this
-        # pooch. Keys are the file names (relative to *base_url*) and values
-        # are their respective SHA256 hashes. Files will be downloaded
-        # automatically when needed (see fetch_gravity_data).
-        registry={"gravity-data.csv": "89y10phsdwhs09whljwc09whcowsdhcwodcydw"}
-    )
-    # You can also load the registry from a file. Each line contains a file
-    # name and it's sha256 hash separated by a space. This makes it easier to
-    # manage large numbers of data files. The registry file should be packaged
-    # and distributed with your software.
-    GOODBOY.load_registry(
-        pkg_resources.resource_stream("mypackage", "registry.txt")
-    )
-
-
-    # Define functions that your users can call to get back the data in memory
-    def fetch_gravity_data():
-        """
-        Load some sample gravity data to use in your docs.
-        """
-        # Fetch the path to a file in the local storage. If it's not there,
-        # we'll download it.
-        fname = GOODBOY.fetch("gravity-data.csv")
-        # Load it with numpy/pandas/etc
-        data = pandas.read_csv(fname)
-        return data
-
-
-Projects using Pooch
---------------------
-
-* `scikit-image <https://github.com/scikit-image/scikit-image>`__
-* `MetPy <https://github.com/Unidata/MetPy>`__
-* `icepack <https://github.com/icepack/icepack>`__
-* `histolab <https://github.com/histolab/histolab>`__
-* `seaborn-image <https://github.com/SarthakJariwala/seaborn-image>`__
-* `Ensaio <https://github.com/fatiando/ensaio>`__
-
-*If you're using Pooch, send us a pull request adding your project to the list.*
-
-
-Contacting Us
--------------
-
-Find out more about how to reach us at
-`fatiando.org/contact <https://www.fatiando.org/contact/>`__
-
-
-Citing Pooch
-------------
-
-This is research software **made by scientists** (see
-`AUTHORS.md <https://github.com/fatiando/pooch/blob/main/AUTHORS.md>`__). Citations
-help us justify the effort that goes into building and maintaining this project. If you
-used Pooch for your research, please consider citing us.
-
-See our `CITATION.rst file <https://github.com/fatiando/pooch/blob/main/CITATION.rst>`__
-to find out more.
-
-
-Contributing
-------------
-
-Code of conduct
-+++++++++++++++
-
-Please note that this project is released with a
-`Code of Conduct <https://github.com/fatiando/community/blob/main/CODE_OF_CONDUCT.md>`__.
-By participating in this project you agree to abide by its terms.
-
-Contributing Guidelines
-+++++++++++++++++++++++
-
-Please read our
-`Contributing Guide <https://github.com/fatiando/pooch/blob/main/CONTRIBUTING.md>`__
-to see how you can help and give feedback.
-
-Imposter syndrome disclaimer
-++++++++++++++++++++++++++++
-
-**We want your help.** No, really.
-
-There may be a little voice inside your head that is telling you that you're
-not ready to be an open source contributor; that your skills aren't nearly good
-enough to contribute.
-What could you possibly offer?
-
-We assure you that the little voice in your head is wrong.
-
-**Being a contributor doesn't just mean writing code**.
-Equally important contributions include:
-writing or proof-reading documentation, suggesting or implementing tests, or
-even giving feedback about the project (including giving feedback about the
-contribution process).
-If you're coming to the project with fresh eyes, you might see the errors and
-assumptions that seasoned contributors have glossed over.
-If you can write any code at all, you can contribute code to open source.
-We are constantly trying out new skills, making mistakes, and learning from
-those mistakes.
-That's how we all improve and we are happy to help others learn.
-
-*This disclaimer was adapted from the*
-`MetPy project <https://github.com/Unidata/MetPy>`__.
-
-
-License
--------
-
-This is free software: you can redistribute it and/or modify it under the terms
-of the `BSD 3-clause License <https://github.com/fatiando/pooch/blob/main/LICENSE.txt>`__.
-A copy of this license is provided with distributions of the software.


=====================================
doc/changes.rst
=====================================
@@ -3,6 +3,64 @@
 Changelog
 =========
 
+Version 1.7.0
+-------------
+
+*Released on: 2023/02/27*
+
+doi:`10.5281/zenodo.7678844 <https://doi.org/10.5281/zenodo.7678844>`__
+
+Bug fixes:
+
+* Make archive extraction always take members into account (`#316 <https://github.com/fatiando/pooch/pull/316>`__)
+* Figshare downloaders fetch the correct version, instead of always the latest one. (`#343 <https://github.com/fatiando/pooch/pull/343>`__)
+
+New features:
+
+* Allow spaces in filenames in registry files (`#315 <https://github.com/fatiando/pooch/pull/315>`__)
+* Refactor ``Pooch.is_available`` to use downloaders (`#322 <https://github.com/fatiando/pooch/pull/322>`__)
+* Add support for downloading files from Dataverse DOIs (`#318 <https://github.com/fatiando/pooch/pull/318>`__)
+* Add a new ``Pooch.load_registry_from_doi`` method that populates the Pooch registry using DOI-based data repositories (`#325 <https://github.com/fatiando/pooch/pull/325>`__)
+* Support urls for Zenodo repositories created through the GitHub integration service, which include slashes in the filename of the main zip files (`#340 <https://github.com/fatiando/pooch/pull/340>`__)
+* Automatically add a trailing slash to ``base_url`` on ``pooch.create`` (`#344 <https://github.com/fatiando/pooch/pull/344>`__)
+
+Maintenance:
+
+* Drop support for Python 3.6 (`#299 <https://github.com/fatiando/pooch/pull/299>`__)
+* Port from deprecated ``appdirs`` to ``platformdirs`` (`#339 <https://github.com/fatiando/pooch/pull/339>`__)
+* Update version of Codecov's Action to v3 (`#345 <https://github.com/fatiando/pooch/pull/345>`__)
+
+Documentation:
+
+* Update sphinx, theme, and sphinx-panels (`#300 <https://github.com/fatiando/pooch/pull/300>`__)
+* Add CITATION.cff for the JOSS article (`#308 <https://github.com/fatiando/pooch/pull/308>`__)
+* Use Markdown for the README (`#311 <https://github.com/fatiando/pooch/pull/311>`__)
+* Improve docstring of `known_hash` in `retrieve` function (`#333 <https://github.com/fatiando/pooch/pull/333>`__)
+* Replace link to Pooch's citation with a BibTeX code snippet (`#335 <https://github.com/fatiando/pooch/pull/335>`__)
+
+Projects that started using Pooch:
+
+* Open AR-Sandbox (`#305 <https://github.com/fatiando/pooch/pull/305>`__)
+* ``climlab`` (`#312 <https://github.com/fatiando/pooch/pull/312>`__)
+* SciPy (`#320 <https://github.com/fatiando/pooch/pull/320>`__)
+* ``napari`` (`#321 <https://github.com/fatiando/pooch/pull/321>`__)
+* ``mne-python`` (`#323 <https://github.com/fatiando/pooch/pull/323>`__)
+
+This release contains contributions from:
+
+* Alex Fikl
+* Anirudh Dagar
+* Björn Ludwig
+* Brian Rose
+* Dominic Kempf
+* Florian Wellmann
+* Gabriel Fu
+* Kyle I S Harrington
+* Leonardo Uieda
+* myd7349
+* Rowan Cockett
+* Santiago Soler
+
 Version 1.6.0
 -------------
 


=====================================
doc/conf.py
=====================================
@@ -47,7 +47,7 @@ panels_css_variables = {
 intersphinx_mapping = {
     "python": ("https://docs.python.org/3/", None),
     "pandas": ("http://pandas.pydata.org/pandas-docs/stable/", None),
-    "requests": ("https://3.python-requests.org/", None),
+    "requests": ("https://requests.readthedocs.io/en/latest/", None),
 }
 
 # Autosummary pages will be generated by sphinx-autogen instead of sphinx-build


=====================================
doc/downloaders.rst
=====================================
@@ -20,8 +20,10 @@ Downloaders are Python *callable objects*  (like functions or classes with a
         '''
         Download a file from the given URL to the given local file.
 
-        The function **must** take as arguments (in order):
+        The function **must** take the following arguments (in order).
 
+        Parameters
+        ----------
         url : str
             The URL to the file you want to download.
         output_file : str or file-like object
@@ -78,3 +80,47 @@ redirected from the original download URL:
         fname = GOODBOY.fetch("some-data.csv", downloader=redirect_downloader)
         data = pandas.read_csv(fname)
         return data
+
+
+Availability checks
+-------------------
+
+**Optionally**, downloaders can take a ``check_only`` keyword argument
+(defaulting to ``False``) that makes them only check whether a given file is
+available for download **without** downloading it.
+This makes a downloader compatible with :meth:`pooch.Pooch.is_available`.
+
+In this case, the downloader should return a boolean:
+
+.. code:: python
+
+    def mydownloader(url, output_file, pooch, check_only=False):
+        '''
+        Download a file from the given URL to the given local file.
+
+        The function **must** take the following arguments (in order).
+
+        Parameters
+        ----------
+        url : str
+            The URL to the file you want to download.
+        output_file : str or file-like object
+            Path (and file name) to which the file will be downloaded.
+        pooch : pooch.Pooch
+            The instance of the Pooch class that is calling this function.
+        check_only : bool
+            If True, will only check if a file exists on the server and
+            **without downloading the file**. Will return ``True`` if the file
+            exists and ``False`` otherwise.
+
+        Returns
+        -------
+        None or availability
+            If ``check_only==True``, returns a boolean indicating if the file
+            is available on the server. Otherwise, returns ``None``.
+        '''
+        ...
+
+If a downloader does not implement an availability check (i.e., doesn't take
+``check_only`` as a keyword argument), then :meth:`pooch.Pooch.is_available`
+will raise a ``NotImplementedError``.
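
A minimal sketch of a downloader implementing the ``check_only`` protocol
documented above, using only the standard library (``head_downloader`` is a
hypothetical example, not part of Pooch; it assumes ``output_file`` is a path
string and uses an HTTP HEAD request for the availability check):

```python
import urllib.error
import urllib.request


def head_downloader(url, output_file, pooch, check_only=False):
    """
    Example downloader supporting the optional ``check_only`` protocol.

    With ``check_only=True``, issue a HEAD request and return a boolean
    indicating availability instead of downloading anything.
    """
    if check_only:
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                # Any non-error status means the file is available.
                return response.status < 400
        except (urllib.error.URLError, OSError):
            return False
    # Normal operation: download to output_file and return None,
    # matching the downloader contract described above.
    urllib.request.urlretrieve(url, output_file)
    return None
```

Because it accepts ``check_only``, a downloader like this would be usable
with ``Pooch.is_available`` rather than triggering the
``NotImplementedError`` mentioned above.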


=====================================
doc/install.rst
=====================================
@@ -39,7 +39,7 @@ There are different ways to install Pooch:
 Which Python?
 -------------
 
-You'll need **Python >= 3.6**. See :ref:`python-versions` if you
+You'll need **Python >= 3.7**. See :ref:`python-versions` if you
 require support for older versions.
 
 .. _dependencies:
@@ -53,7 +53,7 @@ manually.
 
 Required:
 
-* `appdirs <https://github.com/ActiveState/appdirs>`__
+* `platformdirs <https://github.com/platformdirs/platformdirs>`__
 * `packaging <https://github.com/pypa/packaging>`__
 * `requests <https://docs.python-requests.org/>`__
 


=====================================
doc/protocols.rst
=====================================
@@ -103,3 +103,24 @@ figshare dataset:
     ``doi:10.6084/m9.figshare.c.4362224.v1``. Attempting to download files
     from a figshare collection will raise an error.
     See `issue #274 <https://github.com/fatiando/pooch/issues/274>`__ for details.
+
+Since these repositories store information about the files they contain, we
+can avoid manually typing out the registry with the file names and their
+hashes.
+Instead, we can use the :meth:`pooch.Pooch.load_registry_from_doi` method to
+automatically populate the registry:
+
+.. code-block:: python
+
+    POOCH = pooch.create(
+        path=pooch.os_cache("plumbus"),
+        # Use the figshare DOI
+        base_url="doi:10.6084/m9.figshare.14763051.v1/",
+        registry=None,
+    )
+
+    # Automatically populate the registry
+    POOCH.load_registry_from_doi()
+
+    # Fetch one of the files in the repository
+    fname = POOCH.fetch("tiny-data.txt")


=====================================
doc/versions.rst
=====================================
@@ -7,6 +7,7 @@ Use the links below to access documentation for specific versions
 * `Latest release <https://www.fatiando.org/pooch/latest>`__
 * `Development <https://www.fatiando.org/pooch/dev>`__
   (reflects the current development branch on GitHub)
+* `v1.7.0 <https://www.fatiando.org/pooch/v1.7.0>`__
 * `v1.6.0 <https://www.fatiando.org/pooch/v1.6.0>`__
 * `v1.5.2 <https://www.fatiando.org/pooch/v1.5.2>`__
 * `v1.5.1 <https://www.fatiando.org/pooch/v1.5.1>`__


=====================================
env/requirements-docs.txt
=====================================
@@ -1,4 +1,4 @@
 # Documentation requirements
-sphinx==3.5.*
-sphinx-book-theme==0.0.41
-sphinx-panels==0.5.*
+sphinx==4.4.*
+sphinx-book-theme==0.2.*
+sphinx-panels==0.6.*


=====================================
environment.yml
=====================================
@@ -3,12 +3,12 @@ channels:
     - conda-forge
     - defaults
 dependencies:
-    - python==3.9
+    - python==3.10
     - pip
     # Run
     - requests
     - packaging
-    - appdirs
+    - platformdirs
     # Build
     - build
     # Test
@@ -17,9 +17,9 @@ dependencies:
     - pytest-localftpserver
     - coverage
     # Documentation
-    - sphinx==3.5.*
-    - sphinx-book-theme==0.0.41
-    - sphinx-panels==0.5.*
+    - sphinx==4.4.*
+    - sphinx-book-theme==0.2.*
+    - sphinx-panels==0.6.*
     # Style
     - pathspec
     - black>=20.8b1


=====================================
pooch/core.py
=====================================
@@ -11,8 +11,8 @@ import os
 import time
 import contextlib
 from pathlib import Path
+import shlex
 import shutil
-import ftplib
 
 import requests
 import requests.exceptions
@@ -20,7 +20,6 @@ import requests.exceptions
 from .hashes import hash_matches, file_hash
 from .utils import (
     check_version,
-    parse_url,
     get_logger,
     make_local_storage,
     cache_location,
@@ -28,7 +27,7 @@ from .utils import (
     os_cache,
     unique_file_name,
 )
-from .downloaders import choose_downloader
+from .downloaders import DOIDownloader, choose_downloader, doi_to_repository
 
 
 def retrieve(
@@ -77,7 +76,7 @@ def retrieve(
     url : str
         The URL to the file that is to be downloaded. Ideally, the URL should
         end in a file name.
-    known_hash : str
+    known_hash : str or None
         A known hash (checksum) of the file. Will be used to verify the
         download or check if an existing file needs to be updated. By default,
         will assume it's a SHA256 hash. To specify a different hashing method,
@@ -296,8 +295,8 @@ def create(
         Base URL for the remote data source. All requests will be made relative
         to this URL. The string should have a ``{version}`` formatting mark in
         it. We will call ``.format(version=version)`` on this string. If the
-        URL is a directory path, it must end in a ``'/'`` because we will not
-        include it.
+        URL does not end in a ``'/'``, a trailing ``'/'`` will be added
+        automatically.
     version : str or None
         The version string for your project. Should be PEP440 compatible. If
         None is given, will not attempt to format *base_url* and no subfolder
@@ -424,6 +423,8 @@ def create(
     path = cache_location(path, env, version)
     if isinstance(allow_updates, str):
         allow_updates = os.environ.get(allow_updates, "true").lower() != "false"
+    # add trailing "/"
+    base_url = base_url.rstrip("/") + "/"
     pup = Pooch(
         path=path,
         base_url=base_url,
@@ -656,7 +657,7 @@ class Pooch:
                 if line.startswith("#"):
                     continue
 
-                elements = line.split()
+                elements = shlex.split(line)
                 if not len(elements) in [0, 2, 3]:
                     raise OSError(
                         f"Invalid entry in Pooch registry file '{fname}': "
@@ -671,7 +672,37 @@ class Pooch:
                         self.urls[file_name] = file_url
                     self.registry[file_name] = file_checksum.lower()
 
-    def is_available(self, fname):
+    def load_registry_from_doi(self):
+        """
+        Populate the registry using the data repository API
+
+        Fill the registry with all the files available in the data repository,
+        along with their hashes. It will make a request to the data repository
+        API to retrieve this information. No file is downloaded during this
+        process.
+
+        .. important::
+
+            This method is intended to be used only when the ``base_url`` is
+            a DOI.
+        """
+
+        # Ensure that this is indeed a DOI-based pooch
+        downloader = choose_downloader(self.base_url)
+        if not isinstance(downloader, DOIDownloader):
+            raise ValueError(
+                f"Invalid base_url '{self.base_url}': "
+                + "Pooch.load_registry_from_doi is only implemented for DOIs"
+            )
+
+        # Create a repository instance
+        doi = self.base_url.replace("doi:", "")
+        repository = doi_to_repository(doi)
+
+        # Call registry population for this repository
+        return repository.populate_registry(self)
+
+    def is_available(self, fname, downloader=None):
         """
         Check availability of a remote file without downloading it.
 
@@ -682,7 +713,11 @@ class Pooch:
         ----------
         fname : str
             The file name (relative to the *base_url* of the remote data
-            storage) to fetch from the local storage.
+            storage).
+        downloader : None or callable
+            If not None, then a function (or callable object) that will be
+            called to check the availability of the file on the server. See
+            :ref:`downloaders` for details.
 
         Returns
         -------
@@ -691,20 +726,16 @@ class Pooch:
 
         """
         self._assert_file_in_registry(fname)
-        source = self.get_url(fname)
-        parsed_url = parse_url(source)
-        if parsed_url["protocol"] == "ftp":
-            directory, file_name = os.path.split(parsed_url["path"])
-            ftp = ftplib.FTP()
-            ftp.connect(host=parsed_url["netloc"])
-            try:
-                ftp.login()
-                available = file_name in ftp.nlst(directory)
-            finally:
-                ftp.close()
-        else:
-            response = requests.head(source, allow_redirects=True)
-            available = bool(response.status_code == 200)
+        url = self.get_url(fname)
+        if downloader is None:
+            downloader = choose_downloader(url)
+        try:
+            available = downloader(url, None, self, check_only=True)
+        except TypeError as error:
+            error_msg = (
+                f"Downloader '{str(downloader)}' does not support availability checks."
+            )
+            raise NotImplementedError(error_msg) from error
         return available
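
The switch from ``line.split()`` to ``shlex.split(line)`` above is what allows registry files to quote file names containing spaces (exercised by the new ``registry-spaces.txt`` test data). A quick illustration with a made-up registry line:

```python
import shlex

# A registry entry whose file name contains spaces must be quoted;
# shlex.split honors the quotes, while str.split would break the name apart.
line = '"file with spaces.txt" abc123 https://example.org/data.txt'
print(shlex.split(line))
# ['file with spaces.txt', 'abc123', 'https://example.org/data.txt']
```

Unquoted entries still split into the usual two or three elements, so existing registry files keep working unchanged.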
 
 


=====================================
pooch/downloaders.py
=====================================
@@ -7,9 +7,11 @@
 """
 The classes that actually handle the downloads.
 """
+import os
 import sys
 import ftplib
 
+import warnings
 import requests
 
 from .utils import parse_url
@@ -164,7 +166,7 @@ class HTTPDownloader:  # pylint: disable=too-few-public-methods
         if self.progressbar is True and tqdm is None:
             raise ValueError("Missing package 'tqdm' required for progress bars.")
 
-    def __call__(self, url, output_file, pooch):
+    def __call__(self, url, output_file, pooch, check_only=False):
         """
         Download the given URL over HTTP to the given output file.
 
@@ -178,8 +180,23 @@ class HTTPDownloader:  # pylint: disable=too-few-public-methods
             Path (and file name) to which the file will be downloaded.
         pooch : :class:`~pooch.Pooch`
             The instance of :class:`~pooch.Pooch` that is calling this method.
+        check_only : bool
+            If True, will only check if a file exists on the server
+            **without downloading the file**. Will return ``True`` if the file
+            exists and ``False`` otherwise.
+
+        Returns
+        -------
+        availability : bool or None
+            If ``check_only==True``, returns a boolean indicating if the file
+            is available on the server. Otherwise, returns ``None``.
 
         """
+        if check_only:
+            response = requests.head(url, allow_redirects=True)
+            available = bool(response.status_code == 200)
+            return available
+
         kwargs = self.kwargs.copy()
         kwargs.setdefault("stream", True)
         ispath = not hasattr(output_file, "write")
@@ -226,6 +243,7 @@ class HTTPDownloader:  # pylint: disable=too-few-public-methods
         finally:
             if ispath:
                 output_file.close()
+        return None
 
 
 class FTPDownloader:  # pylint: disable=too-few-public-methods
@@ -275,7 +293,6 @@ class FTPDownloader:  # pylint: disable=too-few-public-methods
         progressbar=False,
         chunk_size=1024,
     ):
-
         self.port = port
         self.username = username
         self.password = password
@@ -286,7 +303,7 @@ class FTPDownloader:  # pylint: disable=too-few-public-methods
         if self.progressbar is True and tqdm is None:
             raise ValueError("Missing package 'tqdm' required for progress bars.")
 
-    def __call__(self, url, output_file, pooch):
+    def __call__(self, url, output_file, pooch, check_only=False):
         """
         Download the given URL over FTP to the given output file.
 
@@ -298,11 +315,31 @@ class FTPDownloader:  # pylint: disable=too-few-public-methods
             Path (and file name) to which the file will be downloaded.
         pooch : :class:`~pooch.Pooch`
             The instance of :class:`~pooch.Pooch` that is calling this method.
-        """
+        check_only : bool
+            If True, will only check if a file exists on the server
+            **without downloading the file**. Will return ``True`` if the file
+            exists and ``False`` otherwise.
+
+        Returns
+        -------
+        availability : bool or None
+            If ``check_only==True``, returns a boolean indicating if the file
+            is available on the server. Otherwise, returns ``None``.
 
+        """
         parsed_url = parse_url(url)
         ftp = ftplib.FTP(timeout=self.timeout)
         ftp.connect(host=parsed_url["netloc"], port=self.port)
+
+        if check_only:
+            directory, file_name = os.path.split(parsed_url["path"])
+            try:
+                ftp.login(user=self.username, passwd=self.password, acct=self.account)
+                available = file_name in ftp.nlst(directory)
+            finally:
+                ftp.close()
+            return available
+
         ispath = not hasattr(output_file, "write")
         if ispath:
             output_file = open(output_file, "w+b")
@@ -313,10 +350,9 @@ class FTPDownloader:  # pylint: disable=too-few-public-methods
                 # Make sure the file is set to binary mode, otherwise we can't
                 # get the file size. See: https://stackoverflow.com/a/22093848
                 ftp.voidcmd("TYPE I")
-                size = int(ftp.size(parsed_url["path"]))
                 use_ascii = bool(sys.platform == "win32")
                 progress = tqdm(
-                    total=size,
+                    total=int(ftp.size(parsed_url["path"])),
                     ncols=79,
                     ascii=use_ascii,
                     unit="B",
@@ -337,6 +373,7 @@ class FTPDownloader:  # pylint: disable=too-few-public-methods
             ftp.quit()
             if ispath:
                 output_file.close()
+        return None
 
 
 class SFTPDownloader:  # pylint: disable=too-few-public-methods
@@ -477,6 +514,7 @@ class DOIDownloader:  # pylint: disable=too-few-public-methods
 
     * `figshare <https://www.figshare.com>`__
     * `Zenodo <https://www.zenodo.org>`__
+    * `DataVerse <https://dataverse.org/>`__ instances
 
     .. attention::
 
@@ -555,26 +593,18 @@ class DOIDownloader:  # pylint: disable=too-few-public-methods
             The instance of :class:`~pooch.Pooch` that is calling this method.
 
         """
-        converters = {
-            "figshare.com": figshare_download_url,
-            "zenodo.org": zenodo_download_url,
-        }
+
         parsed_url = parse_url(url)
-        doi = parsed_url["netloc"]
-        archive_url = doi_to_url(doi)
-        repository = parse_url(archive_url)["netloc"]
-        if repository not in converters:
-            raise ValueError(
-                f"Invalid data repository '{repository}'. Must be one of "
-                f"{list(converters.keys())}. "
-                "To request or contribute support for this repository, "
-                "please open an issue at https://github.com/fatiando/pooch/issues"
-            )
-        download_url = converters[repository](
-            archive_url=archive_url,
-            file_name=parsed_url["path"].split("/")[-1],
-            doi=doi,
-        )
+        data_repository = doi_to_repository(parsed_url["netloc"])
+
+        # Resolve the URL
+        file_name = parsed_url["path"]
+        # remove the leading slash in the path
+        if file_name[0] == "/":
+            file_name = file_name[1:]
+        download_url = data_repository.download_url(file_name)
+
+        # Instantiate the downloader object
         downloader = HTTPDownloader(
             progressbar=self.progressbar, chunk_size=self.chunk_size, **self.kwargs
         )
@@ -606,66 +636,430 @@ def doi_to_url(doi):
     return url
 
 
-def zenodo_download_url(archive_url, file_name, doi):
+def doi_to_repository(doi):
     """
-    Use the API to get the download URL for a file given the archive URL.
+    Instantiate a data repository instance from a given DOI.
+
+    This function implements the chain of responsibility dispatch
+    to the correct data repository class.
 
     Parameters
     ----------
-    archive_url : str
-        URL of the dataset in the repository.
-    file_name : str
-        The name of the file in the archive that will be downloaded.
     doi : str
         The DOI of the archive.
 
     Returns
     -------
-    download_url : str
-        The HTTP URL that can be used to download the file.
-
+    data_repository : DataRepository
+        The data repository object
     """
-    article_id = archive_url.split("/")[-1]
-    # With the ID, we can get a list of files and their download links
-    article = requests.get(f"https://zenodo.org/api/records/{article_id}").json()
-    files = {item["key"]: item for item in article["files"]}
-    if file_name not in files:
+
+    # This should go away in a separate issue: DOI handling should
+    # not rely on the (non-)existence of trailing slashes. The issue
+    # is documented in https://github.com/fatiando/pooch/issues/324
+    if doi[-1] == "/":
+        doi = doi[:-1]
+
+    repositories = [
+        FigshareRepository,
+        ZenodoRepository,
+        DataverseRepository,
+    ]
+
+    # Resolve the DOI to the repository's archive URL
+    archive_url = doi_to_url(doi)
+
+    # Try the repository classes one by one until one returns an instance
+    data_repository = None
+    for repo in repositories:
+        if data_repository is None:
+            data_repository = repo.initialize(
+                archive_url=archive_url,
+                doi=doi,
+            )
+
+    if data_repository is None:
+        repository = parse_url(archive_url)["netloc"]
         raise ValueError(
-            f"File '{file_name}' not found in data archive {archive_url} (doi:{doi})."
+            f"Invalid data repository '{repository}'. "
+            "To request or contribute support for this repository, "
+            "please open an issue at https://github.com/fatiando/pooch/issues"
         )
-    download_url = files[file_name]["links"]["self"]
-    return download_url
 
+    return data_repository
 
-def figshare_download_url(archive_url, file_name, doi):
-    """
-    Use the API to get the download URL for a file given the archive URL.
 
-    Parameters
-    ----------
-    archive_url : str
-        URL of the dataset in the repository.
-    file_name : str
-        The name of the file in the archive that will be downloaded.
-    doi : str
-        The DOI of the archive.
+class DataRepository:  # pylint: disable=too-few-public-methods, missing-class-docstring
+    @classmethod
+    def initialize(cls, doi, archive_url):  # pylint: disable=unused-argument
+        """
+        Initialize the data repository if the given URL points to a
+        corresponding repository.
 
-    Returns
-    -------
-    download_url : str
-        The HTTP URL that can be used to download the file.
+        Initializes a data repository object. This is done as part of
+        a chain of responsibility. If the class cannot handle the given
+        repository URL, it returns `None`. Otherwise a `DataRepository`
+        instance is returned.
+
+        Parameters
+        ----------
+        doi : str
+            The DOI that identifies the repository
+        archive_url : str
+            The resolved URL for the DOI
+        """
+
+        return None  # pragma: no cover
+
+    def download_url(self, file_name):
+        """
+        Use the repository API to get the download URL for a file given
+        the archive URL.
+
+        Parameters
+        ----------
+        file_name : str
+            The name of the file in the archive that will be downloaded.
+
+        Returns
+        -------
+        download_url : str
+            The HTTP URL that can be used to download the file.
+        """
+
+        raise NotImplementedError  # pragma: no cover
+
+    def populate_registry(self, pooch):
+        """
+        Populate the registry using the data repository's API
+
+        Parameters
+        ----------
+        pooch : Pooch
+            The pooch instance that the registry will be added to.
+        """
+
+        raise NotImplementedError  # pragma: no cover
+
+
+class ZenodoRepository(DataRepository):  # pylint: disable=missing-class-docstring
+    def __init__(self, doi, archive_url):
+        self.archive_url = archive_url
+        self.doi = doi
+        self._api_response = None
+
+    @classmethod
+    def initialize(cls, doi, archive_url):
+        """
+        Initialize the data repository if the given URL points to a
+        corresponding repository.
+
+        Initializes a data repository object. This is done as part of
+        a chain of responsibility. If the class cannot handle the given
+        repository URL, it returns `None`. Otherwise a `DataRepository`
+        instance is returned.
+
+        Parameters
+        ----------
+        doi : str
+            The DOI that identifies the repository
+        archive_url : str
+            The resolved URL for the DOI
+        """
+
+        # Check whether this is a Zenodo URL
+        parsed_archive_url = parse_url(archive_url)
+        if parsed_archive_url["netloc"] != "zenodo.org":
+            return None
+
+        return cls(doi, archive_url)
+
+    @property
+    def api_response(self):
+        """Cached API response from Zenodo"""
+
+        if self._api_response is None:
+            article_id = self.archive_url.split("/")[-1]
+            self._api_response = requests.get(
+                f"https://zenodo.org/api/records/{article_id}"
+            ).json()
+
+        return self._api_response
+
+    def download_url(self, file_name):
+        """
+        Use the repository API to get the download URL for a file given
+        the archive URL.
+
+        Parameters
+        ----------
+        file_name : str
+            The name of the file in the archive that will be downloaded.
+
+        Returns
+        -------
+        download_url : str
+            The HTTP URL that can be used to download the file.
+        """
+
+        files = {item["key"]: item for item in self.api_response["files"]}
+        if file_name not in files:
+            raise ValueError(
+                f"File '{file_name}' not found in data archive {self.archive_url} (doi:{self.doi})."
+            )
+        download_url = files[file_name]["links"]["self"]
+        return download_url
+
+    def populate_registry(self, pooch):
+        """
+        Populate the registry using the data repository's API
+
+        Parameters
+        ----------
+        pooch : Pooch
+            The pooch instance that the registry will be added to.
+        """
+
+        for filedata in self.api_response["files"]:
+            pooch.registry[filedata["key"]] = filedata["checksum"]
+
+
+class FigshareRepository(DataRepository):  # pylint: disable=missing-class-docstring
+    def __init__(self, doi, archive_url):
+        self.archive_url = archive_url
+        self.doi = doi
+        self._api_response = None
+
+    @classmethod
+    def initialize(cls, doi, archive_url):
+        """
+        Initialize the data repository if the given URL points to a
+        corresponding repository.
+
+        Initializes a data repository object. This is done as part of
+        a chain of responsibility. If the class cannot handle the given
+        repository URL, it returns `None`. Otherwise a `DataRepository`
+        instance is returned.
+
+        Parameters
+        ----------
+        doi : str
+            The DOI that identifies the repository
+        archive_url : str
+            The resolved URL for the DOI
+        """
+
+        # Check whether this is a Figshare URL
+        parsed_archive_url = parse_url(archive_url)
+        if parsed_archive_url["netloc"] != "figshare.com":
+            return None
+
+        return cls(doi, archive_url)
+
+    def _parse_version_from_doi(self):
+        """
+        Parse the version from the DOI
+
+        Return None if the DOI does not specify a version.
+        """
+        # Get suffix of the doi
+        _, suffix = self.doi.split("/")
+        # Split the suffix by dots and keep the last part
+        last_part = suffix.split(".")[-1]
+        # Parse the version from the last part
+        if last_part[0] != "v":
+            return None
+        version = int(last_part[1:])
+        return version
+
+    @property
+    def api_response(self):
+        """Cached API response from Figshare"""
+
+        if self._api_response is None:
+            # Use the figshare API to find the article ID from the DOI
+            article = requests.get(
+                f"https://api.figshare.com/v2/articles?doi={self.doi}"
+            ).json()[0]
+            article_id = article["id"]
+            # Parse desired version from the doi
+            version = self._parse_version_from_doi()
+            # With the ID and version, we can get a list of files and their
+            # download links
+            if version is None:
+                # Figshare returns the latest version available when no version
+                # is specified through the DOI.
+                warnings.warn(
+                    f"The Figshare DOI '{self.doi}' doesn't specify which version of "
+                    "the repository should be used. "
+                    "Figshare will point to the latest version available.",
+                    UserWarning,
+                )
+                # Define API url using only the article id
+                # (figshare will resolve the latest version)
+                api_url = f"https://api.figshare.com/v2/articles/{article_id}"
+            else:
+                # Define API url using article id and the desired version
+                # Get list of files using article id and the version
+                api_url = (
+                    "https://api.figshare.com/v2/articles/"
+                    f"{article_id}/versions/{version}"
+                )
+            # Make the request and return the files in the figshare repository
+            response = requests.get(api_url)
+            response.raise_for_status()
+            self._api_response = response.json()["files"]
+
+        return self._api_response
+
+    def download_url(self, file_name):
+        """
+        Use the repository API to get the download URL for a file given
+        the archive URL.
+
+        Parameters
+        ----------
+        file_name : str
+            The name of the file in the archive that will be downloaded.
+
+        Returns
+        -------
+        download_url : str
+            The HTTP URL that can be used to download the file.
+        """
+
+        files = {item["name"]: item for item in self.api_response}
+        if file_name not in files:
+            raise ValueError(
+                f"File '{file_name}' not found in data archive {self.archive_url} (doi:{self.doi})."
+            )
+        download_url = files[file_name]["download_url"]
+        return download_url
+
+    def populate_registry(self, pooch):
+        """
+        Populate the registry using the data repository's API
+
+        Parameters
+        ----------
+        pooch : Pooch
+            The pooch instance that the registry will be added to.
+        """
+
+        for filedata in self.api_response:
+            pooch.registry[filedata["name"]] = f"md5:{filedata['computed_md5']}"
+
+
+class DataverseRepository(DataRepository):  # pylint: disable=missing-class-docstring
+    def __init__(self, doi, archive_url):
+        self.archive_url = archive_url
+        self.doi = doi
+        self._api_response = None
+
+    @classmethod
+    def initialize(cls, doi, archive_url):
+        """
+        Initialize the data repository if the given URL points to a
+        corresponding repository.
+
+        Initializes a data repository object. This is done as part of
+        a chain of responsibility. If the class cannot handle the given
+        repository URL, it returns `None`. Otherwise a `DataRepository`
+        instance is returned.
+
+        Parameters
+        ----------
+        doi : str
+            The DOI that identifies the repository
+        archive_url : str
+            The resolved URL for the DOI
+        """
+
+        # Access the DOI as if this was a DataVerse instance
+        response = cls._get_api_response(doi, archive_url)
+
+        # If we failed, this is probably not a DataVerse instance
+        if 400 <= response.status_code < 600:
+            return None
+
+        # Initialize the repository and overwrite the api response
+        repository = cls(doi, archive_url)
+        repository.api_response = response
+        return repository
+
+    @classmethod
+    def _get_api_response(cls, doi, archive_url):
+        """
+        Perform the actual API request
+
+        This lives in a separate ``classmethod`` because it is needed both
+        before and after initialization.
+        """
+        parsed = parse_url(archive_url)
+        response = requests.get(
+            f"{parsed['protocol']}://{parsed['netloc']}/api/datasets/"
+            f":persistentId?persistentId=doi:{doi}"
+        )
+        return response
+
+    @property
+    def api_response(self):
+        """Cached API response from a DataVerse instance"""
+
+        if self._api_response is None:
+            self._api_response = self._get_api_response(
+                self.doi, self.archive_url
+            )  # pragma: no cover
+
+        return self._api_response
+
+    @api_response.setter
+    def api_response(self, response):
+        """Update the cached API response"""
+
+        self._api_response = response
+
+    def download_url(self, file_name):
+        """
+        Use the repository API to get the download URL for a file given
+        the archive URL.
+
+        Parameters
+        ----------
+        file_name : str
+            The name of the file in the archive that will be downloaded.
+
+        Returns
+        -------
+        download_url : str
+            The HTTP URL that can be used to download the file.
+        """
+
+        parsed = parse_url(self.archive_url)
+
+        # Iterate over the given files until we find one of the requested name
+        for filedata in self.api_response.json()["data"]["latestVersion"]["files"]:
+            if file_name == filedata["dataFile"]["filename"]:
+                return (
+                    f"{parsed['protocol']}://{parsed['netloc']}/api/access/datafile/"
+                    f":persistentId?persistentId={filedata['dataFile']['persistentId']}"
+                )
 
-    """
-    # Use the figshare API to find the article ID from the DOI
-    article = requests.get(f"https://api.figshare.com/v2/articles?doi={doi}").json()[0]
-    article_id = article["id"]
-    # With the ID, we can get a list of files and their download links
-    response = requests.get(f"https://api.figshare.com/v2/articles/{article_id}/files")
-    response.raise_for_status()
-    files = {item["name"]: item for item in response.json()}
-    if file_name not in files:
         raise ValueError(
-            f"File '{file_name}' not found in data archive {archive_url} (doi:{doi})."
+            f"File '{file_name}' not found in data archive {self.archive_url} (doi:{self.doi})."
         )
-    download_url = files[file_name]["download_url"]
-    return download_url
+
+    def populate_registry(self, pooch):
+        """
+        Populate the registry using the data repository's API
+
+        Parameters
+        ----------
+        pooch : Pooch
+            The pooch instance that the registry will be added to.
+        """
+
+        for filedata in self.api_response.json()["data"]["latestVersion"]["files"]:
+            pooch.registry[
+                filedata["dataFile"]["filename"]
+            ] = f"md5:{filedata['dataFile']['md5']}"


=====================================
pooch/hashes.py
=====================================
@@ -208,11 +208,9 @@ def make_registry(directory, output, recursive=True):
         pattern = "*"
 
     files = sorted(
-        [
-            str(path.relative_to(directory))
-            for path in directory.glob(pattern)
-            if path.is_file()
-        ]
+        str(path.relative_to(directory))
+        for path in directory.glob(pattern)
+        if path.is_file()
     )
 
     hashes = [file_hash(str(directory / fname)) for fname in files]


=====================================
pooch/processors.py
=====================================
@@ -79,17 +79,36 @@ class ExtractorProcessor:  # pylint: disable=too-few-public-methods
         else:
             archive_dir = fname.rsplit(os.path.sep, maxsplit=1)[0]
             self.extract_dir = os.path.join(archive_dir, self.extract_dir)
-        if action in ("update", "download") or not os.path.exists(self.extract_dir):
+        if (
+            (action in ("update", "download"))
+            or (not os.path.exists(self.extract_dir))
+            or (
+                (self.members is not None)
+                and (
+                    not all(
+                        os.path.exists(os.path.join(self.extract_dir, m))
+                        for m in self.members
+                    )
+                )
+            )
+        ):
             # Make sure that the folder with the extracted files exists
             os.makedirs(self.extract_dir, exist_ok=True)
             self._extract_file(fname, self.extract_dir)
+
         # Get a list of all file names (including subdirectories) in our folder
-        # of unzipped files.
-        fnames = [
-            os.path.join(path, fname)
-            for path, _, files in os.walk(self.extract_dir)
-            for fname in files
-        ]
+        # of unzipped files, filtered by the given members list
+        fnames = []
+        for path, _, files in os.walk(self.extract_dir):
+            for filename in files:
+                relpath = os.path.normpath(
+                    os.path.join(os.path.relpath(path, self.extract_dir), filename)
+                )
+                if self.members is None or any(
+                    relpath.startswith(os.path.normpath(m)) for m in self.members
+                ):
+                    fnames.append(os.path.join(path, filename))
+
         return fnames
 
     def _extract_file(self, fname, extract_dir):
@@ -153,7 +172,9 @@ class Unzip(ExtractorProcessor):  # pylint: disable=too-few-public-methods
                     # Based on:
                     # https://stackoverflow.com/questions/8008829/extract-only-a-single-directory-from-tar
                     subdir_members = [
-                        name for name in zip_file.namelist() if name.startswith(member)
+                        name
+                        for name in zip_file.namelist()
+                        if os.path.normpath(name).startswith(os.path.normpath(member))
                     ]
                     # Extract the data file from within the archive
                     zip_file.extractall(members=subdir_members, path=extract_dir)
@@ -216,7 +237,9 @@ class Untar(ExtractorProcessor):  # pylint: disable=too-few-public-methods
                     subdir_members = [
                         info
                         for info in tar_file.getmembers()
-                        if info.name.startswith(member)
+                        if os.path.normpath(info.name).startswith(
+                            os.path.normpath(member)
+                        )
                     ]
                     # Extract the data file from within the archive
                     tar_file.extractall(members=subdir_members, path=extract_dir)

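The key idea in the `ExtractorProcessor` changes above is that member paths are compared only after both sides pass through `os.path.normpath`, so `"./store/tiny-data.txt"` and `"store/tiny-data.txt"` refer to the same member. A small standalone sketch of that matching rule (`member_matches` is a hypothetical helper, not pooch's API):

```python
import os

def member_matches(relpath, members):
    """Return True if relpath falls under any requested member.

    Both sides are normalized with os.path.normpath, mirroring the
    diff above, so leading "./" segments do not break the comparison.
    """
    if members is None:
        return True  # no filter: everything matches
    norm = os.path.normpath(relpath)
    return any(norm.startswith(os.path.normpath(m)) for m in members)

print(member_matches("store/tiny-data.txt", ["./store"]))  # → True
print(member_matches("other/tiny-data.txt", ["./store"]))  # → False
```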

=====================================
pooch/tests/data/registry-spaces.txt
=====================================
@@ -0,0 +1,2 @@
+"file with spaces.txt" baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d
+other\ with\ spaces.txt baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d

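The new `registry-spaces.txt` fixture exercises two quoting styles for file names containing spaces: double quotes and backslash escapes. Both can be tokenized with the standard-library `shlex` module, which (we assume) is why the test expects both lines to resolve to plain unescaped names:

```python
import shlex

lines = [
    '"file with spaces.txt" baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d',
    r'other\ with\ spaces.txt baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d',
]
# shlex.split handles both double-quoted names and backslash-escaped
# spaces, yielding exactly two fields per registry line.
for line in lines:
    name, file_hash = shlex.split(line)
    print(name)
# → file with spaces.txt
# → other with spaces.txt
```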

=====================================
pooch/tests/test_core.py
=====================================
@@ -21,12 +21,15 @@ from ..hashes import file_hash, hash_matches
 
 # Import the core module so that we can monkeypatch some functions
 from .. import core
-from ..downloaders import HTTPDownloader
+from ..downloaders import HTTPDownloader, FTPDownloader
 
 from .utils import (
     pooch_test_url,
+    data_over_ftp,
     pooch_test_figshare_url,
     pooch_test_zenodo_url,
+    pooch_test_zenodo_with_slash_url,
+    pooch_test_dataverse_url,
     pooch_test_registry,
     check_tiny_data,
     check_large_data,
@@ -39,6 +42,8 @@ REGISTRY = pooch_test_registry()
 BASEURL = pooch_test_url()
 FIGSHAREURL = pooch_test_figshare_url()
 ZENODOURL = pooch_test_zenodo_url()
+ZENODOURL_W_SLASH = pooch_test_zenodo_with_slash_url()
+DATAVERSEURL = pooch_test_dataverse_url()
 REGISTRY_CORRUPTED = {
     # The same data file but I changed the hash manually to a wrong one
     "tiny-data.txt": "098h0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d"
@@ -134,7 +139,9 @@ def test_pooch_local(data_dir_mirror):
 
 @pytest.mark.network
 @pytest.mark.parametrize(
-    "url", [BASEURL, FIGSHAREURL, ZENODOURL], ids=["https", "figshare", "zenodo"]
+    "url",
+    [BASEURL, FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+    ids=["https", "figshare", "zenodo", "dataverse"],
 )
 def test_pooch_custom_url(url):
     "Have pooch download the file from URL that is not base_url"
@@ -158,7 +165,9 @@ def test_pooch_custom_url(url):
 
 @pytest.mark.network
 @pytest.mark.parametrize(
-    "url", [BASEURL, FIGSHAREURL, ZENODOURL], ids=["https", "figshare", "zenodo"]
+    "url",
+    [BASEURL, FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+    ids=["https", "figshare", "zenodo", "dataverse"],
 )
 def test_pooch_download(url):
     "Setup a pooch that has no local data and needs to download"
@@ -373,6 +382,15 @@ def test_pooch_update_disallowed_environment():
         os.environ.pop(variable_name)
 
 
+def test_pooch_create_base_url_no_trailing_slash():
+    """
+    Test if pooch.create appends a trailing slash to the base url if missing
+    """
+    base_url = "https://mybase.url"
+    pup = create(base_url=base_url, registry=None, path=DATA_DIR)
+    assert pup.base_url == base_url + "/"
+
+
 @pytest.mark.network
 def test_pooch_corrupted(data_dir_mirror):
     "Raise an exception if the file hash doesn't match the registry"
@@ -457,6 +475,14 @@ def test_pooch_load_registry_invalid_line():
         pup.load_registry(os.path.join(DATA_DIR, "registry-invalid.txt"))
 
 
+def test_pooch_load_registry_with_spaces():
+    "Should check that spaces in filenames are allowed in registry files"
+    pup = Pooch(path="", base_url="")
+    pup.load_registry(os.path.join(DATA_DIR, "registry-spaces.txt"))
+    assert "file with spaces.txt" in pup.registry
+    assert "other with spaces.txt" in pup.registry
+
+
 @pytest.mark.network
 def test_check_availability():
     "Should correctly check availability of existing and non existing files"
@@ -473,21 +499,38 @@ def test_check_availability():
     assert not pup.is_available("not-a-real-data-file.txt")
 
 
- at pytest.mark.network
-def test_check_availability_on_ftp():
+def test_check_availability_on_ftp(ftpserver):
     "Should correctly check availability of existing and non existing files"
-    # Check available remote file on FTP server
-    pup = Pooch(
-        path=DATA_DIR,
-        base_url="ftp://data-out.unavco.org/pub/products/velocity/rel_201712/",
-        registry={
-            "pbo.final_igs08.20171202.vel": "md5:0b75d4049dedd0e179615f4b5e956156",
-            "doesnot_exist.zip": "jdjdjdjdflld",
-        },
-    )
-    assert pup.is_available("pbo.final_igs08.20171202.vel")
-    # Check non available remote file
-    assert not pup.is_available("doesnot_exist.zip")
+    with data_over_ftp(ftpserver, "tiny-data.txt") as url:
+        # Check available remote file on FTP server
+        pup = Pooch(
+            path=DATA_DIR,
+            base_url=url.replace("tiny-data.txt", ""),
+            registry={
+                "tiny-data.txt": "baee0894dba14b12085eacb204284b97e362f4f3e5a5807693cc90ef415c1b2d",
+                "doesnot_exist.zip": "jdjdjdjdflld",
+            },
+        )
+        downloader = FTPDownloader(port=ftpserver.server_port)
+        assert pup.is_available("tiny-data.txt", downloader=downloader)
+        # Check non available remote file
+        assert not pup.is_available("doesnot_exist.zip", downloader=downloader)
+
+
+def test_check_availability_invalid_downloader():
+    "Should raise an exception if the downloader doesn't support this"
+
+    def downloader(url, output, pooch):  # pylint: disable=unused-argument
+        "A downloader that doesn't support check_only"
+        return None
+
+    pup = Pooch(path=DATA_DIR, base_url=BASEURL, registry=REGISTRY)
+    # First check that everything works without the custom downloader
+    assert pup.is_available("tiny-data.txt")
+    # Now use the bad one
+    with pytest.raises(NotImplementedError) as error:
+        pup.is_available("tiny-data.txt", downloader=downloader)
+    assert "does not support availability checks" in str(error)
 
 
 @pytest.mark.network
@@ -582,3 +625,56 @@ def test_stream_download(fname):
         stream_download(url, destination, known_hash, downloader, pooch=None)
         assert destination.exists()
         check_tiny_data(str(destination))
+
+
+ at pytest.mark.parametrize(
+    "url",
+    [FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+    ids=["figshare", "zenodo", "dataverse"],
+)
+def test_load_registry_from_doi(url):
+    """Check that the registry is correctly populated from the API"""
+    with TemporaryDirectory() as local_store:
+        path = os.path.abspath(local_store)
+        pup = Pooch(path=path, base_url=url)
+        pup.load_registry_from_doi()
+
+        # Check the existence of all files in the registry
+        assert len(pup.registry) == 2
+        assert "tiny-data.txt" in pup.registry
+        assert "store.zip" in pup.registry
+
+        # Ensure that all files have correct checksums by fetching them
+        for filename in pup.registry:
+            pup.fetch(filename)
+
+
+def test_load_registry_from_doi_zenodo_with_slash():
+    """
+    Check that the registry is correctly populated from the Zenodo API when
+    the filename contains a slash
+    """
+    url = ZENODOURL_W_SLASH
+    with TemporaryDirectory() as local_store:
+        path = os.path.abspath(local_store)
+        pup = Pooch(path=path, base_url=url)
+        pup.load_registry_from_doi()
+
+        # Check the existence of all files in the registry
+        assert len(pup.registry) == 1
+        assert "santisoler/pooch-test-data-v1.zip" in pup.registry
+
+        # Ensure that all files have correct checksums by fetching them
+        for filename in pup.registry:
+            pup.fetch(filename)
+
+
+def test_wrong_load_registry_from_doi():
+    """Check that non-DOI URLs produce an error"""
+
+    pup = Pooch(path="", base_url=BASEURL)
+
+    with pytest.raises(ValueError) as exc:
+        pup.load_registry_from_doi()
+
+    assert "only implemented for DOIs" in str(exc.value)


=====================================
pooch/tests/test_downloaders.py
=====================================
@@ -29,10 +29,12 @@ from ..downloaders import (
     SFTPDownloader,
     DOIDownloader,
     choose_downloader,
-    figshare_download_url,
-    zenodo_download_url,
+    FigshareRepository,
+    ZenodoRepository,
+    DataverseRepository,
     doi_to_url,
 )
+from ..processors import Unzip
 from .utils import (
     pooch_test_url,
     check_large_data,
@@ -40,12 +42,16 @@ from .utils import (
     data_over_ftp,
     pooch_test_figshare_url,
     pooch_test_zenodo_url,
+    pooch_test_zenodo_with_slash_url,
+    pooch_test_dataverse_url,
 )
 
 
 BASEURL = pooch_test_url()
 FIGSHAREURL = pooch_test_figshare_url()
 ZENODOURL = pooch_test_zenodo_url()
+ZENODOURL_W_SLASH = pooch_test_zenodo_with_slash_url()
+DATAVERSEURL = pooch_test_dataverse_url()
 
 
 @pytest.mark.skipif(tqdm is None, reason="requires tqdm")
@@ -97,22 +103,28 @@ def test_doi_url_not_found():
 
 
 @pytest.mark.parametrize(
-    "converter,doi",
+    "repository,doi",
     [
-        (figshare_download_url, "10.6084/m9.figshare.14763051.v1"),
-        (zenodo_download_url, "10.5281/zenodo.4924875"),
+        (FigshareRepository, "10.6084/m9.figshare.14763051.v1"),
+        (ZenodoRepository, "10.5281/zenodo.4924875"),
+        (DataverseRepository, "10.11588/data/TKCFEF"),
     ],
-    ids=["figshare", "zenodo"],
+    ids=["figshare", "zenodo", "dataverse"],
 )
-def test_figshare_url_file_not_found(converter, doi):
+def test_figshare_url_file_not_found(repository, doi):
     "Should fail if the file is not found in the archive"
     with pytest.raises(ValueError) as exc:
         url = doi_to_url(doi)
-        converter(archive_url=url, file_name="bla.txt", doi=doi)
+        repo = repository.initialize(doi, url)
+        repo.download_url(file_name="bla.txt")
     assert "File 'bla.txt' not found" in str(exc.value)
 
 
- at pytest.mark.parametrize("url", [FIGSHAREURL, ZENODOURL], ids=["figshare", "zenodo"])
+ at pytest.mark.parametrize(
+    "url",
+    [FIGSHAREURL, ZENODOURL, DATAVERSEURL],
+    ids=["figshare", "zenodo", "dataverse"],
+)
 def test_doi_downloader(url):
     "Test the DOI downloader"
     # Use the test data we have on the repository
@@ -123,6 +135,75 @@ def test_doi_downloader(url):
         check_tiny_data(outfile)
 
 
+ at pytest.mark.network
+def test_zenodo_downloader_with_slash_in_fname():
+    """
+    Test the Zenodo downloader when the path contains a forward slash
+
+    Related to issue #336
+    """
+    # Use the test data we have on the repository
+    with TemporaryDirectory() as local_store:
+        base_url = ZENODOURL_W_SLASH + "santisoler/pooch-test-data-v1.zip"
+        downloader = DOIDownloader()
+        outfile = os.path.join(local_store, "test-data.zip")
+        downloader(base_url, outfile, None)
+        # unpack the downloaded zip file so we can check the integrity of
+        # tiny-data.txt
+        fnames = Unzip()(outfile, action="download", pooch=None)
+        (fname,) = [f for f in fnames if "tiny-data.txt" in f]
+        check_tiny_data(fname)
+
+
+ at pytest.mark.network
+def test_figshare_unspecified_version():
+    """
+    Test if passing a Figshare url without a version warns about it, but still
+    downloads it.
+    """
+    url = FIGSHAREURL
+    # Remove the last bits of the doi, where the version is specified and
+    url = url[: url.rindex(".")] + "/"
+    # Create expected warning message
+    doi = url[4:-1]
+    warning_msg = f"The Figshare DOI '{doi}' doesn't specify which version of "
+    with TemporaryDirectory() as local_store:
+        downloader = DOIDownloader()
+        outfile = os.path.join(local_store, "tiny-data.txt")
+        with pytest.warns(UserWarning, match=warning_msg):
+            downloader(url + "tiny-data.txt", outfile, None)
+
+
+ at pytest.mark.network
+ at pytest.mark.parametrize(
+    "version, missing, present",
+    [
+        (
+            1,
+            "LC08_L2SP_218074_20190114_20200829_02_T1-cropped.tar.gz",
+            "cropped-before.tar.gz",
+        ),
+        (
+            2,
+            "cropped-before.tar.gz",
+            "LC08_L2SP_218074_20190114_20200829_02_T1-cropped.tar.gz",
+        ),
+    ],
+)
+def test_figshare_data_repository_versions(version, missing, present):
+    """
+    Test if setting the version in Figshare DOI works as expected
+    """
+    # Use a Figshare repo as example (we won't download files from it since
+    # they are too big)
+    doi = f"10.6084/m9.figshare.21665630.v{version}"
+    url = f"https://doi.org/{doi}/"
+    figshare = FigshareRepository(doi, url)
+    filenames = [item["name"] for item in figshare.api_response]
+    assert present in filenames
+    assert missing not in filenames
+
+
 @pytest.mark.network
 def test_ftp_downloader(ftpserver):
     "Test ftp downloader"


=====================================
pooch/tests/test_processors.py
=====================================
@@ -169,6 +169,57 @@ def test_unpacking(processor_class, extension, target_path, archive, members):
             check_tiny_data(fname)
 
 
+ at pytest.mark.network
+ at pytest.mark.parametrize(
+    "processor_class,extension",
+    [(Unzip, ".zip"), (Untar, ".tar.gz")],
+)
+def test_multiple_unpacking(processor_class, extension):
+    "Test that multiple subsequent calls to a processor yield correct results"
+
+    with TemporaryDirectory() as local_store:
+        pup = Pooch(path=Path(local_store), base_url=BASEURL, registry=REGISTRY)
+
+        # Do a first fetch with the one member only
+        processor1 = processor_class(members=["store/tiny-data.txt"])
+        filenames1 = pup.fetch("store" + extension, processor=processor1)
+        assert len(filenames1) == 1
+        check_tiny_data(filenames1[0])
+
+        # Do a second fetch with the other member
+        processor2 = processor_class(
+            members=["store/tiny-data.txt", "store/subdir/tiny-data.txt"]
+        )
+        filenames2 = pup.fetch("store" + extension, processor=processor2)
+        assert len(filenames2) == 2
+        check_tiny_data(filenames2[0])
+        check_tiny_data(filenames2[1])
+
+        # Do a third fetch, again with one member and assert
+        # that only this member was returned
+        filenames3 = pup.fetch("store" + extension, processor=processor1)
+        assert len(filenames3) == 1
+        check_tiny_data(filenames3[0])
+
+
+ at pytest.mark.network
+ at pytest.mark.parametrize(
+    "processor_class,extension",
+    [(Unzip, ".zip"), (Untar, ".tar.gz")],
+)
+def test_unpack_members_with_leading_dot(processor_class, extension):
+    "Test that unpack members can also be specified with a leading ./"
+
+    with TemporaryDirectory() as local_store:
+        pup = Pooch(path=Path(local_store), base_url=BASEURL, registry=REGISTRY)
+
+        # Do a first fetch with the one member only
+        processor1 = processor_class(members=["./store/tiny-data.txt"])
+        filenames1 = pup.fetch("store" + extension, processor=processor1)
+        assert len(filenames1) == 1
+        check_tiny_data(filenames1[0])
+
+
 def _check_logs(log_file, expected_lines):
     """
     Assert that the lines in the log match the expected ones.


=====================================
pooch/tests/test_utils.py
=====================================
@@ -141,8 +141,16 @@ def test_local_storage_newfile_permissionerror(monkeypatch):
                 "path": "/dike.json",
             },
         ),
+        (
+            r"doi:10.5281/zenodo.7632643/santisoler/pooch-test-data-v1.zip",
+            {
+                "protocol": "doi",
+                "netloc": "10.5281/zenodo.7632643",
+                "path": "/santisoler/pooch-test-data-v1.zip",
+            },
+        ),
     ],
-    ids=["http", "ftp", "doi"],
+    ids=["http", "ftp", "doi", "zenodo-doi-with-slash"],
 )
 def test_parse_url(url, output):
     "Parse URL into 3 components"


=====================================
pooch/tests/utils.py
=====================================
@@ -98,6 +98,37 @@ def pooch_test_zenodo_url():
     return url
 
 
+def pooch_test_zenodo_with_slash_url():
+    """
+    Get base URL for test data in Zenodo, where the file name contains a slash
+
+    The URL contains the DOI for the Zenodo dataset that has a slash in the
+    filename (created with the GitHub-Zenodo integration service), using the
+    appropriate version for this version of Pooch.
+
+    Returns
+    -------
+    url
+        The URL for pooch's test data.
+
+    """
+    url = "doi:10.5281/zenodo.7632643/"
+    return url
+
+
+def pooch_test_dataverse_url():
+    """
+    Get the base URL for the test data stored on a Dataverse instance.
+
+    Returns
+    -------
+    url
+        The URL for pooch's test data.
+    """
+    url = "doi:10.11588/data/TKCFEF/"
+    return url
+
+
 def pooch_test_registry():
     """
     Get a registry for the test data used in Pooch itself.


=====================================
pooch/utils.py
=====================================
@@ -16,7 +16,7 @@ from urllib.parse import urlsplit
 from contextlib import contextmanager
 import warnings
 
-import appdirs
+import platformdirs
 from packaging.version import Version
 
 
@@ -74,10 +74,10 @@ def os_cache(project):
     r"""
     Default cache location based on the operating system.
 
-    The folder locations are defined by the ``appdirs``  package
+    The folder locations are defined by the ``platformdirs``  package
     using the ``user_cache_dir`` function.
     Usually, the locations will be following (see the
-    `appdirs documentation <https://github.com/ActiveState/appdirs>`__):
+    `platformdirs documentation <https://platformdirs.readthedocs.io>`__):
 
     * Mac: ``~/Library/Caches/<AppName>``
     * Unix: ``~/.cache/<AppName>`` or the value of the ``XDG_CACHE_HOME``
@@ -96,7 +96,7 @@ def os_cache(project):
         not expanded.
 
     """
-    return Path(appdirs.user_cache_dir(project))
+    return Path(platformdirs.user_cache_dir(project))
 
 
 def check_version(version, fallback="master"):
@@ -158,7 +158,12 @@ def parse_url(url):
     * doi:10.6084/m9.figshare.923450.v1/test.nc
 
     The DOI is a special case. The protocol will be "doi", the netloc will be
-    the DOI, and the path is what comes after the second "/".
+    the DOI, and the path is what comes after the last "/".
+    The only exception is Zenodo DOIs: the protocol will be "doi", the
+    netloc will be the "prefix/suffix" pair, and the path is what comes
+    after the second "/". This makes it possible to support Zenodo DOIs
+    whose paths contain forward slashes "/", as created by the
+    GitHub-Zenodo integration service.
 
     Parameters
     ----------
@@ -179,8 +184,12 @@ def parse_url(url):
     if url.startswith("doi:"):
         protocol = "doi"
         parts = url[4:].split("/")
-        netloc = "/".join(parts[:2])
-        path = "/" + "/".join(parts[2:])
+        if "zenodo" in parts[1].lower():
+            netloc = "/".join(parts[:2])
+            path = "/" + "/".join(parts[2:])
+        else:
+            netloc = "/".join(parts[:-1])
+            path = "/" + parts[-1]
     else:
         parsed_url = urlsplit(url)
         protocol = parsed_url.scheme or "file"

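The `parse_url` change above special-cases Zenodo: a Zenodo DOI is always `prefix/suffix`, so everything after the second `/` is the file path, which may itself contain slashes (GitHub-Zenodo archives); for other repositories only the last component is the file name. A simplified sketch of just the DOI branch (the real `pooch.utils.parse_url` also handles HTTP, FTP, SFTP, and file URLs):

```python
def parse_doi_url(url):
    """Split a doi: URL into netloc and path, mirroring the diff above."""
    assert url.startswith("doi:")
    parts = url[4:].split("/")
    if "zenodo" in parts[1].lower():
        # Zenodo DOIs are "prefix/suffix"; anything after the second "/"
        # belongs to the file path, slashes included.
        netloc = "/".join(parts[:2])
        path = "/" + "/".join(parts[2:])
    else:
        # Other repositories: only the last component is the file name.
        netloc = "/".join(parts[:-1])
        path = "/" + parts[-1]
    return netloc, path

print(parse_doi_url("doi:10.5281/zenodo.7632643/santisoler/pooch-test-data-v1.zip"))
# → ('10.5281/zenodo.7632643', '/santisoler/pooch-test-data-v1.zip')
print(parse_doi_url("doi:10.11588/data/TKCFEF/tiny-data.txt"))
# → ('10.11588/data/TKCFEF', '/tiny-data.txt')
```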

=====================================
setup.cfg
=====================================
@@ -2,8 +2,8 @@
 name = pooch
 fullname = Pooch
 description = "Pooch manages your Python library's sample data files: it automatically downloads and stores them in a local directory, with support for versioning and corruption checks."
-long_description = file: README.rst
-long_description_content_type = text/x-rst
+long_description = file: README.md
+long_description_content_type = text/markdown
 author = The Pooch Developers
 author_email = fatiandoaterra at protonmail.com
 maintainer = "Leonardo Uieda"
@@ -23,7 +23,6 @@ classifiers =
     Topic :: Scientific/Engineering
     Topic :: Software Development :: Libraries
     Programming Language :: Python :: 3 :: Only
-    Programming Language :: Python :: 3.6
     Programming Language :: Python :: 3.7
     Programming Language :: Python :: 3.8
     Programming Language :: Python :: 3.9
@@ -39,10 +38,10 @@ project_urls =
 zip_safe = True
 include_package_data = True
 packages = find:
-python_requires = >=3.6
+python_requires = >=3.7
 setup_requires =
 install_requires =
-    appdirs>=1.3.0
+    platformdirs>=2.5.0
     packaging>=20.0
     requests>=2.19.0
 



View it on GitLab: https://salsa.debian.org/debian-gis-team/pooch/-/commit/178bd166be85589bfd1a90d518c5136b3ac184d3
