[med-svn] [Git][med-team/biomaj3-download][upstream] New upstream version 3.2.4

Andreas Tille gitlab at salsa.debian.org
Sun Jan 17 08:53:36 GMT 2021



Andreas Tille pushed to branch upstream at Debian Med / biomaj3-download


Commits:
a2b7eff2 by Andreas Tille at 2021-01-17T09:42:17+01:00
New upstream version 3.2.4
- - - - -


17 changed files:

- .travis.yml
- CHANGES.txt
- README.md
- bin/biomaj_download_consumer.py
- biomaj_download/biomaj_download_web.py
- biomaj_download/download/curl.py
- biomaj_download/download/direct.py
- biomaj_download/download/interface.py
- biomaj_download/download/localcopy.py
- biomaj_download/download/protocolirods.py
- biomaj_download/download/rsync.py
- biomaj_download/downloadclient.py
- biomaj_download/downloadservice.py
- requirements.txt
- setup.py
- tests/biomaj_tests.py
- tests/testhttp.properties


Changes:

=====================================
.travis.yml
=====================================
@@ -1,7 +1,9 @@
+arch:
+- amd64
+- ppc64le
 language: python
 sudo: false
 python:
-- '2.7'
 - '3.6'
 - '3.7'
 - '3.8'


=====================================
CHANGES.txt
=====================================
@@ -1,40 +1,80 @@
+3.2.4:
+  #39 directhttp download issues
+    biomaj sends a file name instead of a file dict, work around this
+  #28 CurlDownload crashes if cURL doesn't support SFTP
+  Minor python doc and error message updates
+  Suppress yaml warnings
+
+3.2.3:
+  #30: raise errors when something in list() fails
+  DirectHTTP(s)/DirectFTP(s): do not raise error on list step as HEAD may not be supported
+  #35: allow redirections (closes #33)
+
+3.2.2:
+  #31 fix URL with multiple slashes
+  Update demo password for ftps web site tests
+  Remove python2 support
+
+3.2.1:
+  #26 Accept new keys for SFTP servers  (Closes #25)
+  Strip extra slash characters in remote file list (due to regexp parsing)
+  #20 Add a configurable mechanism to retry download when it fails
+  #24 Speed up IRODSDownload
+  Introduce a method to perform configuration before network operations; adapt the generic methods and subclasses accordingly.
+  Fix a bug where the parser also analysed the Message Of The Day when it only wanted the list of files (#23)
+
 3.1.2:
   #18 Add a protocol option to set CURLOPT_FTP_FILEMETHOD
   #19 Rename protocol options to options
   Fix copy of production files instead of download when files are in subdirectories
+
 3.1.1:
   #17 Support MDTM command in directftp
+
 3.1.0:
   #16 Don't change name after download in DirectHTTPDownloader
   PR #7 Refactor downloaders (*WARNING* breaks API)
+
 3.0.27:
   Fix previous release broken with a bug in direct protocols
+
 3.0.26:
   Change default download timeout to 1h
   #12 Allow FTPS protocol
   #14 Add mechanism for protocol specific options
+
 3.0.25:
   Allow to use hardlinks in LocalDownload
+
 3.0.24:
   Remove debug logs
+
 3.0.23:
   Support spaces in remote file names
+
 3.0.22:
   Fix **/* remote.files parsing
+
 3.0.21:
   Fix traefik labels
+
 3.0.20:
   Update pika dependency release
   Add tags for traefik support
+
 3.0.19:
   Check archives after download
   Fix python regexps syntax (deprecation)
+
 3.0.18:
   Rename protobuf and use specific package to avoid conflicts
+
 3.0.17:
   Regenerate protobuf message desc, failing on python3
+
 3.0.16:
   Add missing req in setup.py
+
 3.0.15:
   Fix progress download control where could have infinite loop
   Add irods download


=====================================
README.md
=====================================
@@ -7,6 +7,8 @@ Microservice to manage the downloads of biomaj.
 A protobuf interface is available in biomaj_download/message/message_pb2.py to exchange messages between BioMAJ and the download service.
 Messages go through RabbitMQ (to be installed).
 
+Python 3 support only; Python 2 support has been dropped.
+
 # Protobuf
 
 To compile protobuf, in biomaj_download/message:
@@ -15,7 +17,7 @@ To compile protobuf, in biomaj_download/message:
 
 # Development
 
-    flake8  biomaj_download/\*.py biomaj_download/download
+    flake8 --ignore E501 biomaj_download/\*.py biomaj_download/download
 
 # Test
 
@@ -59,46 +61,154 @@ Web processes should be behind a proxy/load balancer, API base url /api/download
 
 Prometheus endpoint metrics are exposed via /metrics on web server
 
+# Retrying
+
+A common problem when downloading a large number of files is the handling of temporary failures (network issues, server too busy to answer, etc.).
+Since version 3.2.1, `biomaj-download` uses the [Tenacity library](https://github.com/jd/tenacity), which is designed to handle this.
+This mechanism is configurable through two downloader-specific options (see [Download options](#download-options)): **stop_condition** and **wait_policy**.
+
+When working in Python code, you can pass instances of Tenacity's `stop_base` and `wait_base` classes respectively.
+This includes classes defined in Tenacity or your own derived classes.
+
+For bank configuration, those options also accept strings read from the configuration file.
+This parsing is based on the [Simple Eval library](https://github.com/danthedeckie/simpleeval).
+The rules are straightforward:
+
+  * All concrete stop and wait classes defined in Tenacity (i.e. classes inheriting from `stop_base` and `wait_base` respectively) can be used
+    by calling their constructor with the expected parameters.
+    For example, the string `"stop_after_attempt(5)"` will create the desired object.
+    Note that stop and wait classes that need no argument must be used as constants (i.e. use `"stop_never"` and not `"stop_never()"`).
+    Currently, this is the case for `"stop_never"` (as in Tenacity) and `"wait_none"` (this slightly differs from Tenacity where it is `"wait_none()"`).
+  * You can use classes that combine other stop conditions (namely `stop_all` and `stop_any`) or wait policies (namely `wait_combine`).
+  * Operator `+` can be used to add wait policies (similar to `wait_combine`).
+  * Operators `&` and `|` can be used to compose stop conditions (similar to `stop_all` and `stop_any` respectively).
+
+However, when using strings, you can't use your own stop conditions or wait policies.
+The complete list of stop conditions is:
+
+* `stop_never` (although its use is discouraged)
+* `stop_after_attempt`
+* `stop_after_delay`
+* `stop_when_event_set`
+* `stop_all`
+* `stop_any`
+
+The complete list of wait policies is:
+
+* `wait_none`
+* `wait_fixed`
+* `wait_random`
+* `wait_incrementing`
+* `wait_exponential`
+* `wait_random_exponential`
+* `wait_combine`
+* `wait_chain`
+
+Please refer to [Tenacity doc](https://tenacity.readthedocs.io/en/latest/) for their meaning and their parameters.
+
+Examples (inspired by Tenacity doc):
+
+  * `"wait_fixed(3) + wait_random(0, 2)"` and `"wait_combine(wait_fixed(3), wait_random(0, 2))"` are equivalent and will wait 3 seconds + up to 2 seconds of random delay
+  * `"wait_chain(*([wait_fixed(3) for i in range(3)] + [wait_fixed(7) for i in range(2)] + [wait_fixed(9)]))"` will wait 3s for 3 attempts, 7s for the next 2 attempts and 9s for all attempts thereafter (here `+` is the list concatenation).
+  * `"wait_none + wait_random(1,2)"` will wait between 1s and 2s (since `wait_none` doesn't wait).
+  * `"stop_never | stop_after_attempt(5)"` will stop after 5 attempts (since `stop_never` never stops).
+
+Note that some protocols (e.g. FTP) classify errors as temporary or permanent (for example, trying to download a nonexistent file).
+More generally, we could distinguish permanent errors based on error codes, etc. and not retry in those cases.
+However, in our experience, so-called permanent errors may well be temporary.
+Therefore downloaders always retry, whatever the error.
+In some cases this is a waste of time, but generally it is worth it.
+
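+The snippet below is a minimal sketch of how these policies could be set from Python code
+(the protocol, host and remote directory are placeholder values; only the option keys and
+the string syntax come from this section):
+
+    from tenacity import stop_after_attempt, wait_exponential
+    from biomaj_download.download.curl import CurlDownload
+
+    downloader = CurlDownload("ftp", "ftp.example.org", "/pub/db/")
+    # Tenacity objects can be passed directly...
+    downloader.set_options({
+        "stop_condition": stop_after_attempt(5),
+        "wait_policy": wait_exponential(multiplier=1, max=30),
+    })
+    # ...or as strings, which are parsed with Simple Eval as described above.
+    downloader.set_options({
+        "stop_condition": "stop_after_attempt(5)",
+        "wait_policy": "wait_fixed(3) + wait_random(0, 2)",
+    })
+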
+# Host keys
+
+When using the `sftp` protocol, `biomaj-download` must check the host key.
+Those keys are stored in a file (for instance `~/.ssh/known_hosts`).
+
+Two options are available to configure this:
+
+  - **ssh_hosts_file** which sets the file to use
+  - **ssh_new_host** which sets what to do for a new host
+
+When the host and the key are found in the file, the connection is accepted.
+If the host is found but the key doesn't match, the connection is rejected
+(this usually indicates a problem or a change of configuration on the remote server).
+When the host is not found, the decision depends on the value of **ssh_new_host**:
+
+  - `reject` means that the connection is rejected
+  - `accept` means that the connection is accepted
+  - `add` means that the connection is accepted and the key is added to the file
+
+See the description of the options in [Download options](#download-options).
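+
+As a minimal sketch (the host and known-hosts path below are placeholders), accepting and
+recording new hosts for an SFTP download could look like:
+
+    from biomaj_download.download.curl import CurlDownload
+
+    downloader = CurlDownload("sftp", "sftp.example.org", "/pub/db/")
+    downloader.set_options({
+        "ssh_hosts_file": "/var/lib/biomaj/known_hosts",
+        "ssh_new_host": "add",  # accept unknown hosts and append their key to the file
+    })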
+
 # Download options
 
 Since version 3.0.26, you can use the `set_options` method to pass a dictionary of downloader-specific options.
 The following list shows some options and their effect (the option to set is the key and the parameter is the associated value):
 
+  * **stop_condition**:
+    * parameter: an instance of Tenacity `stop_base` or a string (see [Retrying](#retrying)).
+    * downloader(s): all (except `LocalDownload`).
+    * effect: sets the condition on which we should stop retrying to download a file.
+    * default: `stop_after_attempt(3)` (i.e. stop after 3 attempts).
+    * note: introduced in version 3.2.1.
+  * **wait_policy**:
+    * parameter: an instance of Tenacity `wait_base` or a string (see [Retrying](#retrying)).
+    * downloader(s): all (except `LocalDownload`).
+    * effect: sets the wait policy between download attempts.
+    * default: `wait_fixed(3)` (i.e. wait 3 seconds between attempts).
+    * note: introduced in version 3.2.1.
   * **skip_check_uncompress**:
     * parameter: bool.
-    * downloader(s): all.
-    * effect: If true, don't test the archives after download.
+    * downloader(s): all (except `LocalDownload`).
+    * effect: if true, don't test the archives after download.
     * default: false (i.e. test the archives).
   * **ssl_verifyhost**:
     * parameter: bool.
-    * downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
-    * effect: If false, don't check that the name of the remote server is the same than in the SSL certificate.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`).
+    * effect: if false, don't check that the name of the remote server is the same as in the SSL certificate.
     * default: true (i.e. check host name).
-    * note: It's generally a bad idea to disable this verification. However some servers are badly configured. See [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYHOST.html) for the corresponding cURL option.
+    * note: it's generally a bad idea to disable this verification. However some servers are badly configured. See [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYHOST.html) for the corresponding cURL option.
   * **ssl_verifypeer**:
     * parameter: bool.
-    * downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
-    * effect: If false, don't check the authenticity of the peer's certificate.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`).
+    * effect: if false, don't check the authenticity of the peer's certificate.
     * default: true (i.e. check authenticity).
-    * note: It's generally a bad idea to disable this verification. However some servers are badly configured. See [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html) for the corresponding cURL option.
+    * note: it's generally a bad idea to disable this verification. However some servers are badly configured. See [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSL_VERIFYPEER.html) for the corresponding cURL option.
   * **ssl_server_cert**:
-    * parameter: filename of the certificate.
-    * downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
-    * effect: Pass a file holding one or more certificates to verify the peer with.
+    * parameter: path of the certificate file.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`).
+    * effect: use the certificate(s) in this file to verify the peer.
     * default: use OS certificates.
-    * note: See [here](https://curl.haxx.se/libcurl/c/CURLOPT_CAINFO.html) for the corresponding cURL option.
-  * **tcp_keepalive**:
+    * note: see [here](https://curl.haxx.se/libcurl/c/CURLOPT_CAINFO.html) for the corresponding cURL option.
+  * **tcp_keepalive**:
     * parameter: int.
-    * downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
-    * effect: Sets the interval, in seconds, that the operating system will wait between sending keepalive probes.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`).
+    * effect: sets the interval, in seconds, that the operating system will wait between sending keepalive probes.
     * default: cURL default (60s at the time of this writing).
-    * note: See [here](https://curl.haxx.se/libcurl/c/CURLOPT_TCP_KEEPINTVL.html) for the corresponding cURL option.
+    * note: see [here](https://curl.haxx.se/libcurl/c/CURLOPT_TCP_KEEPINTVL.html) for the corresponding cURL option.
   * **ftp_method**:
     * parameter: one of `default`, `multicwd`, `nocwd`, `singlecwd` (case insensitive).
-    * downloader(s): `CurlDownloader`, `DirectFTPDownload`, `DirectHTTPDownload`.
-    * effect: Sets the method to use to reach a file on a FTP(S) server (`nocwd` and `singlecwd` are usually faster but not always supported).
-    * default: `default` (which is `multicwd` at the time of this writing)
-    * note: See [here](https://curl.haxx.se/libcurl/c/CURLOPT_FTP_FILEMETHOD.html) for the corresponding cURL option.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`) - only used for `FTP(S)`.
+    * effect: sets the method used to reach a file on an FTP(S) server (`nocwd` and `singlecwd` are usually faster but not always supported).
+    * default: `default` (which, as in cURL, is `multicwd` at the time of this writing).
+    * note: see [here](https://curl.haxx.se/libcurl/c/CURLOPT_FTP_FILEMETHOD.html) for the corresponding cURL option; introduced in version 3.1.2.
+  * **ssh_hosts_file**:
+    * parameter: path of the known hosts file.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`) - only used for `SFTP`.
+    * effect: sets the file used to read/store host keys for `SFTP`.
+    * default: `~/.ssh/known_hosts` (where `~` is the home directory of the current user).
+    * note: see [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSH_KNOWNHOSTS.html) for the corresponding cURL option and the option below; introduced in version 3.2.1.
+  * **ssh_new_host**:
+    * parameter: one of `reject`, `accept`, `add`.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`) - only used for `SFTP`.
+    * effect: sets the policy to use for an unknown host.
+    * default: `reject` (i.e. refuse new hosts; you must add them to the file beforehand, for instance with `ssh` or `sftp`).
+    * note: see [here](https://curl.haxx.se/libcurl/c/CURLOPT_SSH_KEYFUNCTION.html) for the corresponding cURL option and the option above; introduced in version 3.2.1.
+  * **allow_redirections**:
+    * parameter: bool.
+    * downloader(s): `CurlDownload` (and derived classes: `DirectFTPDownload`, `DirectHTTPDownload`) - only used for `HTTP(S)`.
+    * effect: sets the policy for `HTTP` redirections.
+    * default: `true` (i.e. follow redirections).
+    * note: see [here](https://curl.haxx.se/libcurl/c/CURLOPT_FOLLOWLOCATION.html) for the corresponding cURL option; introduced in version 3.2.3.
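+
+As a minimal sketch (protocol, host and remote directory are placeholders), several of these
+options can be combined in a single `set_options` call:
+
+    from biomaj_download.download.curl import CurlDownload
+
+    downloader = CurlDownload("ftps", "ftp.example.org", "/db/release/")
+    downloader.set_options({
+        "skip_check_uncompress": True,  # don't test archives after download
+        "ftp_method": "nocwd",          # usually faster when the server supports it
+        "tcp_keepalive": 60,            # send keepalive probes every 60 seconds
+    })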
 
 Those options can be set in bank properties.
 See file `global.properties.example` in [biomaj module](https://github.com/genouest/biomaj).


=====================================
bin/biomaj_download_consumer.py
=====================================
@@ -18,7 +18,7 @@ if 'BIOMAJ_CONFIG' in os.environ:
 
 config = None
 with open(config_file, 'r') as ymlfile:
-    config = yaml.load(ymlfile)
+    config = yaml.load(ymlfile, Loader=yaml.FullLoader)
     Utils.service_config_override(config)
 
 


=====================================
biomaj_download/biomaj_download_web.py
=====================================
@@ -34,7 +34,7 @@ if 'BIOMAJ_CONFIG' in os.environ:
 
 config = None
 with open(config_file, 'r') as ymlfile:
-    config = yaml.load(ymlfile)
+    config = yaml.load(ymlfile, Loader=yaml.FullLoader)
     Utils.service_config_override(config)
 
 


=====================================
biomaj_download/download/curl.py
=====================================
@@ -1,10 +1,10 @@
-import sys
-import os
 import re
 from datetime import datetime
 import hashlib
 import time
 import stat
+from urllib.parse import urlencode
+from io import BytesIO
 
 import pycurl
 import ftputil
@@ -12,59 +12,9 @@ import ftputil
 import humanfriendly
 
 from biomaj_core.utils import Utils
-from biomaj_download.download.interface import DownloadInterface
-
-if sys.version_info[0] < 3:
-    from urllib import urlencode
-else:
-    from urllib.parse import urlencode
-
-try:
-    from io import BytesIO
-except ImportError:
-    from StringIO import StringIO as BytesIO
-
-# We use stat.filemode to convert from mode octal value to string.
-# In python < 3.3, stat.filmode is not defined.
-# This code is copied from the current implementation of stat.filemode.
-if 'filemode' not in stat.__dict__:
-    _filemode_table = (
-        ((stat.S_IFLNK,                "l"),    # noqa: E241
-         (stat.S_IFREG,                "-"),    # noqa: E241
-         (stat.S_IFBLK,                "b"),    # noqa: E241
-         (stat.S_IFDIR,                "d"),    # noqa: E241
-         (stat.S_IFCHR,                "c"),    # noqa: E241
-         (stat.S_IFIFO,                "p")),   # noqa: E241
-        ((stat.S_IRUSR,                "r"),),  # noqa: E241
-        ((stat.S_IWUSR,                "w"),),  # noqa: E241
-        ((stat.S_IXUSR | stat.S_ISUID, "s"),    # noqa: E241
-         (stat.S_ISUID,                "S"),    # noqa: E241
-         (stat.S_IXUSR,                "x")),   # noqa: E241
-        ((stat.S_IRGRP,                "r"),),  # noqa: E241
-        ((stat.S_IWGRP,                "w"),),  # noqa: E241
-        ((stat.S_IXGRP | stat.S_ISGID, "s"),    # noqa: E241
-         (stat.S_ISGID,                "S"),    # noqa: E241
-         (stat.S_IXGRP,                "x")),   # noqa: E241
-        ((stat.S_IROTH,                "r"),),  # noqa: E241
-        ((stat.S_IWOTH,                "w"),),  # noqa: E241
-        ((stat.S_IXOTH | stat.S_ISVTX, "t"),    # noqa: E241
-         (stat.S_ISVTX,                "T"),    # noqa: E241
-         (stat.S_IXOTH,                "x"))    # noqa: E241
-    )
-
-    def _filemode(mode):
-        """Convert a file's mode to a string of the form '-rwxrwxrwx'."""
-        perm = []
-        for table in _filemode_table:
-            for bit, char in table:
-                if mode & bit == bit:
-                    perm.append(char)
-                    break
-            else:
-                perm.append("-")
-        return "".join(perm)
+from biomaj_core.config import BiomajConfig
 
-    stat.filemode = _filemode
+from biomaj_download.download.interface import DownloadInterface
 
 
 class HTTPParse(object):
@@ -121,6 +71,13 @@ class CurlDownload(DownloadInterface):
         "singlecwd": pycurl.FTPMETHOD_SINGLECWD,
     }
 
+    # Valid values for ssh_new_host options as string and int
+    VALID_SSH_NEW_HOST = {
+        "reject": pycurl.KHSTAT_REJECT,
+        "accept": pycurl.KHSTAT_FINE,
+        "add": pycurl.KHSTAT_FINE_ADD_TO_FILE,
+    }
+
     def __init__(self, curl_protocol, host, rootdir, http_parse=None):
         """
         Initialize a CurlDownloader.
@@ -139,6 +96,19 @@ class CurlDownload(DownloadInterface):
         """
         DownloadInterface.__init__(self)
         self.logger.debug('Download')
+
+        # Check for ssh support
+        curl_opts_info = pycurl.version_info()
+        curl_opts = []
+        for opt in curl_opts_info:
+            if isinstance(opt, tuple):
+                for o in opt:
+                    curl_opts.append(o)
+            else:
+                curl_opts.append(opt)
+        if 'sftp' not in curl_opts:
+            CurlDownload.ALL_PROTOCOLS = CurlDownload.FTP_PROTOCOL_FAMILY + CurlDownload.HTTP_PROTOCOL_FAMILY
+            self.logger.warning("sftp not supported by curl: %s" % str(curl_opts_info))
         # Initialize curl_protocol.
         # Note that we don't change that field in set_protocol since this
         # method uses the protocol from the configuration file. It's not clear
@@ -151,15 +121,15 @@ class CurlDownload(DownloadInterface):
         if self.curl_protocol in self.FTP_PROTOCOL_FAMILY:
             self.protocol_family = "ftp"
             self._parse_result = self._ftp_parse_result
-            self.ERRCODE_OK = 226
+            self.ERRCODE_OK = [221, 226]
         elif self.curl_protocol in self.HTTP_PROTOCOL_FAMILY:
             self.protocol_family = "http"
             self._parse_result = self._http_parse_result
-            self.ERRCODE_OK = 200
+            self.ERRCODE_OK = [200]
         elif self.curl_protocol in self.SFTP_PROTOCOL_FAMILY:
             self.protocol_family = "sftp"
             self._parse_result = self._ftp_parse_result
-            self.ERRCODE_OK = 0
+            self.ERRCODE_OK = [0]
         else:  # Should not happen since we check before
             raise ValueError("Unknown protocol")
         self.rootdir = rootdir
@@ -182,11 +152,32 @@ class CurlDownload(DownloadInterface):
         self.tcp_keepalive = 0
         # FTP method (cURL --ftp-method option)
         self.ftp_method = pycurl.FTPMETHOD_DEFAULT  # Use cURL default
-
-    def _basic_curl_configuration(self):
+        # TODO: Don't store default values in BiomajConfig.DEFAULTS for
+        # ssh_hosts_file and ssh_new_host
+        # known_hosts file
+        self.ssh_hosts_file = BiomajConfig.DEFAULTS["ssh_hosts_file"]
+        # How to treat unknown host
+        self.ssh_new_host = self.VALID_SSH_NEW_HOST[BiomajConfig.DEFAULTS["ssh_new_host"]]
+        # Allow redirections
+        self.allow_redirections = True
+
+    def _accept_new_hosts(self, known_key, found_key, match):
+        # Key found in file: we can accept it
+        # Don't use KHSTAT_FINE_ADD_TO_FILE because the key would be duplicated
+        # See https://github.com/curl/curl/issues/4953.
+        if match == pycurl.KHMATCH_OK:
+            return pycurl.KHSTAT_FINE
+        # Key not found in file: use the ssh_new_host option
+        elif match == pycurl.KHMATCH_MISSING:
+            return self.ssh_new_host
+        # Key mismatch: the best option is to reject it
+        else:
+            return pycurl.KHSTAT_REJECT
+
+    def _network_configuration(self):
         """
         Perform basic configuration (i.e. that doesn't depend on the
-        operation: _download or list). This method shoulmd be called before any
+        operation: _download or list). This method should be called before any
         operation.
         """
         # Reset cURL options before setting them
@@ -200,11 +191,19 @@ class CurlDownload(DownloadInterface):
         if self.credentials is not None:
             self.crl.setopt(pycurl.USERPWD, self.credentials)
 
+        # Hosts file & function to decide for new hosts
+        if self.curl_protocol in self.SFTP_PROTOCOL_FAMILY:
+            self.crl.setopt(pycurl.SSH_KNOWNHOSTS, self.ssh_hosts_file)
+            self.crl.setopt(pycurl.SSH_KEYFUNCTION, self._accept_new_hosts)
+
         # Configure TCP keepalive
         if self.tcp_keepalive:
-            self.crl.setopt(pycurl.TCP_KEEPALIVE, True)
-            self.crl.setopt(pycurl.TCP_KEEPIDLE, self.tcp_keepalive * 2)
-            self.crl.setopt(pycurl.TCP_KEEPINTVL, self.tcp_keepalive)
+            try:
+                self.crl.setopt(pycurl.TCP_KEEPALIVE, True)
+                self.crl.setopt(pycurl.TCP_KEEPIDLE, self.tcp_keepalive * 2)
+                self.crl.setopt(pycurl.TCP_KEEPINTVL, self.tcp_keepalive)
+            except Exception as e:
+                self.logger.exception("TCP keepalive option failed: " + str(e))
 
         # Configure SSL verification (on some platforms, disabling
         # SSL_VERIFYPEER implies disabling SSL_VERIFYHOST so we set
@@ -223,6 +222,9 @@ class CurlDownload(DownloadInterface):
         # Configure ftp method
         self.crl.setopt(pycurl.FTP_FILEMETHOD, self.ftp_method)
 
+        # Configure redirections
+        self.crl.setopt(pycurl.FOLLOWLOCATION, self.allow_redirections)
+
         # Configure timeouts
         self.crl.setopt(pycurl.CONNECTTIMEOUT, 300)
         self.crl.setopt(pycurl.TIMEOUT, self.timeout)
@@ -280,6 +282,15 @@ class CurlDownload(DownloadInterface):
             if raw_val not in self.VALID_FTP_FILEMETHOD:
                 raise ValueError("Invalid value for ftp_method")
             self.ftp_method = self.VALID_FTP_FILEMETHOD[raw_val]
+        if "ssh_hosts_file" in options:
+            self.ssh_hosts_file = options["ssh_hosts_file"]
+        if "ssh_new_host" in options:
+            raw_val = options["ssh_new_host"].lower()
+            if raw_val not in self.VALID_SSH_NEW_HOST:
+                raise ValueError("Invalid value for ssh_new_host")
+            self.ssh_new_host = self.VALID_SSH_NEW_HOST[raw_val]
+        if "allow_redirections" in options:
+            self.allow_redirections = Utils.to_bool(options["allow_redirections"])
 
     def _append_file_to_download(self, rfile):
         # Add url and root to the file if needed (for safety)
@@ -292,66 +303,74 @@ class CurlDownload(DownloadInterface):
     def _file_url(self, rfile):
         # rfile['root'] is set to self.rootdir if needed but may be different.
         # We don't use os.path.join because rfile['name'] may starts with /
-        return self.url + '/' + rfile['root'] + rfile['name']
+        url = self.url + '/' + rfile['root'] + rfile['name']
+        url_elts = url.split('://')
+        if len(url_elts) == 2:
+            url_elts[1] = re.sub("/{2,}", "/", url_elts[1])
+            return '://'.join(url_elts)
+        return re.sub("/{2,}", "/", url)
 
     def _download(self, file_path, rfile):
         """
+        Download one file and return False in case of success and True
+        otherwise.
+
         This method is designed to work for FTP(S), HTTP(S) and SFTP.
         """
         error = True
-        nbtry = 1
         # Forge URL of remote file
         file_url = self._file_url(rfile)
-        while(error is True and nbtry < 3):
 
-            self._basic_curl_configuration()
-
-            try:
-                self.crl.setopt(pycurl.URL, file_url)
-            except Exception:
-                self.crl.setopt(pycurl.URL, file_url.encode('ascii', 'ignore'))
-
-            # Create file and assign it to the pycurl object
-            fp = open(file_path, "wb")
-            self.crl.setopt(pycurl.WRITEFUNCTION, fp.write)
-
-            # This is specific to HTTP
-            if self.method == 'POST':
-                # Form data must be provided already urlencoded.
-                postfields = urlencode(self.param)
-                # Sets request method to POST,
-                # Content-Type header to application/x-www-form-urlencoded
-                # and data to send in request body.
-                self.crl.setopt(pycurl.POSTFIELDS, postfields)
-
-            # Try download
-            try:
-                self.crl.perform()
-                errcode = self.crl.getinfo(pycurl.RESPONSE_CODE)
-                if int(errcode) != self.ERRCODE_OK:
-                    error = True
-                    self.logger.error('Error while downloading ' + file_url + ' - ' + str(errcode))
-                else:
-                    error = False
-            except Exception as e:
-                self.logger.error('Could not get errcode:' + str(e))
+        try:
+            self.crl.setopt(pycurl.URL, file_url)
+        except Exception:
+            self.crl.setopt(pycurl.URL, file_url.encode('ascii', 'ignore'))
+
+        # Create file and assign it to the pycurl object
+        fp = open(file_path, "wb")
+        self.crl.setopt(pycurl.WRITEFUNCTION, fp.write)
+
+        # This is specific to HTTP
+        if self.method == 'POST':
+            # Form data must be provided already urlencoded.
+            postfields = urlencode(self.param)
+            # Sets request method to POST,
+            # Content-Type header to application/x-www-form-urlencoded
+            # and data to send in request body.
+            self.crl.setopt(pycurl.POSTFIELDS, postfields)
+
+        # Try download (we don't raise errors here since it's the return value
+        # ('error') that matters for the calling method; it is set to False
+        # only in case of success).
+        try:
+            self.crl.perform()
+            errcode = self.crl.getinfo(pycurl.RESPONSE_CODE)
+            if int(errcode) not in self.ERRCODE_OK:
+                error = True
+                self.logger.error('Error while downloading ' + file_url + ' - ' + str(errcode))
+            else:
+                error = False
+        except Exception as e:
+            self.logger.error('Error while downloading ' + file_url + ' - ' + str(e))
 
-            # Close file
-            fp.close()
+        # Check if we were redirected
+        if self.curl_protocol in self.HTTP_PROTOCOL_FAMILY:
+            n_redirect = self.crl.getinfo(pycurl.REDIRECT_COUNT)
+            if n_redirect:
+                real_url = self.crl.getinfo(pycurl.EFFECTIVE_URL)
+                redirect_time = self.crl.getinfo(pycurl.REDIRECT_TIME)
+                msg_fmt = 'Download was redirected to %s (%i redirection(s), took %ss)'
+                msg = msg_fmt % (real_url, n_redirect, redirect_time)
+                self.logger.info(msg)
 
-            # Check that the archive is correct
-            if not error and not self.skip_check_uncompress:
-                archive_status = Utils.archive_check(file_path)
-                if not archive_status:
-                    self.logger.error('Archive is invalid or corrupted, deleting file and retrying download')
-                    error = True
-                    if os.path.exists(file_path):
-                        os.remove(file_path)
+        # Close file
+        fp.close()
 
-            # Increment retry counter
-            nbtry += 1
+        if error:
+            return error
 
-        return error
+        # Our part is done so call parent _download
+        return super(CurlDownload, self)._download(file_path, rfile)
 
     def list(self, directory=''):
         '''
@@ -362,10 +381,11 @@ class CurlDownload(DownloadInterface):
         This is a generic method for HTTP and FTP. The protocol-specific parts
         are done in _<protocol>_parse_result.
         '''
-        dir_url = self.url + self.rootdir + directory
+        dirbase = re.sub('//+', '/', self.rootdir + directory)
+        dir_url = self.url + dirbase
         self.logger.debug('Download:List:' + dir_url)
 
-        self._basic_curl_configuration()
+        self._network_configuration()
 
         try:
             self.crl.setopt(pycurl.URL, dir_url)
@@ -379,8 +399,25 @@ class CurlDownload(DownloadInterface):
         # Try to list
         try:
             self.crl.perform()
+            errcode = self.crl.getinfo(pycurl.RESPONSE_CODE)
+            if int(errcode) not in self.ERRCODE_OK:
+                msg = 'Error while listing ' + dir_url + ' - ' + str(errcode)
+                self.logger.error(msg)
+                raise Exception(msg)
         except Exception as e:
-            self.logger.error('Could not get errcode:' + str(e))
+            msg = 'Error while listing ' + dir_url + ' - ' + str(e)
+            self.logger.error(msg)
+            raise e
+
+        # Check if we were redirected
+        if self.curl_protocol in self.HTTP_PROTOCOL_FAMILY:
+            n_redirect = self.crl.getinfo(pycurl.REDIRECT_COUNT)
+            if n_redirect:
+                real_url = self.crl.getinfo(pycurl.EFFECTIVE_URL)
+                redirect_time = self.crl.getinfo(pycurl.REDIRECT_TIME)
+                msg_fmt = 'Download was redirected to %s (%i redirection(s), took %ss)'
+                msg = msg_fmt % (real_url, n_redirect, redirect_time)
+                self.logger.info(msg)
 
         # Figure out what encoding was sent with the response, if any.
         # Check against lowercased header name.


=====================================
biomaj_download/download/direct.py
=====================================
@@ -1,9 +1,9 @@
 """
 Subclasses for direct download (i.e. downloading without regexp). The usage is
 a bit different: instead of calling method:`list` and method:`match`, client
-code explicitely calls method:`set_files_to_download` (passing a list
+code explicitly calls method:`set_files_to_download` (passing a list
 containing only the file name). method:`list` is used to get more information
-about the file (if possile). method:`match` matches everything.
+about the file (if possible). method:`match` matches everything.
 Also client code can use method:`set_save_as` to indicate the name of the file
 to save.
 
@@ -21,22 +21,13 @@ import datetime
 import pycurl
 import re
 import hashlib
-import sys
 import os
+from urllib.parse import urlencode
+from io import BytesIO
 
 from biomaj_download.download.curl import CurlDownload
 from biomaj_core.utils import Utils
 
-if sys.version_info[0] < 3:
-    from urllib import urlencode
-else:
-    from urllib.parse import urlencode
-
-try:
-    from io import BytesIO
-except ImportError:
-    from StringIO import StringIO as BytesIO
-
 
 class DirectFTPDownload(CurlDownload):
     '''
@@ -45,29 +36,38 @@ class DirectFTPDownload(CurlDownload):
 
     ALL_PROTOCOLS = ["ftp", "ftps"]
 
-    def _append_file_to_download(self, filename):
+    def _append_file_to_download(self, rfile):
         '''
         Initialize the files in list with today as last-modification date.
         Size is also preset to zero.
         '''
+        filename = None
+        # workaround to handle file dict info or file name
+        # this is dirty, we expect to handle dicts now,
+        # biomaj workflow should fix this
+        if isinstance(rfile, dict):
+            filename = rfile['name']
+        else:
+            # direct protocols send a filename directly
+            filename = rfile
         today = datetime.date.today()
-        rfile = {}
-        rfile['root'] = self.rootdir
-        rfile['permissions'] = ''
-        rfile['group'] = ''
-        rfile['user'] = ''
-        rfile['size'] = 0
-        rfile['month'] = today.month
-        rfile['day'] = today.day
-        rfile['year'] = today.year
+        new_rfile = {}
+        new_rfile['root'] = self.rootdir
+        new_rfile['permissions'] = ''
+        new_rfile['group'] = ''
+        new_rfile['user'] = ''
+        new_rfile['size'] = 0
+        new_rfile['month'] = today.month
+        new_rfile['day'] = today.day
+        new_rfile['year'] = today.year
         if filename.endswith('/'):
-            rfile['name'] = filename[:-1]
+            new_rfile['name'] = filename[:-1]
         else:
-            rfile['name'] = filename
-        rfile['hash'] = None
+            new_rfile['name'] = filename
+        new_rfile['hash'] = None
         # Use self.save_as even if we use it in list(). This is important.
-        rfile['save_as'] = self.save_as
-        super(DirectFTPDownload, self)._append_file_to_download(rfile)
+        new_rfile['save_as'] = self.save_as
+        super(DirectFTPDownload, self)._append_file_to_download(new_rfile)
 
     def set_files_to_download(self, files_to_download):
         if len(files_to_download) > 1:
@@ -80,13 +80,26 @@ class DirectFTPDownload(CurlDownload):
     def _file_url(self, rfile):
         # rfile['root'] is set to self.rootdir if needed but may be different.
         # We don't use os.path.join because rfile['name'] may starts with /
-        return self.url + '/' + rfile['root'] + rfile['name']
+        url = self.url + '/' + rfile['root'] + rfile['name']
+        url_elts = url.split('://')
+        url_elts[1] = re.sub("/{2,}", "/", url_elts[1])
+        return '://'.join(url_elts)
 
     def list(self, directory=''):
         '''
         FTP protocol does not give us the possibility to get file date from remote url
         '''
-        self._basic_curl_configuration()
+        self._network_configuration()
+        # Specific configuration
+        # With those options, cURL will issue a sequence of commands (SIZE,
+        # MDTM) to get the file size and last modification time and then issue
+        # a REST command. This usually ends with code 350. Therefore we
+        # explicitly handle this in this method.
+        # Note that very old servers may not support the MDTM command.
+        # In that case, cURL will raise an error (although the file can
+        # probably still be downloaded).
+        self.crl.setopt(pycurl.OPT_FILETIME, True)
+        self.crl.setopt(pycurl.NOBODY, True)
         for rfile in self.files_to_download:
             if self.save_as is None:
                 self.save_as = os.path.basename(rfile['name'])
@@ -97,19 +110,20 @@ class DirectFTPDownload(CurlDownload):
             except Exception:
                 self.crl.setopt(pycurl.URL, file_url.encode('ascii', 'ignore'))
             self.crl.setopt(pycurl.URL, file_url)
-            self.crl.setopt(pycurl.OPT_FILETIME, True)
-            self.crl.setopt(pycurl.NOBODY, True)
-
-            # Very old servers may not support the MDTM commands. Therefore,
-            # cURL will raise an error. In that case, we simply skip the rest
-            # of the function as it was done before. Download will work however.
-            # Note that if the file does not exist, it will be skipped too
-            # (that was the case before too). Of course, download will fail in
-            # this case.
+
             try:
                 self.crl.perform()
-            except Exception:
-                continue
+                errcode = int(self.crl.getinfo(pycurl.RESPONSE_CODE))
+                # As explained, 350 is correct. We check against ERRCODE_OK
+                # just in case.
+                if errcode != 350 and errcode not in self.ERRCODE_OK:
+                    msg = 'Error while listing ' + file_url + ' - ' + str(errcode)
+                    self.logger.error(msg)
+                    raise Exception(msg)
+            except Exception as e:
+                msg = 'Error while listing ' + file_url + ' - ' + str(e)
+                self.logger.error(msg)
+                raise e
 
             timestamp = self.crl.getinfo(pycurl.INFO_FILETIME)
             dt = datetime.datetime.fromtimestamp(timestamp)
@@ -148,8 +162,15 @@ class DirectHTTPDownload(DirectFTPDownload):
         '''
         Try to get file headers to get last_modification and size
         '''
-        self._basic_curl_configuration()
+        self._network_configuration()
         # Specific configuration
+        # With those options, cURL will issue a HEAD request. This may not be
+        # supported especially on resources that are accessed using POST. In
+        # this case, HTTP will return code 405. We explicitly handle this case
+        # in this method.
+        # Note also that in many cases, there is no Last-Modified field in
+        # headers since this is usually dynamic content (Content-Length is
+        # usually present).
         self.crl.setopt(pycurl.HEADER, True)
         self.crl.setopt(pycurl.NOBODY, True)
         for rfile in self.files_to_download:
@@ -168,7 +189,24 @@ class DirectHTTPDownload(DirectFTPDownload):
             output = BytesIO()
             self.crl.setopt(pycurl.WRITEFUNCTION, output.write)
 
-            self.crl.perform()
+            try:
+                self.crl.perform()
+                errcode = int(self.crl.getinfo(pycurl.RESPONSE_CODE))
+                if errcode == 405:
+                    # HEAD not supported by the server for this URL so we can
+                    # skip the rest of the loop (we won't have metadata about
+                    # the file but biomaj should be fine).
+                    msg = 'Listing ' + file_url + ' not supported. This is fine, continuing.'
+                    self.logger.info(msg)
+                    continue
+                elif errcode not in self.ERRCODE_OK:
+                    msg = 'Error while listing ' + file_url + ' - ' + str(errcode)
+                    self.logger.error(msg)
+                    raise Exception(msg)
+            except Exception as e:
+                msg = 'Error while listing ' + file_url + ' - ' + str(e)
+                self.logger.error(msg)
+                raise e
 
             # Figure out what encoding was sent with the response, if any.
             # Check against lowercased header name.


=====================================
biomaj_download/download/interface.py
=====================================
@@ -3,8 +3,13 @@ import logging
 import datetime
 import time
 import re
+import copy
+
+import tenacity
+from simpleeval import simple_eval, ast
 
 from biomaj_core.utils import Utils
+from biomaj_core.config import BiomajConfig
 
 
 class _FakeLock(object):
@@ -43,6 +48,96 @@ class DownloadInterface(object):
 
     files_num_threads = 4
 
+    #
+    # Constants to parse retryer
+    #
+    # Note that due to the current implementation of operators, tenacity allows
+    # nonsensical operations. For example the following snippets are valid:
+    # stop_after_attempt(1, 2) + 4
+    # stop_after_attempt(1, 2) + stop_none.
+    # Of course, trying to use those wait policies will raise cryptic errors.
+    # The situation is similar for stop conditions.
+    # See https://github.com/jd/tenacity/issues/211.
+    # To avoid such errors, we test the objects in _set_retryer.
+    #
+    # Another confusing issue is that stop_never is an object (instance of the
+    # class _stop_never). For parsing, if we consider stop_never as a
+    # function then both "stop_never" and "stop_never()" are parsed correctly
+    # but the later raises error. Considering it has a name is slightly more
+    # clear (since then we must write "stop_none" as we do when we use tenacity
+    # directly). For consistency, we create a name for wait_none (as an
+    # instance of the class wait_none).
+    #
+
+    # Functions available when parsing stop condition: those are constructors
+    # of stop conditions classes (then using them will create objects). Note
+    # that there is an exception for stop_never.
+    ALL_STOP_CONDITIONS = {
+        # "stop_never": tenacity.stop._stop_never,  # In case, we want to use it like a function (see above)
+        "stop_when_event_set": tenacity.stop_when_event_set,
+        "stop_after_attempt": tenacity.stop_after_attempt,
+        "stop_after_delay": tenacity.stop_after_delay,
+        "stop_any": tenacity.stop_any,  # Similar to |
+        "stop_all": tenacity.stop_all,  # Similar to &
+    }
+
+    # tenacity.stop_never is an instance of _stop_never, not a class so we
+    # import it as a name.
+    ALL_STOP_NAMES = {
+        "stop_never": tenacity.stop_never,
+    }
+
+    # Operators for stop conditions: | means to stop if one of the conditions
+    # is True, & means to stop if all the conditions are True.
+    ALL_STOP_OPERATORS = {
+        ast.BitOr: tenacity.stop.stop_base.__or__,
+        ast.BitAnd: tenacity.stop.stop_base.__and__,
+    }
+
+    # Functions available when parsing wait policy: those are constructors
+    # of wait policies classes (then using them will create objects). Note
+    # that there is an exception for wait_none.
+    ALL_WAIT_POLICIES = {
+        # "wait_none": tenacity.wait_none,  # In case, we want to use it like a function (see above)
+        "wait_fixed": tenacity.wait_fixed,
+        "wait_random": tenacity.wait_random,
+        "wait_incrementing": tenacity.wait_incrementing,
+        "wait_exponential": tenacity.wait_exponential,
+        "wait_random_exponential": tenacity.wait_random_exponential,
+        "wait_combine": tenacity.wait_combine,  # Sum of wait policies (similar to +)
+        "wait_chain": tenacity.wait_chain,  # Give a list of wait policies (one for each attempt)
+    }
+
+    # Create an instance of wait_none to use it like a constant.
+    ALL_WAIT_NAMES = {
+        "wait_none": tenacity.wait.wait_none()
+    }
+
+    # Operators for wait policies: + means to sum waiting times of wait
+    # policies.
+    ALL_WAIT_OPERATORS = {
+        ast.Add: tenacity.wait.wait_base.__add__
+    }
+
+    @staticmethod
+    def is_true(download_error):
+        """Method used by retryer to determine if we should retry to downlaod a
+        file based on the return value of method:`_download` (passed as the
+        argument): we must retry while this value is True.
+
+        See method:`_set_retryer`.
+        """
+        return download_error is True
+
+    @staticmethod
+    def return_last_value(retry_state):
+        """Method used by the retryer to determine the return value of the
+        retryer: we return the result of the last attempt.
+
+        See method:`_set_retryer`.
+        """
+        return retry_state.outcome.result()
+
     def __init__(self):
         # This variable defines the protocol as passed by the config file (i.e.
         # this is directftp for DirectFTPDownload). It is used by the workflow
@@ -70,6 +165,13 @@ class DownloadInterface(object):
         # Options
         self.options = {}  # This field is used to forge the download message
         self.skip_check_uncompress = False
+        # TODO: Don't store default values in BiomajConfig.DEFAULTS for
+        # wait_policy and stop_condition
+        # Construct default retryer (may be replaced in set_options)
+        self._set_retryer(
+            BiomajConfig.DEFAULTS["stop_condition"],
+            BiomajConfig.DEFAULTS["wait_policy"]
+        )
 
     #
     # Setters for downloader
@@ -138,6 +240,76 @@ class DownloadInterface(object):
         self.options = options
         if "skip_check_uncompress" in options:
             self.skip_check_uncompress = Utils.to_bool(options["skip_check_uncompress"])
+        # If stop_condition or wait_policy is specified, we reconstruct the retryer
+        if "stop_condition" or "wait_policy" in options:
+            stop_condition = options.get("stop_condition", BiomajConfig.DEFAULTS["stop_condition"])
+            wait_policy = options.get("wait_policy", BiomajConfig.DEFAULTS["wait_policy"])
+            self._set_retryer(stop_condition, wait_policy)
+
+    def _set_retryer(self, stop_condition, wait_policy):
+        """
+        Add a retryer to retry the current download if it fails.
+        """
+        # Try to construct stop condition
+        if isinstance(stop_condition, tenacity.stop.stop_base):
+            # Use the value directly
+            stop_cond = stop_condition
+        elif isinstance(stop_condition, str):
+            # Try to parse the string
+            try:
+                stop_cond = simple_eval(stop_condition,
+                                        functions=self.ALL_STOP_CONDITIONS,
+                                        operators=self.ALL_STOP_OPERATORS,
+                                        names=self.ALL_STOP_NAMES)
+                # Check that it is an instance of stop_base
+                if not isinstance(stop_cond, tenacity.stop.stop_base):
+                    raise ValueError(stop_condition + " doesn't yield a stop condition")
+                # Test that this is a correct stop condition by calling it.
+                # We use a deepcopy to be sure to not alter the object (even
+                # if it seems that calling a stop condition doesn't modify it).
+                try:
+                    s = copy.deepcopy(stop_cond)
+                    s(tenacity.compat.make_retry_state(0, 0))
+                except Exception:
+                    raise ValueError(stop_condition + " doesn't yield a stop condition")
+            except Exception as e:
+                raise ValueError("Error while parsing stop condition: %s" % e)
+        else:
+            raise TypeError("Expected tenacity.stop.stop_base or string, got %s" % type(stop_condition))
+        # Try to construct wait policy
+        if isinstance(wait_policy, tenacity.wait.wait_base):
+            # Use the value directly
+            wait_pol = wait_policy
+        elif isinstance(wait_policy, str):
+            # Try to parse the string
+            try:
+                wait_pol = simple_eval(wait_policy,
+                                       functions=self.ALL_WAIT_POLICIES,
+                                       operators=self.ALL_WAIT_OPERATORS,
+                                       names=self.ALL_WAIT_NAMES)
+                # Check that it is an instance of wait_base
+                if not isinstance(wait_pol, tenacity.wait.wait_base):
+                    raise ValueError(wait_policy + " doesn't yield a wait policy")
+                # Test that this is a correct wait policy by calling it.
+                # We use a deepcopy to be sure to not alter the object (even
+                # if it seems that calling a wait policy doesn't modify it).
+                try:
+                    w = copy.deepcopy(wait_pol)
+                    w(tenacity.compat.make_retry_state(0, 0))
+                except Exception:
+                    raise ValueError(wait_policy + " doesn't yield a wait policy")
+            except Exception as e:
+                raise ValueError("Error while parsing wait policy: %s" % e)
+        else:
+            raise TypeError("Expected tenacity.stop.wait_base or string, got %s" % type(wait_policy))
+
+        self.retryer = tenacity.Retrying(
+            stop=stop_cond,
+            wait=wait_pol,
+            retry_error_callback=self.return_last_value,
+            retry=tenacity.retry_if_result(self.is_true),
+            reraise=True
+        )
 
     #
     # File operations (match, list, download) and associated hook methods
@@ -157,6 +329,8 @@ class DownloadInterface(object):
         if self.param:
             if 'param' not in rfile or not rfile['param']:
                 rfile['param'] = self.param
+        # Remove duplicate '/' if any
+        rfile['name'] = re.sub('//+', '/', rfile['name'])
         self.files_to_download.append(rfile)
 
     def set_files_to_download(self, files):
@@ -183,7 +357,6 @@ class DownloadInterface(object):
         :type submatch: bool
         '''
         self.logger.debug('Download:File:RegExp:' + str(patterns))
-
         if dir_list is None:
             dir_list = []
 
@@ -232,6 +405,7 @@ class DownloadInterface(object):
                             rfile['name'] = prefix + '/' + rfile['name']
                         self._append_file_to_download(rfile)
                         self.logger.debug('Download:File:MatchRegExp:' + rfile['name'])
+
         if not submatch and len(self.files_to_download) == 0:
             raise Exception('no file found matching expressions')
 
@@ -280,7 +454,6 @@ class DownloadInterface(object):
             for dfile in self.files_to_download:
                 if index < len(new_or_modified_files) and \
                         dfile['name'] == new_or_modified_files[index][0]:
-
                     new_files_to_download.append(dfile)
                     index += 1
                 else:
@@ -310,7 +483,28 @@ class DownloadInterface(object):
     def _download(self, file_path, rfile):
         '''
         Download one file and return False in case of success and True
-        otherwise. This must be implemented in subclasses.
+        otherwise.
+
+        Subclasses that override this method must call this implementation
+        at the end to perform the archive check.
+
+        Note that this method is executed inside a retryer.
+        '''
+        error = False
+        # Check that the archive is correct
+        if not self.skip_check_uncompress:
+            archive_status = Utils.archive_check(file_path)
+            if not archive_status:
+                self.logger.error('Archive is invalid or corrupted, deleting file and retrying download')
+                error = True
+                if os.path.exists(file_path):
+                    os.remove(file_path)
+        return error
+
+    def _network_configuration(self):
+        '''
+        Perform some configuration before network operations (list and
+        download). This must be implemented in subclasses.
         '''
         raise NotImplementedError()
 
@@ -325,6 +519,7 @@ class DownloadInterface(object):
         :return: list of downloaded files
         '''
         self.logger.debug(self.__class__.__name__ + ':Download')
+        self._network_configuration()
         nb_files = len(self.files_to_download)
         cur_files = 1
         self.offline_dir = local_dir
@@ -352,7 +547,7 @@ class DownloadInterface(object):
             cur_files += 1
             start_time = datetime.datetime.now()
             start_time = time.mktime(start_time.timetuple())
-            error = self._download(file_path, rfile)
+            error = self.retryer(self._download, file_path, rfile)
             if error:
                 rfile['download_time'] = 0
                 rfile['error'] = True


=====================================
biomaj_download/download/localcopy.py
=====================================
@@ -8,13 +8,16 @@ from biomaj_download.download.interface import DownloadInterface
 
 class LocalDownload(DownloadInterface):
     '''
-    Base class to copy file from local system
+    Base class to copy file from local system.
 
     protocol=cp
     server=localhost
     remote.dir=/blast/db/FASTA/
 
     remote.files=^alu.*\\.gz$
+
+    Note that we redefine download and list in such a way that we don't need to
+    define _download and _network_configuration.
     '''
 
     def __init__(self, rootdir, use_hardlinks=False):
@@ -57,7 +60,12 @@ class LocalDownload(DownloadInterface):
         rfiles = []
         rdirs = []
 
-        files = [f for f in os.listdir(self.rootdir + directory)]
+        try:
+            files = [f for f in os.listdir(self.rootdir + directory)]
+        except Exception as e:
+            msg = 'Error while listing ' + self.rootdir + ' - ' + str(e)
+            self.logger.error(msg)
+            raise e
         for file_in_files in files:
             rfile = {}
             fstat = os.stat(os.path.join(self.rootdir + directory, file_in_files))


=====================================
biomaj_download/download/protocolirods.py
=====================================
@@ -35,37 +35,48 @@ class IRODSDownload(DownloadInterface):
             self.port = int(param['port'])
 
     def list(self, directory=''):
-        session = iRODSSession(host=self.server, port=self.port, user=self.user, password=self.password, zone=self.zone)
+        self._network_configuration()
         rfiles = []
         rdirs = []
         rfile = {}
         date = None
-        query = session.query(DataObject.name, DataObject.size,
-                              DataObject.owner_name, DataObject.modify_time)
-        results = query.filter(User.name == self.user).get_results()
-        for result in results:
-            # Avoid duplication
-            if rfile != {} and rfile['name'] == str(result[DataObject.name]) \
-               and date == str(result[DataObject.modify_time]).split(" ")[0].split('-'):
-                continue
-            rfile = {}
-            date = str(result[DataObject.modify_time]).split(" ")[0].split('-')
-            rfile['permissions'] = "-rwxr-xr-x"
-            rfile['size'] = int(result[DataObject.size])
-            rfile['month'] = int(date[1])
-            rfile['day'] = int(date[2])
-            rfile['year'] = int(date[0])
-            rfile['name'] = str(result[DataObject.name])
-            rfiles.append(rfile)
-        session.cleanup()
+        # Note that iRODS raises errors when trying to use the results
+        # and not after query(). Therefore, the whole loop is inside
+        # try/except.
+        try:
+            query = self.session.query(DataObject.name, DataObject.size,
+                                       DataObject.owner_name, DataObject.modify_time)
+            results = query.filter(User.name == self.user).get_results()
+            for result in results:
+                # Avoid duplication
+                if rfile != {} and rfile['name'] == str(result[DataObject.name]) \
+                   and date == str(result[DataObject.modify_time]).split(" ")[0].split('-'):
+                    continue
+                rfile = {}
+                date = str(result[DataObject.modify_time]).split(" ")[0].split('-')
+                rfile['permissions'] = "-rwxr-xr-x"
+                rfile['size'] = int(result[DataObject.size])
+                rfile['month'] = int(date[1])
+                rfile['day'] = int(date[2])
+                rfile['year'] = int(date[0])
+                rfile['name'] = str(result[DataObject.name])
+                rfiles.append(rfile)
+        except Exception as e:
+            msg = 'Error while listing ' + self.remote_dir + ' - ' + repr(e)
+            self.logger.error(msg)
+            raise e
+        finally:
+            self.session.cleanup()
         return (rfiles, rdirs)
 
-    def _download(self, file_dir, rfile):
+    def _network_configuration(self):
+        self.session = iRODSSession(host=self.server, port=self.port,
+                                    user=self.user, password=self.password,
+                                    zone=self.zone)
+
+    def _download(self, file_path, rfile):
         error = False
         self.logger.debug('IRODS:IRODS DOWNLOAD')
-        session = iRODSSession(host=self.server, port=self.port,
-                               user=self.user, password=self.password,
-                               zone=self.zone)
         try:
             # iRODS doesn't like multiple "/"
             if rfile['root'][-1] == "/":
@@ -74,10 +85,14 @@ class IRODSDownload(DownloadInterface):
                 file_to_get = rfile['root'] + "/" + rfile['name']
             # Write the file to download in the wanted file_dir with the
             # python-irods iget
-            session.data_objects.get(file_to_get, file_dir)
+            self.session.data_objects.get(file_to_get, file_path)
         except iRODSException as e:
             error = True
             self.logger.error(self.__class__.__name__ + ":Download:Error:Can't get irods object " + file_to_get)
             self.logger.error(self.__class__.__name__ + ":Download:Error:" + repr(e))
-        session.cleanup()
-        return(error)
+
+        if error:
+            return error
+
+        # Our part is done so call parent _download
+        return super(IRODSDownload, self)._download(file_path, rfile)

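The new _network_configuration hook opens the iRODS session once so that list() and _download() reuse it instead of each building their own. A condensed sketch of that flow with python-irodsclient (connection values and class layout are illustrative):

    from irods.session import iRODSSession

    class IRODSSketch:
        def __init__(self, server, port, user, password, zone):
            self.server, self.port, self.zone = server, port, zone
            self.user, self.password = user, password
            self.session = None

        def _network_configuration(self):
            # Called once before network operations; the session is shared.
            self.session = iRODSSession(host=self.server, port=self.port,
                                        user=self.user, password=self.password,
                                        zone=self.zone)

        def _download(self, file_path, rfile):
            # iget the remote object into the wanted local path.
            file_to_get = rfile['root'].rstrip('/') + '/' + rfile['name']
            self.session.data_objects.get(file_to_get, file_path)

        def close(self):
            if self.session is not None:
                self.session.cleanup()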

=====================================
biomaj_download/download/rsync.py
=====================================
@@ -2,7 +2,6 @@
 # standard_library.install_aliases()
 # from builtins import str
 import re
-import os
 import subprocess
 
 from biomaj_download.download.interface import DownloadInterface
@@ -33,12 +32,6 @@ class RSYNCDownload(DownloadInterface):
         else:
             self.server = None
             self.rootdir = server
-        # give a working directory to run rsync
-        if self.local_mode:
-            try:
-                os.chdir(self.rootdir)
-            except TypeError:
-                self.logger.error("RSYNC:Could not find local dir " + self.rootdir)
 
     def _append_file_to_download(self, rfile):
         if 'root' not in rfile or not rfile['root']:
@@ -51,7 +44,14 @@ class RSYNCDownload(DownloadInterface):
         url = rfile['root'] + "/" + rfile['name']
         if not self.local_mode:
             url = self.server + ":" + url
-        return url
+        return re.sub("/{2,}", "/", url)
+
+    def _network_configuration(self):
+        '''
+        Perform some configuration before network operations (list and
+        download).
+        '''
+        pass
 
     def _download(self, file_path, rfile):
         error = False
@@ -62,7 +62,7 @@ class RSYNCDownload(DownloadInterface):
             cmd = str(self.real_protocol) + " " + str(self.credentials) + "@" + url + " " + str(file_path)
         else:
             cmd = str(self.real_protocol) + " " + url + " " + str(file_path)
-        self.logger.debug('RSYNC:RSYNC DOwNLOAD:' + cmd)
+        self.logger.debug('RSYNC:RSYNC DOWNLOAD:' + cmd)
         # Launch the command (we are in offline_dir)
         try:
             p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
@@ -75,7 +75,11 @@ class RSYNCDownload(DownloadInterface):
         if err_code != 0:
             self.logger.error('Error while downloading ' + rfile["name"] + ' - ' + str(err_code))
             error = True
-        return(error)
+        if error:
+            return error
+
+        # Our part is done so call parent _download
+        return super(RSYNCDownload, self)._download(file_path, rfile)
 
     def test_stderr_rsync_error(self, stderr):
         stderr = str(stderr.decode('utf-8'))
@@ -105,18 +109,21 @@ class RSYNCDownload(DownloadInterface):
             remote = str(self.server) + ":" + str(self.rootdir) + str(directory)
         if self.credentials:
             remote = str(self.credentials) + "@" + remote
-        cmd = str(self.real_protocol) + " --list-only " + remote
+        cmd = str(self.real_protocol) + " --list-only --no-motd " + remote
         try:
             p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
             list_rsync, err = p.communicate()
             self.test_stderr_rsync_message(err)
             self.test_stderr_rsync_error(err)
             err_code = p.returncode
+            if err_code != 0:
+                msg = 'Error while listing ' + remote + ' - ' + str(err_code)
+                self.logger.error(msg)
+                raise Exception(msg)
         except ExceptionRsync as e:
-            self.logger.error("RsyncError:" + str(e))
-        if err_code != 0:
-            self.logger.error('Error while listing ' + str(err_code))
-            return(rfiles, rdirs)
+            msg = 'Error while listing ' + remote + ' - ' + str(e)
+            self.logger.error(msg)
+            raise e
         list_rsync = str(list_rsync.decode('utf-8'))
         lines = list_rsync.rstrip().split("\n")
         for line in lines:

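Listing now adds --no-motd so the server's message of the day cannot be mistaken for file entries, and a non-zero exit code raises instead of silently returning empty lists. A minimal sketch of that call (the command name and remote spec are illustrative; the real code builds them from the configured protocol and credentials):

    import subprocess

    def rsync_list(remote):
        # --list-only prints the remote listing; --no-motd keeps the
        # message of the day out of the output we parse.
        cmd = 'rsync --list-only --no-motd ' + remote
        p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                             shell=True)
        out, err = p.communicate()
        if p.returncode != 0:
            raise Exception('Error while listing %s - %s' % (remote, p.returncode))
        return out.decode('utf-8').rstrip().split('\n')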

=====================================
biomaj_download/downloadclient.py
=====================================
@@ -3,17 +3,13 @@ import requests
 import logging
 import uuid
 import time
-import sys
+from queue import Queue
+
 import pika
 
 from biomaj_download.download.downloadthreads import DownloadThread
 from biomaj_download.message import downmessage_pb2
 
-if sys.version_info[0] < 3:
-    from Queue import Queue
-else:
-    from queue import Queue
-
 
 class DownloadClient(DownloadService):
 


=====================================
biomaj_download/downloadservice.py
=====================================
@@ -85,7 +85,7 @@ class DownloadService(object):
         self.bank = None
         self.download_callback = None
         with open(config_file, 'r') as ymlfile:
-            self.config = yaml.load(ymlfile)
+            self.config = yaml.load(ymlfile, Loader=yaml.FullLoader)
             Utils.service_config_override(self.config)
 
         Zipkin.set_config(self.config)

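Passing an explicit Loader silences the PyYAML warning mentioned in the changelog. A short sketch (file name illustrative); yaml.safe_load is the stricter choice when the configuration cannot be trusted:

    import yaml

    with open('config.yml', 'r') as ymlfile:
        # FullLoader avoids the "no Loader specified" deprecation warning.
        config = yaml.load(ymlfile, Loader=yaml.FullLoader)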

=====================================
requirements.txt
=====================================
@@ -15,3 +15,5 @@ biomaj_zipkin
 flake8
 humanfriendly
 python-irodsclient
+simpleeval
+tenacity


=====================================
setup.py
=====================================
@@ -22,8 +22,8 @@ config = {
     'url': 'http://biomaj.genouest.org',
     'download_url': 'http://biomaj.genouest.org',
     'author_email': 'olivier.sallou@irisa.fr',
-    'version': '3.1.2',
-     'classifiers': [
+    'version': '3.2.4',
+    'classifiers': [
         # How mature is this project? Common values are
         #   3 - Alpha
         #   4 - Beta
@@ -39,14 +39,17 @@ config = {
         'License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)',
         # Specify the Python versions you support here. In particular, ensure
         # that you indicate whether you support Python 2, Python 3 or both.
-        'Programming Language :: Python :: 3',
-        'Programming Language :: Python :: 3.4'
+        'Programming Language :: Python :: 3 :: Only',
+        'Programming Language :: Python :: 3.6'
     ],
+    'python_requires': '>=3.6, <4',
     'install_requires': [
                          'biomaj_core',
                          'biomaj_zipkin',
                          'pycurl',
                          'ftputil',
+                         'tenacity',
+                         'simpleeval',
                          'py-bcrypt',
                          'pika==0.13.0',
                          'redis',

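The metadata now declares Python 3 only through both the classifiers and python_requires, and adds the two new runtime dependencies. A trimmed sketch of the relevant setup() keys (the package name and the omitted fields are placeholders):

    from setuptools import setup

    setup(
        name='biomaj_download',  # placeholder; other fields omitted
        version='3.2.4',
        python_requires='>=3.6, <4',
        install_requires=['tenacity', 'simpleeval'],
        classifiers=[
            'Programming Language :: Python :: 3 :: Only',
            'Programming Language :: Python :: 3.6',
        ],
    )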

=====================================
tests/biomaj_tests.py
=====================================
@@ -1,5 +1,9 @@
 """
 Note that attributes 'network' and 'local_irods' are ignored for CI.
+
+To run 'local_irods' tests, you need an iRODS server on localhost (default port,
+user 'rods', password 'rods') and a zone /tempZone/home/rods. See
+UtilsForLocalIRODSTest.
 """
 from nose.plugins.attrib import attr
 
@@ -12,8 +16,11 @@ import stat
 
 from mock import patch
 
+from irods.session import iRODSSession
+
 from biomaj_core.config import BiomajConfig
 from biomaj_core.utils import Utils
+from biomaj_download.download.interface import DownloadInterface
 from biomaj_download.download.curl import CurlDownload, HTTPParse
 from biomaj_download.download.direct import DirectFTPDownload, DirectHTTPDownload
 from biomaj_download.download.localcopy  import LocalDownload
@@ -21,6 +28,8 @@ from biomaj_download.download.rsync import RSYNCDownload
 from biomaj_download.download.protocolirods import IRODSDownload
 
 import unittest
+import tenacity
+
 
 class UtilsForTest():
   """
@@ -63,6 +72,11 @@ class UtilsForTest():
     if self.bank_properties is None:
       self.__copy_test_bank_properties()
 
+    # Create an invalid archive file (empty file). This is deleted by clean().
+    # See TestBiomajRSYNCDownload.test_rsync_download_skip_check_uncompress.
+    self.invalid_archive = os.path.join(self.test_dir, 'invalid.gz')
+    open(self.invalid_archive, 'w').close()
+
   def clean(self):
     """
     Deletes temp directory
@@ -72,13 +86,15 @@ class UtilsForTest():
   def __copy_test_bank_properties(self):
     if self.bank_properties is not None:
       return
-    self.bank_properties = ['alu', 'local', 'testhttp','directhttp']
+    # Copy bank configurations (these banks use external resources, so there is no tuning to do)
+    self.bank_properties = ['alu', 'testhttp', 'directhttp', 'multi']
     curdir = os.path.dirname(os.path.realpath(__file__))
     for b in self.bank_properties:
         from_file = os.path.join(curdir, b+'.properties')
         to_file = os.path.join(self.conf_dir, b+'.properties')
         shutil.copyfile(from_file, to_file)
 
+    # Copy bank process
     self.bank_process = ['test.sh']
     curdir = os.path.dirname(os.path.realpath(__file__))
     procdir = os.path.join(curdir, 'bank/process')
@@ -88,11 +104,11 @@ class UtilsForTest():
       shutil.copyfile(from_file, to_file)
       os.chmod(to_file, stat.S_IRWXU)
 
-    # Manage local bank test, use bank test subdir as remote
-    properties = ['multi.properties', 'computederror.properties', 'error.properties', 'local.properties', 'localprocess.properties', 'testhttp.properties', 'computed.properties', 'computed2.properties', 'sub1.properties', 'sub2.properties']
+    # Copy and adapt bank configurations that use local resources: we use the "bank" dir in the current test directory as the remote
+    properties = ['local', 'localprocess', 'computed', 'computed2', 'sub1', 'sub2', 'computederror', 'error']
     for prop in properties:
-      from_file = os.path.join(curdir, prop)
-      to_file = os.path.join(self.conf_dir, prop)
+      from_file = os.path.join(curdir, prop+'.properties')
+      to_file = os.path.join(self.conf_dir, prop+'.properties')
       fout = open(to_file,'w')
       with open(from_file,'r') as fin:
         for line in fin:
@@ -111,6 +127,7 @@ class UtilsForTest():
     curdir = os.path.dirname(os.path.realpath(__file__))
     global_template = os.path.join(curdir,'global.properties')
     fout = open(self.global_properties,'w')
+    # Adapt directories in global configuration to the current test directory
     with open(global_template,'r') as fin:
         for line in fin:
           if line.startswith('conf.dir'):
@@ -128,44 +145,79 @@ class UtilsForTest():
     fout.close()
 
 
-class TestBiomajUtils(unittest.TestCase):
-
-  def setUp(self):
-    self.utils = UtilsForTest()
+class UtilsForLocalIRODSTest(UtilsForTest):
+    """
+    This class is used to prepare 'local_irods' tests.
+    """
+    SERVER = "localhost"
+    PORT = 1247
+    ZONE = "tempZone"
+    USER = "rods"
+    PASSWORD = "rods"
+    COLLECTION = os.path.join("/" + ZONE, "home/rods/")  # Don't remove or add /
 
-  def tearDown(self):
-    self.utils.clean()
+    def __init__(self):
+        super(UtilsForLocalIRODSTest, self).__init__()
+        self._session = iRODSSession(host=self.SERVER, port=self.PORT,
+                                     user=self.USER, password=self.PASSWORD,
+                                     zone=self.ZONE)
+        self.curdir = os.path.dirname(os.path.realpath(__file__))
+        # Copy some valid archives (bank/test.fasta.gz)
+        file_ = os.path.join(self.curdir, "bank/test.fasta.gz")
+        self._session.data_objects.put(file_, self.COLLECTION)
+        # Copy invalid.gz
+        self._session.data_objects.put(self.invalid_archive, self.COLLECTION)
+
+    def clean(self):
+        super(UtilsForLocalIRODSTest, self).clean()
+        # Remove files on iRODS (use force otherwise the files are put in trash)
+        # Remove test.fasta.gz
+        self._session.data_objects.unlink(os.path.join(self.COLLECTION, "test.fasta.gz"), force=True)
+        # Remove invalid.gz
+        self._session.data_objects.unlink(os.path.join(self.COLLECTION, "invalid.gz"), force=True)
+
+
+class TestDownloadInterface(unittest.TestCase):
+  """
+  Test of the interface.
+  """
 
+  def test_retry_parsing(self):
+    """
+    Test parsing of stop and wait conditions.
+    """
+    downloader = DownloadInterface()
+    # Test some garbage
+    d = dict(stop_condition="stop_after_attempts")  # no param
+    self.assertRaises(ValueError, downloader.set_options, d)
+    d = dict(stop_condition="1 & 1")  # not a stop_condition
+    self.assertRaises(ValueError, downloader.set_options, d)
+    d = dict(stop_condition="stop_after_attempts(5) & 1")  # not a stop_condition
+    self.assertRaises(ValueError, downloader.set_options, d)
+    # Test some garbage
+    d = dict(wait_policy="wait_random")  # no param
+    self.assertRaises(ValueError, downloader.set_options, d)
+    d = dict(wait_policy="I love python")  # not a wait_condition
+    self.assertRaises(ValueError, downloader.set_options, d)
+    d = dict(wait_policy="wait_random(5) + 3")  # not a wait_condition
+    self.assertRaises(ValueError, downloader.set_options, d)
+    # Test operators
+    d = dict(stop_condition="stop_never | stop_after_attempt(5)",
+             wait_policy="wait_none + wait_random(1, 2)")
+    downloader.set_options(d)
+    # Test wait_combine, wait_chain
+    d = dict(wait_policy="wait_combine(wait_fixed(3), wait_random(1, 2))")
+    downloader.set_options(d)
+    d = dict(wait_policy="wait_chain(wait_fixed(3), wait_random(1, 2))")
+    downloader.set_options(d)
+    # Test stop_any and stop_all
+    stop_condition = "stop_any(stop_after_attempt(5), stop_after_delay(10))"
+    d = dict(stop_condition=stop_condition)
+    downloader.set_options(d)
+    stop_condition = "stop_all(stop_after_attempt(5), stop_after_delay(10))"
+    d = dict(stop_condition=stop_condition)
+    downloader.set_options(d)
 
-  def test_mimes(self):
-    fasta_file = os.path.join(os.path.dirname(os.path.realpath(__file__)),'bank/test2.fasta')
-    (mime, encoding) = Utils.detect_format(fasta_file)
-    self.assertTrue('application/fasta' == mime)
-
-  @attr('compress')
-  def test_uncompress(self):
-    from_file = { 'root': os.path.dirname(os.path.realpath(__file__)),
-                  'name': 'bank/test.fasta.gz'
-                  }
-
-    to_dir = self.utils.data_dir
-    Utils.copy_files([from_file], to_dir)
-    Utils.uncompress(os.path.join(to_dir, from_file['name']))
-    self.assertTrue(os.path.exists(to_dir+'/bank/test.fasta'))
-
-  def test_copy_with_regexp(self):
-    from_dir = os.path.dirname(os.path.realpath(__file__))
-    to_dir = self.utils.data_dir
-    Utils.copy_files_with_regexp(from_dir, to_dir, ['.*\.py'])
-    self.assertTrue(os.path.exists(to_dir+'/biomaj_tests.py'))
-
-  def test_copy(self):
-    from_dir = os.path.dirname(os.path.realpath(__file__))
-    local_file = 'biomaj_tests.py'
-    files_to_copy = [ {'root': from_dir, 'name': local_file}]
-    to_dir = self.utils.data_dir
-    Utils.copy_files(files_to_copy, to_dir)
-    self.assertTrue(os.path.exists(to_dir+'/biomaj_tests.py'))
 
 class TestBiomajLocalDownload(unittest.TestCase):
   """
@@ -190,6 +242,16 @@ class TestBiomajLocalDownload(unittest.TestCase):
     locald.close()
     self.assertTrue(len(file_list) > 1)
 
+  def test_local_list_error(self):
+    locald = LocalDownload("/tmp/foo/")
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = locald.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    locald.close()
+
   def test_local_download(self):
     locald = LocalDownload(self.examples)
     (file_list, dir_list) = locald.list()
@@ -231,6 +293,7 @@ class TestBiomajLocalDownload(unittest.TestCase):
       msg = "In %s: copy worked but hardlinks were not used." % self.id()
       logging.info(msg)
 
+
 @attr('network')
 @attr('http')
 class TestBiomajHTTPDownload(unittest.TestCase):
@@ -240,14 +303,16 @@ class TestBiomajHTTPDownload(unittest.TestCase):
   def setUp(self):
     self.utils = UtilsForTest()
     BiomajConfig.load_config(self.utils.global_properties, allow_user_config=False)
+    # Create an HTTPParse object used for most tests from the config file testhttp
     self.config = BiomajConfig('testhttp')
-    self.http_parse = HTTPParse(self.config.get('http.parse.dir.line'),
+    self.http_parse = HTTPParse(
+        self.config.get('http.parse.dir.line'),
         self.config.get('http.parse.file.line'),
         int(self.config.get('http.group.dir.name')),
         int(self.config.get('http.group.dir.date')),
         int(self.config.get('http.group.file.name')),
         int(self.config.get('http.group.file.date')),
-        self.config.get('http.group.file.date_format', None),
+        self.config.get('http.group.file.date_format'),
         int(self.config.get('http.group.file.size'))
     )
 
@@ -260,26 +325,37 @@ class TestBiomajHTTPDownload(unittest.TestCase):
     httpd.close()
     self.assertTrue(len(file_list) == 1)
 
+  def test_http_list_error(self):
+    """
+    Test that errors in list are correctly caught.
+    """
+    # Test access to non-existent directory
+    httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/dists/foo/', self.http_parse)
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = httpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+
   def test_http_list_dateregexp(self):
-    #self.http_parse.file_date_format = "%%d-%%b-%%Y %%H:%%M"
-    self.http_parse.file_date_format = "%%Y-%%m-%%d %%H:%%M"
     httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/dists/', self.http_parse)
     (file_list, dir_list) = httpd.list()
     httpd.close()
     self.assertTrue(len(file_list) == 1)
 
   def test_http_download_no_size(self):
-    self.http_parse = HTTPParse(self.config.get('http.parse.dir.line'),
+    # Create a custom http_parse without size
+    http_parse = HTTPParse(self.config.get('http.parse.dir.line'),
         self.config.get('http.parse.file.line'),
         int(self.config.get('http.group.dir.name')),
         int(self.config.get('http.group.dir.date')),
         int(self.config.get('http.group.file.name')),
         int(self.config.get('http.group.file.date')),
-        self.config.get('http.group.file.date_format', None),
+        self.config.get('http.group.file.date_format'),
         -1
     )
-    self.http_parse.file_date_format = "%%Y-%%m-%%d %%H:%%M"
-    httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/dists/', self.http_parse)
+    httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/dists/', http_parse)
     (file_list, dir_list) = httpd.list()
     httpd.match([r'^README$'], file_list, dir_list)
     httpd.download(self.utils.data_dir)
@@ -287,16 +363,17 @@ class TestBiomajHTTPDownload(unittest.TestCase):
     self.assertTrue(len(httpd.files_to_download) == 1)
 
   def test_http_download_no_date(self):
-    self.http_parse = HTTPParse(self.config.get('http.parse.dir.line'),
+    # Create a custom http_parse without date
+    http_parse = HTTPParse(self.config.get('http.parse.dir.line'),
         self.config.get('http.parse.file.line'),
         int(self.config.get('http.group.dir.name')),
         int(self.config.get('http.group.dir.date')),
         int(self.config.get('http.group.file.name')),
         -1,
-        self.config.get('http.group.file.date_format', None),
+        None,
         int(self.config.get('http.group.file.size'))
     )
-    httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/dists/', self.http_parse)
+    httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/dists/', http_parse)
     (file_list, dir_list) = httpd.list()
     httpd.match([r'^README$'], file_list, dir_list)
     httpd.download(self.utils.data_dir)
@@ -304,7 +381,6 @@ class TestBiomajHTTPDownload(unittest.TestCase):
     self.assertTrue(len(httpd.files_to_download) == 1)
 
   def test_http_download(self):
-    self.http_parse.file_date_format = "%%Y-%%m-%%d %%H:%%M"
     httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/dists/', self.http_parse)
     (file_list, dir_list) = httpd.list()
     print(str(file_list))
@@ -314,7 +390,6 @@ class TestBiomajHTTPDownload(unittest.TestCase):
     self.assertTrue(len(httpd.files_to_download) == 1)
 
   def test_http_download_in_subdir(self):
-    self.http_parse.file_date_format = "%%Y-%%m-%%d %%H:%%M"
     httpd = CurlDownload('http', 'ftp2.fr.debian.org', '/debian/', self.http_parse)
     (file_list, dir_list) = httpd.list()
     httpd.match([r'^dists/README$'], file_list, dir_list)
@@ -322,6 +397,41 @@ class TestBiomajHTTPDownload(unittest.TestCase):
     httpd.close()
     self.assertTrue(len(httpd.files_to_download) == 1)
 
+  def test_redirection(self):
+    """
+    Test HTTP redirections
+    """
+    # The site used in this test redirects to https (see #33).
+    http_parse = HTTPParse(
+        r'<img[\s]+src="[\S]+"[\s]+alt="\[[\s]+\]"[\s]*/?>[\s]<a[\s]+href="([\S]+)".*([\d]{4}-[\d]{2}-[\d]{2}\s[\d]{2}:[\d]{2})[\s]+-',
+        r'<img[\s]+src="[\S]+"[\s]+alt="\[[\s]+\]"[\s]*/?>[\s]<a[\s]+href="([\S]+)".*([\d]{4}-[\d]{2}-[\d]{2}\s[\d]{2}:[\d]{2})[\s]+([\d]+)',
+        1,
+        2,
+        1,
+        2,
+        "%%Y-%%m-%%d %%H:%%M",
+        3
+    )
+    # First test: allow redirections
+    httpd = CurlDownload('http', 'plasmodb.org', '/common/downloads/Current_Release/', http_parse)
+    (file_list, dir_list) = httpd.list()
+    httpd.match([r'^Build_number$'], file_list, dir_list)
+    # Check that we have been redirected to HTTPS by inspecting logs
+    with self.assertLogs(logger="biomaj", level="INFO") as cm:
+      httpd.download(self.utils.data_dir)
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Download was redirected to https://")
+    httpd.close()
+    self.assertTrue(len(httpd.files_to_download) == 1)
+    # Second test: disable redirections (hence listing fails)
+    httpd = CurlDownload('http', 'plasmodb.org', '/common/downloads/Current_Release/', http_parse)
+    httpd.set_options({
+      "allow_redirections": False
+    })
+    with self.assertRaises(Exception):
+      (file_list, dir_list) = httpd.list()
+    httpd.close()
+
 
 @attr('network')
 @attr('https')
@@ -337,19 +447,17 @@ class TestBiomajHTTPSDownload(unittest.TestCase):
     self.utils.clean()
 
   def test_download(self):
-    self.utils = UtilsForTest()
-    self.http_parse = HTTPParse(
+    http_parse = HTTPParse(
         "<a[\s]+href=\"([\w\-\.]+\">[\w\-\.]+.tar.gz)<\/a>[\s]+([0-9]{2}-[A-Za-z]{3}-[0-9]{4}[\s][0-9]{2}:[0-9]{2})[\s]+([0-9]+[A-Za-z])",
         "<a[\s]+href=\"[\w\-\.]+\">([\w\-\.]+.tar.gz)<\/a>[\s]+([0-9]{2}-[A-Za-z]{3}-[0-9]{4}[\s][0-9]{2}:[0-9]{2})[\s]+([0-9]+[A-Za-z])",
         1,
         2,
         1,
         2,
-        None,
+        "%%d-%%b-%%Y %%H:%%M",
         3
     )
-    self.http_parse.file_date_format = "%%d-%%b-%%Y %%H:%%M"
-    httpd = CurlDownload('https', 'mirrors.edge.kernel.org', '/pub/software/scm/git/debian/', self.http_parse)
+    httpd = CurlDownload('https', 'mirrors.edge.kernel.org', '/pub/software/scm/git/debian/', http_parse)
     (file_list, dir_list) = httpd.list()
     httpd.match([r'^git-core-0.99.6.tar.gz$'], file_list, dir_list)
     httpd.download(self.utils.data_dir)
@@ -364,17 +472,48 @@ class TestBiomajSFTPDownload(unittest.TestCase):
   Test SFTP downloader
   """
 
-  PROTOCOL = "ftps"
+  PROTOCOL = "sftp"
 
   def setUp(self):
     self.utils = UtilsForTest()
+    # Temporary host key file in test dir (so this is cleaned)
+    (_, self.khfile) = tempfile.mkstemp(dir=self.utils.test_dir)
 
   def tearDown(self):
     self.utils.clean()
 
+  def test_list_error(self):
+    """
+    Test that errors in list are correctly caught.
+    """
+    # Test access to non-existent directory
+    sftpd = CurlDownload(self.PROTOCOL, "test.rebex.net", "/toto")
+    sftpd.set_credentials("demo:password")
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = sftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    sftpd.close()
+    # Test with wrong password
+    sftpd = CurlDownload(self.PROTOCOL, "test.rebex.net", "/")
+    sftpd.set_credentials("demo:badpassword")
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = sftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    sftpd.close()
+
   def test_download(self):
     sftpd = CurlDownload(self.PROTOCOL, "test.rebex.net", "/")
     sftpd.set_credentials("demo:password")
+    sftpd.set_options({
+        "ssh_hosts_file": self.khfile,
+        "ssh_new_host": "add"
+    })
     (file_list, dir_list) = sftpd.list()
     sftpd.match([r'^readme.txt$'], file_list, dir_list)
     sftpd.download(self.utils.data_dir)
@@ -403,6 +542,22 @@ class TestBiomajDirectFTPDownload(unittest.TestCase):
     ftpd.close()
     self.assertTrue(len(file_list) == 1)
 
+  def test_ftp_list_error(self):
+    """
+    Test that errors in list are correctly caught.
+    """
+    # Test access to non-existent directory
+    file_list = ['/toto/debian/doc/mailing-lists.txt']
+    ftpd = DirectFTPDownload('ftp', 'ftp.fr.debian.org', '')
+    ftpd.set_files_to_download(file_list)
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = ftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    ftpd.close()
+
   def test_download(self):
     file_list = ['/debian/doc/mailing-lists.txt']
     ftpd = DirectFTPDownload('ftp', 'ftp.fr.debian.org', '')
@@ -413,7 +568,6 @@ class TestBiomajDirectFTPDownload(unittest.TestCase):
     self.assertTrue(os.path.exists(os.path.join(self.utils.data_dir,'mailing-lists.txt')))
 
 
-
 @attr('directftps')
 @attr('network')
 class TestBiomajDirectFTPSDownload(unittest.TestCase):
@@ -473,6 +627,22 @@ class TestBiomajDirectHTTPDownload(unittest.TestCase):
     self.assertTrue(file_list[0]['size']!=0)
     self.assertFalse(fyear == ftpd.files_to_download[0]['year'] and fmonth == ftpd.files_to_download[0]['month'] and fday == ftpd.files_to_download[0]['day'])
 
+  def test_http_list_error(self):
+    """
+    Test that errors in list are correctly caught.
+    """
+    # Test access to non-existent directory
+    file_list = ['/toto/debian/README.html']
+    ftpd = DirectHTTPDownload('http', 'ftp2.fr.debian.org', '')
+    ftpd.set_files_to_download(file_list)
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = ftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    ftpd.close()
+
   def test_download(self):
     file_list = ['/debian/README.html']
     ftpd = DirectHTTPDownload('http', 'ftp2.fr.debian.org', '')
@@ -497,7 +667,6 @@ class TestBiomajDirectHTTPDownload(unittest.TestCase):
       my_json = json.loads(content)
       self.assertTrue(my_json['args']['key1'] == 'value1')
 
-  @attr('test')
   def test_download_save_as(self):
     file_list = ['/debian/README.html']
     ftpd = DirectHTTPDownload('http', 'ftp2.fr.debian.org', '')
@@ -524,6 +693,32 @@ class TestBiomajDirectHTTPDownload(unittest.TestCase):
       content = content_file.read()
       my_json = json.loads(content)
       self.assertTrue(my_json['form']['key1'] == 'value1')
+      
+  def test_redirection(self):
+    """
+    Test HTTP redirections
+    """
+    # The site used in this test redirects to https (see #33).
+    # First test: allow redirections
+    httpd = DirectHTTPDownload('http', 'plasmodb.org', '/common/downloads/Current_Release/')
+    httpd.set_files_to_download(['Build_number'])
+    httpd.download(self.utils.data_dir)
+    # Check that we have been redirected to HTTPS by inspecting logs
+    with self.assertLogs(logger="biomaj", level="INFO") as cm:
+      httpd.download(self.utils.data_dir)
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Download was redirected to https://")
+    httpd.close()
+    self.assertTrue(len(httpd.files_to_download) == 1)
+    # Second test: block redirections
+    httpd = DirectHTTPDownload('http', 'plasmodb.org', '/common/downloads/Current_Release/')
+    httpd.set_files_to_download(['Build_number'])
+    httpd.set_options({
+      "allow_redirections": False
+    })
+    with self.assertRaises(Exception):
+      httpd.download(self.utils.data_dir)
+    httpd.close()
 
 
 @attr('ftp')
@@ -545,7 +740,31 @@ class TestBiomajFTPDownload(unittest.TestCase):
     ftpd.close()
     self.assertTrue(len(file_list) > 1)
 
-  @attr('test')
+  def test_ftp_list_error(self):
+    """
+    Test that errors in list are correctly caught.
+    """
+    # Test access to non-existent directory
+    ftpd = CurlDownload("ftp", "test.rebex.net", "/toto")
+    ftpd.set_credentials("demo:password")
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = ftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    ftpd.close()
+    # Test with wrong password
+    ftpd = CurlDownload("ftp", "test.rebex.net", "/")
+    ftpd.set_credentials("demo:badpassword")
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = ftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    ftpd.close()
+
   def test_download(self):
     ftpd = CurlDownload('ftp', 'speedtest.tele2.net', '/')
     (file_list, dir_list) = ftpd.list()
@@ -604,6 +823,21 @@ class TestBiomajFTPDownload(unittest.TestCase):
     self.assertTrue(len(ftpd.files_to_download)==2)
     self.assertTrue(len(ftpd.files_to_copy)==2)
 
+  @attr('test')
+  def test_download_or_copy_directhttp(self):
+    ftpd = DirectHTTPDownload('https', 'ftp.fr.debian.org', '/debian/')
+    ftpd.files_to_download = [
+          {'name':'/test1', 'year': '2013', 'month': '11', 'day': '10', 'size': 10},
+    ]
+    available_files = [
+        {'name':'/test1', 'year': '2020', 'month': '11', 'day': '10', 'size': 10},
+      # {"root": "/", "permissions": "", "group": "", "user": "", "size": 23723408, "month": 6, "day": 19, "year": 2018, "name": "/common/downloads/release-38/Pfalciparum3D7/fasta/data/PlasmoDB-38_Pfalciparum3D7_Genome.fasta", "hash": "e58669a71eacff7a9dcceed04a8ecdd1", "save_as": "PlasmoDB-38_Pfalciparum3D7_Genome.fasta", "url": "https://plasmodb.org"}
+    ]
+    ftpd.download_or_copy(available_files, '/biomaj', False)
+    ftpd.close()
+    self.assertTrue(len(ftpd.files_to_download)==1)
+    self.assertTrue(len(ftpd.files_to_copy)==0)
+
   def test_get_more_recent_file(self):
     files = [
           {'name':'/test1', 'year': '2013', 'month': '11', 'day': '10', 'size': 10},
@@ -616,14 +850,47 @@ class TestBiomajFTPDownload(unittest.TestCase):
     self.assertTrue(release['month']=='11')
     self.assertTrue(release['day']=='12')
 
+  def test_download_retry(self):
+    """
+    Try to download fake files to test retry.
+    """
+    n_attempts = 5
+    ftpd = CurlDownload("ftp", "speedtest.tele2.net", "/")
+    # Download a fake file
+    ftpd.set_files_to_download([
+          {'name': 'TOTO.zip', 'year': '2016', 'month': '02', 'day': '19',
+           'size': 1, 'save_as': 'TOTO1KB'}
+    ])
+    ftpd.set_options(dict(stop_condition=tenacity.stop.stop_after_attempt(n_attempts),
+                          wait_condition=tenacity.wait.wait_none()))
+    self.assertRaisesRegex(
+        Exception, "^CurlDownload:Download:Error:",
+        ftpd.download, self.utils.data_dir,
+    )
+    logging.debug(ftpd.retryer.statistics)
+    self.assertTrue(len(ftpd.files_to_download) == 1)
+    self.assertTrue(ftpd.retryer.statistics["attempt_number"] == n_attempts)
+    # Try to download another file to ensure that it retries
+    ftpd.set_files_to_download([
+          {'name': 'TITI.zip', 'year': '2016', 'month': '02', 'day': '19',
+           'size': 1, 'save_as': 'TOTO1KB'}
+    ])
+    self.assertRaisesRegex(
+        Exception, "^CurlDownload:Download:Error:",
+        ftpd.download, self.utils.data_dir,
+    )
+    self.assertTrue(len(ftpd.files_to_download) == 1)
+    self.assertTrue(ftpd.retryer.statistics["attempt_number"] == n_attempts)
+    ftpd.close()
+
   def test_ms_server(self):
-      ftpd = CurlDownload("ftp", "test.rebex.net", "/")
-      ftpd.set_credentials("demo:password")
-      (file_list, dir_list) = ftpd.list()
-      ftpd.match(["^readme.txt$"], file_list, dir_list)
-      ftpd.download(self.utils.data_dir)
-      ftpd.close()
-      self.assertTrue(len(ftpd.files_to_download) == 1)
+    ftpd = CurlDownload("ftp", "test.rebex.net", "/")
+    ftpd.set_credentials("demo:password")
+    (file_list, dir_list) = ftpd.list()
+    ftpd.match(["^readme.txt$"], file_list, dir_list)
+    ftpd.download(self.utils.data_dir)
+    ftpd.close()
+    self.assertTrue(len(ftpd.files_to_download) == 1)
 
   def test_download_tcp_keepalive(self):
       """
@@ -675,6 +942,31 @@ class TestBiomajFTPSDownload(unittest.TestCase):
     ftpd.close()
     self.assertTrue(len(file_list) == 1)
 
+  def test_ftps_list_error(self):
+    """
+    Test that errors in list are correctly caught.
+    """
+    # Test access to non-existent directory
+    ftpd = CurlDownload("ftps", "test.rebex.net", "/toto")
+    ftpd.set_credentials("demo:password")
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = ftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    ftpd.close()
+    # Test with wrong password
+    ftpd = CurlDownload("ftps", "test.rebex.net", "/")
+    ftpd.set_credentials("demo:badpassword")
+    # Check that we raise an exception and log a message
+    with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+      with self.assertRaises(Exception):
+        (file_list, dir_list) = ftpd.list()
+      # Test log message format (we assume that there is only 1 message)
+      self.assertRegex(cm.output[0], "Error while listing")
+    ftpd.close()
+
   def test_download(self):
     ftpd = CurlDownload(self.PROTOCOL, "test.rebex.net", "/")
     ftpd.set_credentials("demo:password")
@@ -688,7 +980,7 @@ class TestBiomajFTPSDownload(unittest.TestCase):
     # This server is misconfigured hence we disable all SSL verification
     SERVER = "demo.wftpserver.com"
     DIRECTORY = "/download/"
-    CREDENTIALS = "demo-user:demo-user"
+    CREDENTIALS = "demo:demo"
     ftpd = CurlDownload(self.PROTOCOL, SERVER, DIRECTORY)
     ftpd.set_options(dict(ssl_verifyhost="False", ssl_verifypeer="False"))
     ftpd.set_credentials(CREDENTIALS)
@@ -700,7 +992,7 @@ class TestBiomajFTPSDownload(unittest.TestCase):
     # This server is misconfigured hence we disable all SSL verification
     SERVER = "demo.wftpserver.com"
     DIRECTORY = "/download/"
-    CREDENTIALS = "demo-user:demo-user"
+    CREDENTIALS = "demo:demo"
     ftpd = CurlDownload(self.PROTOCOL, SERVER, DIRECTORY)
     ftpd.set_options(dict(ssl_verifyhost="False", ssl_verifypeer="False"))
     ftpd.set_credentials(CREDENTIALS)
@@ -710,12 +1002,12 @@ class TestBiomajFTPSDownload(unittest.TestCase):
     ftpd.close()
     self.assertTrue(len(ftpd.files_to_download) == 1)
 
-  def test_download_ssl_certficate(self):
+  def test_download_ssl_certificate(self):
     # This server is misconfigured but we use its certificate
     # The hostname is wrong so we disable host verification
     SERVER = "demo.wftpserver.com"
     DIRECTORY = "/download/"
-    CREDENTIALS = "demo-user:demo-user"
+    CREDENTIALS = "demo:demo"
     ftpd = CurlDownload(self.PROTOCOL, SERVER, DIRECTORY)
     curdir = os.path.dirname(os.path.realpath(__file__))
     cert_file = os.path.join(curdir, "caert.demo.wftpserver.com.pem")
@@ -749,6 +1041,16 @@ class TestBiomajRSYNCDownload(unittest.TestCase):
         (files_list, dir_list) = rsyncd.list()
         self.assertTrue(len(files_list) != 0)
 
+    def test_rsync_list_error(self):
+        # Access a non-existent directory
+        rsyncd = RSYNCDownload("/tmp/foo/", "")
+        # Check that we raise an exception and log a message
+        with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+            with self.assertRaises(Exception):
+                (file_list, dir_list) = rsyncd.list()
+            # Test log message format (we assume that there is only 1 message)
+            self.assertRegex(cm.output[0], "Error while listing")
+
     def test_rsync_match(self):
         rsyncd = RSYNCDownload(self.examples, "")
         (files_list, dir_list) = rsyncd.list()
@@ -786,6 +1088,51 @@ class TestBiomajRSYNCDownload(unittest.TestCase):
         rsyncd.download(self.utils.data_dir)
         self.assertTrue(len(rsyncd.files_to_download) == 3)
 
+    def test_rsync_download_skip_check_uncompress(self):
+        """
+        Download the fake archive file with RSYNC but skip check.
+        """
+        rsyncd = RSYNCDownload(self.utils.test_dir + '/', "")
+        rsyncd.set_options(dict(skip_check_uncompress=True))
+        (file_list, dir_list) = rsyncd.list()
+        rsyncd.match([r'invalid.gz'], file_list, dir_list, prefix='')
+        rsyncd.download(self.utils.data_dir)
+        self.assertTrue(len(rsyncd.files_to_download) == 1)
+
+    def test_rsync_download_retry(self):
+        """
+        Try to download fake files to test retry.
+        """
+        n_attempts = 5
+        rsyncd = RSYNCDownload(self.utils.test_dir + '/', "")
+        rsyncd.set_options(dict(skip_check_uncompress=True))
+        # Download a fake file
+        rsyncd.set_files_to_download([
+              {'name': 'TOTO.zip', 'year': '2016', 'month': '02', 'day': '19',
+               'size': 1, 'save_as': 'TOTO1KB'}
+        ])
+        rsyncd.set_options(dict(stop_condition=tenacity.stop.stop_after_attempt(n_attempts),
+                                wait_condition=tenacity.wait.wait_none()))
+        self.assertRaisesRegex(
+            Exception, "^RSYNCDownload:Download:Error:",
+            rsyncd.download, self.utils.data_dir,
+        )
+        logging.debug(rsyncd.retryer.statistics)
+        self.assertTrue(len(rsyncd.files_to_download) == 1)
+        self.assertTrue(rsyncd.retryer.statistics["attempt_number"] == n_attempts)
+        # Try to download another file to ensure that it retries
+        rsyncd.set_files_to_download([
+              {'name': 'TITI.zip', 'year': '2016', 'month': '02', 'day': '19',
+               'size': 1, 'save_as': 'TOTO1KB'}
+        ])
+        self.assertRaisesRegex(
+            Exception, "^RSYNCDownload:Download:Error:",
+            rsyncd.download, self.utils.data_dir,
+        )
+        self.assertTrue(len(rsyncd.files_to_download) == 1)
+        self.assertTrue(rsyncd.retryer.statistics["attempt_number"] == n_attempts)
+        rsyncd.close()
+
 
 class iRodsResult(object):
 
@@ -858,6 +1205,7 @@ class MockiRODSSession(object):
         my_test_file = open("tests/test.fasta.gz", "r+")
         return(my_test_file)
 
+
 @attr('irods')
 @attr('roscoZone')
 @attr('network')
@@ -886,18 +1234,107 @@ class TestBiomajIRODSDownload(unittest.TestCase):
         (files_list, dir_list) = irodsd.list()
         self.assertTrue(len(files_list) != 0)
 
-    @attr('local_irods')
+
+@attr('local_irods')
+@attr('network')
+class TestBiomajLocalIRODSDownload(unittest.TestCase):
+    """
+    Test with a local iRODS server.
+    """
+
+    def setUp(self):
+        self.utils = UtilsForLocalIRODSTest()
+        self.curdir = os.path.dirname(os.path.realpath(__file__))
+        self.examples = os.path.join(self.curdir,'bank') + '/'
+        BiomajConfig.load_config(self.utils.global_properties, allow_user_config=False)
+
+    def tearDown(self):
+        self.utils.clean()
+
     def test_irods_download(self):
-        # To run this test, you need an iRODS server on localhost (default
-        # port, user 'rods', password 'rods'), and populate a zone
-        # /tempZone/home/rods with a file that matches r'^test.*\.gz$' (for
-        # instance, by copying tests/bank/test/test.fasta.gz).
-        irodsd = IRODSDownload("localhost", "/tempZone/home/rods")
+        irodsd = IRODSDownload(self.utils.SERVER, self.utils.COLLECTION)
         irodsd.set_param(dict(
-            user='rods',
-            password='rods',
+            user=self.utils.USER,
+            password=self.utils.PASSWORD,
         ))
         (file_list, dir_list) = irodsd.list()
         irodsd.match([r'^test.*\.gz$'], file_list, dir_list, prefix='')
         irodsd.download(self.utils.data_dir)
         self.assertTrue(len(irodsd.files_to_download) == 1)
+
+    def test_irods_download_skip_check_uncompress(self):
+        """
+        Download the fake archive file with iRODS but skip check.
+        """
+        irodsd = IRODSDownload(self.utils.SERVER, self.utils.COLLECTION)
+        irodsd.set_options(dict(skip_check_uncompress=True))
+        irodsd.set_param(dict(
+            user=self.utils.USER,
+            password=self.utils.PASSWORD,
+        ))
+        (file_list, dir_list) = irodsd.list()
+        irodsd.match([r'invalid.gz$'], file_list, dir_list, prefix='')
+        irodsd.download(self.utils.data_dir)
+        self.assertTrue(len(irodsd.files_to_download) == 1)
+
+    def test_irods_download_retry(self):
+        """
+        Try to download fake files to test retry.
+        """
+        n_attempts = 5
+        irodsd = IRODSDownload(self.utils.SERVER, self.utils.COLLECTION)
+        irodsd.set_options(dict(skip_check_uncompress=True))
+        irodsd.set_param(dict(
+            user=self.utils.USER,
+            password=self.utils.PASSWORD,
+        ))
+        # Download a fake file
+        irodsd.set_files_to_download([
+              {'name': 'TOTO.zip', 'year': '2016', 'month': '02', 'day': '19',
+               'size': 1, 'save_as': 'TOTO1KB'}
+        ])
+        irodsd.set_options(dict(stop_condition=tenacity.stop.stop_after_attempt(n_attempts),
+                                wait_condition=tenacity.wait.wait_none()))
+        self.assertRaisesRegex(
+            Exception, "^IRODSDownload:Download:Error:",
+            irodsd.download, self.utils.data_dir,
+        )
+        logging.debug(irodsd.retryer.statistics)
+        self.assertTrue(len(irodsd.files_to_download) == 1)
+        self.assertTrue(irodsd.retryer.statistics["attempt_number"] == n_attempts)
+        # Try to download another file to ensure that it retries
+        irodsd.set_files_to_download([
+              {'name': 'TITI.zip', 'year': '2016', 'month': '02', 'day': '19',
+               'size': 1, 'save_as': 'TOTO1KB'}
+        ])
+        self.assertRaisesRegex(
+            Exception, "^IRODSDownload:Download:Error:",
+            irodsd.download, self.utils.data_dir,
+        )
+        self.assertTrue(len(irodsd.files_to_download) == 1)
+        self.assertTrue(irodsd.retryer.statistics["attempt_number"] == n_attempts)
+        irodsd.close()
+
+    def test_irods_list_error(self):
+        # Non-existing collection
+        irodsd = IRODSDownload(self.utils.SERVER, "fake_collection")
+        irodsd.set_param(dict(
+            user=self.utils.USER,
+            password=self.utils.PASSWORD,
+        ))
+        with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+            with self.assertRaises(Exception):
+                (file_list, dir_list) = irodsd.list()
+            # Test log message format (we assume that there is only 1 message)
+            self.assertRegex(cm.output[0], "Error while listing")
+        # Test with wrong password
+        irodsd = IRODSDownload(self.utils.SERVER, self.utils.COLLECTION)
+        irodsd.set_param(dict(
+            user=self.utils.USER,
+            password="badpassword",
+        ))
+        with self.assertLogs(logger="biomaj", level="ERROR") as cm:
+            with self.assertRaises(Exception):
+                (file_list, dir_list) = irodsd.list()
+            # Test log message format (we assume that there is only 1 message)
+            self.assertRegex(cm.output[0], "Error while listing")

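test_retry_parsing above exercises the string form of the new retry options: stop_condition and wait_policy are tenacity expressions that are parsed and validated (invalid ones raise ValueError), while the download tests pass tenacity objects directly. A small usage sketch, grounded in that test; the particular expressions are examples built from the combinations the test accepts:

    from biomaj_download.download.interface import DownloadInterface

    downloader = DownloadInterface()
    downloader.set_options({
        # stop after 5 attempts or once 10 seconds have elapsed ...
        'stop_condition': 'stop_after_attempt(5) | stop_after_delay(10)',
        # ... waiting a fixed 3 seconds plus 1-2 random seconds between tries
        'wait_policy': 'wait_fixed(3) + wait_random(1, 2)',
    })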

=====================================
tests/testhttp.properties
=====================================
@@ -38,7 +38,7 @@ db.post.process=
 
 http.parse.dir.line=<img[\s]+src="[\S]+"[\s]+alt="\[DIR\]"?.*<a[\s]+href="([\S]+)\/"[\s]*>.*([\d]{4}-[\w\d]{2,5}-[\d]{2}\s[\d]{2}:[\d]{2})
 http.parse.file.line=<img[\s]+src="[\S]+"[\s]+alt="\[[\s]+\]"[\s]*\/?><\/td><td><a[\s]+href="([\S]+)".*([\d]{4}-[\d]{2}-[\d]{2}\s[\d]{2}:[\d]{2}).*>([\d\.]+[MKG]{0,1})
-http.group.file.date_format="%%Y-%%m-%%d %%H:%%M"
+http.group.file.date_format=%%Y-%%m-%%d %%H:%%M
 ### Deployment ###
 
 keep.old.version=1

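The date format is now stored without surrounding quotes so the raw value reaches the parser. Assuming the doubled %% is unescaped to % before the date is parsed (it guards the % against interpolation in the properties parser), the value is an ordinary strptime format:

    from datetime import datetime

    # "%%Y-%%m-%%d %%H:%%M" in the properties file corresponds to:
    fmt = '%Y-%m-%d %H:%M'
    print(datetime.strptime('2021-01-17 09:42', fmt))  # 2021-01-17 09:42:00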


View it on GitLab: https://salsa.debian.org/med-team/biomaj3-download/-/commit/a2b7eff2398dec143609bc6f8e48c8ea505f272d


