PARTIALLY REMOVING MAXAGE (was: [PATCH v4] make maxage use UIDs to avoid timezone issues)

Nicolas Sebrecht nicolas.s-dev at laposte.net
Tue Mar 31 15:42:47 BST 2015


On Tue, Mar 31, 2015 at 04:53:29AM -0400, Janna Martl wrote:
> remove IMAP-IMAP support for maxage; add startdate, folder_startdate to compensate
> 
> maxage was fundamentally wrong in the IMAP-IMAP case: it assumed that
> remote messages have UIDs in the same order as their local counterparts,
> which could be false, e.g. when messages are copied in quick succession.
> So, remove support for maxage in the IMAP-IMAP case.
> 
> Instead, support a special case of date restriction in IMAP-IMAP syncs:
> suppose there's a pre-existing IMAP folder that you want to sync to a
> new, empty folder, but you only want to sync the messages after a
> certain date. This date can be specified by the configuration option
> folder_startdate. On all subsequent syncs, the messages in the original
> folder after folder_startdate, and all the messages in the new folder,
> are considered.
> 
> Also add the option startdate, which is similar to maxage but involves
> syncing messages after a fixed date, instead of a fixed number of days
> ago. This is not supported in the IMAP-IMAP case for the same reasons as
> maxage.

Good idea.

I find the wording of the options to be confusing: folder_startdate
/sounds/ to be the same thing as startdate but at a folder level. At
least, that's what the "folder" prefix currently mean.

While still sticking with your proposal, it should be more something
like

  - maxage -> maxage (new limitations)
  - folder_startdate -> localstartdate
  - startdate -> startdate

That being said, maxage and startdate are really the same thing as they
both have the same limitations.  So, I would improve maxage rather than
inserting another option:

1. is integer: historical meaning.
2. is date format yyyy-mm-dd: fixed reference.

Then, we can make things less confusing:

  - startdate -> maxage (2 possible forms, requires Maildir)
  - folder_startdate -> startdate (1 form, other limitations)

Later comments are against the current proposal while I think the only 2
above options is better.

Side note: currently we have a lot of options that are applied to a
arbitrary level (current levels: general -> account -> repository,
mbnames).  In the future, it would be nice to apply each option to the
lowest level it makes sense and fallback to the upper levels when not
set. Also, supporting per-folder options would be great but this is
another topic.

> ---
>  offlineimap.conf              |  43 +++++++++-
>  offlineimap/accounts.py       | 187 ++++++++++++++++++++++++++++++------------
>  offlineimap/folder/Base.py    |  28 +++++++
>  offlineimap/folder/Gmail.py   |   4 +-
>  offlineimap/folder/IMAP.py    |  79 ++++++------------
>  offlineimap/folder/Maildir.py |  76 ++++++++---------
>  6 files changed, 267 insertions(+), 150 deletions(-)
> 
> diff --git a/offlineimap.conf b/offlineimap.conf
> index af382fb..c7751b2 100644
> --- a/offlineimap.conf
> +++ b/offlineimap.conf
> @@ -260,7 +260,7 @@ remoterepository = RemoteExample
>  # This option stands in the [Account Test] section.
>  #
>  # OfflineImap can replace a number of full updates by quick synchronizations.
> -# This option is ignored if maxage is used.
> +# This option is ignored if maxage, startdate, or folder_startdate are used.
>  #
>  # It only synchronizes a folder if
>  #
> @@ -342,6 +342,9 @@ remoterepository = RemoteExample
>  #
>  # Known edge cases are described in the offlineimap(1).
>  #
> +# maxage is not supported for syncing two IMAP (or Gmail) accounts, and
> +# may not be used in conjunction with startdate or folder_startdate.

# maxage is allowed only when local is of Maildir type. It can't be used
# with startdate or folder_startdate.

> +#
>  # The maxage option expects an integer (for the number of days).
>  #
>  #maxage = 3
> @@ -349,6 +352,22 @@ remoterepository = RemoteExample
>  
>  # This option stands in the [Account Test] section.
>  #
> +# When you are starting to sync an already existing account you can tell
> +# OfflineIMAP to only sync messages starting at a particular date. When you do
> +# this, messages older than that date will be completely ignored. This can be
> +# useful for importing existing accounts when you do not want to download large
> +# amounts of archive email.

I would change all the above because it's confusing. Just:

# The same as maxage but from a fixed date, instead.

> +#
> +# startdate is not supported for syncing two IMAP (or Gmail) accounts,
> +# and may not be used in conjunction with maxage or folder_startdate.
> +#
> +# The startdate option expects a date in the format yyyy-mm-dd.
> +#
> +#startdate = 2015-03-20
> +
> +
> +# This option stands in the [Account Test] section.
> +#
>  # Maildir file format uses colon (:) separator between uniq name and info.
>  # Unfortunatelly colon is not allowed character in windows file name. If you
>  # enable maildir-windows-compatible option, OfflineIMAP will be able to store
> @@ -451,6 +470,28 @@ localfolders = ~/Test
>  
>  # This option stands in the [Repository LocalExample] section.
>  #
> +# Use this option if LocalExample is a pre-existing account that you
> +# want to clone to a new, empty account, but you only want to clone
> +# messages later than a particular date. If you are using this option
> +# for the first time, RemoteExample must be empty. On all subsequent
> +# syncs (for which this option is enabled), offlineimap will consider
> +# the messages in LocalExample later than folder_startdate, and will
> +# consider all messages in RemoteExample. Older messages in LocalExample
> +# will be completely ignored.
> +#
> +# This option is mainly useful for IMAP-IMAP syncs, because maxage and
> +# startdate are not supported in that case.

Would change the description to just:

# startdate sync mails starting from the given date.
#
# It applies the date restriction to the local IMAP server only.
# Remote repository MUST be empty at first sync.

> +#
> +# folder_startdate may not be used in conjunction with maxage or
> +# startdate.
> +#
> +# The folder_startdate option expects a date in the format yyyy-mm-dd.
> +#
> +#folder_startdate = 2015-03-20
> +
> +
> +# This option stands in the [Repository LocalExample] section.
> +#
>  # Some users may not want the atime (last access time) of folders to be
>  # modified by OfflineIMAP.  If 'restoreatime' is set to yes, OfflineIMAP
>  # will restore the atime of the "new" and "cur" folders in each maildir
> diff --git a/offlineimap/accounts.py b/offlineimap/accounts.py
> index a7ad623..e4da366 100644
> --- a/offlineimap/accounts.py
> +++ b/offlineimap/accounts.py
> @@ -17,10 +17,11 @@
>  from subprocess import Popen, PIPE
>  from threading import Event
>  import os
> +import time
>  from sys import exc_info
>  import traceback
>  
> -from offlineimap import mbnames, CustomConfig, OfflineImapError
> +from offlineimap import mbnames, CustomConfig, OfflineImapError, imaplibutil
>  from offlineimap import globals
>  from offlineimap.repository import Repository
>  from offlineimap.ui import getglobalui
> @@ -402,6 +403,96 @@ def syncfolder(account, remotefolder, quick):
>  
>      Filtered folders on the remote side will not invoke this function."""
>  
> +
> +    def sanity_check(date):

nitpick: s,sanity_check,is_date_sane,

> +        if date[0] < 1900:
> +            raise OfflineImapError("date restriction led to year %d. "
> +                "Abort syncing."% min_date[0],
> +                OfflineImapError.ERROR.REPO)
> +
> +    def check_uid_validity(localfolder, remotefolder, statusfolder):
> +        # If either the local or the status folder has messages and
> +        # there is a UID validity problem, warn and abort.  If there are
> +        # no messages, UW IMAPd loses UIDVALIDITY.  But we don't really
> +        # need it if both local folders are empty.  So, in that case,
> +        # just save it off.
> +        if localfolder.getmessagecount() or statusfolder.getmessagecount():

I'm confused about what it means. Are we looking for

  getmessagecount() != 0, not being None, not raising an exception, whatever?

I know python allows being lazy and sometime Python experts encourage
such pratices. I consider this BAD. It just allows to insert more subtle
bugs, makes code reading harder and more subject to wrong interpretations.

> +            if not localfolder.check_uidvalidity():
> +                ui.validityproblem(localfolder)
> +                localrepos.restore_atime()
> +                return
> +            if not remotefolder.check_uidvalidity():
> +                ui.validityproblem(remotefolder)
> +                localrepos.restore_atime()
> +                return
> +        else:
> +            # Both folders empty, just save new UIDVALIDITY
> +            localfolder.save_uidvalidity()
> +            remotefolder.save_uidvalidity()
> +
> +    def save_min_uid(folder, min_uid):
> +        uidfile = folder.get_min_uid_file()
> +        fd = open(uidfile, 'wt')
> +        fd.write(str(min_uid) + "\n")
> +        fd.close()

Knock knock! :-)

save_min_uid() method must belong to a folder class!

> +
> +    def cachemessagelists_by_date(localfolder, remotefolder, date):
> +        """ Returns messages with uid > min(uids of within-date
> +            messages)."""

messages, date included)."""

> +
> +        localfolder.cachemessagelist(min_date=date)
> +        check_uid_validity(localfolder, remotefolder, statusfolder)
> +        # local messagelist had date restriction applied already. Restrict
> +        # sync to messages with UIDs >= min_uid from this list.
> +        #
> +        # local messagelist might contain new messages (with uid's < 0).
> +        positive_uids = filter(
> +            lambda uid: uid > 0, localfolder.getmessageuidlist())
> +        if len(positive_uids) > 0:
> +            remotefolder.cachemessagelist(min_uid=min(positive_uids))
> +        else:
> +            # No messages with UID > 0 in range in localfolder.
> +            # date restriction was applied with respect to local dates but
> +            # remote folder timezone might be different from local, so be
> +            # safe and make sure the range isn't bigger than in local.

Adding 24h below makes the range bigger, though.

> +            remotefolder.cachemessagelist(
> +                min_date=time.gmtime(time.mktime(date) + 24*60*60))
> +
> +    def cachemessagelists_folder_startdate(new, partial, datestr):
> +        """ Retrieve messagelists when folder_startdate has been set for
> +        the folder 'partial'.
> +
> +        Idea: suppose you want to clone the messages after date in one
> +        account (partial) to a new one (new). If new is empty, then copy
> +        messages in partial newer than date to new, and keep track of the
> +        min uid. On subsequent syncs, sync all the messages in new against
> +        those after that min uid in partial. This is a partial replacement
> +        for maxage in the IMAP-IMAP sync case, where maxage doesn't work:
> +        the UIDs of the messages in localfolder might not be in the same
> +        order as those of corresponding messages in remotefolder, so if L in
> +        local corresponds to R in remote, the ranges [L, ...] and [R, ...]
> +        might not correspond. But, if we're cloning a folder into a new one,
> +        [min_uid, ...] does correspond to [1, ...].
> +
> +        This is just for IMAP-IMAP. For Maildir-IMAP, use startdate instead.
> +        """
> +
> +        new.cachemessagelist()
> +        if not new.getmessageuidlist():

Ditto, "not inst.getlist()" may mean too much things. Be precise about
what you expect.

> +            # First sync
> +            date = time.strptime(datestr, "%Y-%m-%d")

This might badly fail (see later).

> +            partial.cachemessagelist(min_date=date)
> +            uids = partial.getmessageuidlist()
> +            if len(uids) > 0:
> +                min_uid = min(uids)
> +            else:
> +                min_uid = 1
> +            save_min_uid(partial, min_uid)
> +        else:
> +            min_uid = partial.get_min_uid()
> +            partial.cachemessagelist(min_uid=min_uid)
> +
> +
>      remoterepos = account.remoterepos
>      localrepos = account.localrepos
>      statusrepos = account.statusrepos
> @@ -429,68 +520,56 @@ def syncfolder(account, remotefolder, quick):
>  
>          statusfolder.cachemessagelist()
>  
> -        maxage = localfolder.getmaxage()
>  
>          # Load local folder.
>          ui.syncingfolder(remoterepos, remotefolder, localrepos, localfolder)
> -        ui.loadmessagelist(localrepos, localfolder)
> -        # For maxage, returns messages with uid > min(uids of within-maxage messages).
> -        localfolder.cachemessagelist(maxage=maxage)
> -        ui.messagelistloaded(localrepos, localfolder, localfolder.getmessagecount())
> -
> -        if quick:
> -            # IMAP quickchanged isn't compatible with maxage, since the "quick"
> -            # check can only retrieve a full list of UIDs in the folder.
> -            if maxage:
> -                ui.warn("Quick syncs (-q) not supported in conjunction "\
> -                    "with maxage; ignoring -q.")
> -            else:
> +
> +        #Retrieve messagelists, taking into account age-restriction
> +        #options

Is there a reason for not putting a whitespace after '#' sometimes?

> +        maxage = localfolder.getmaxage()
> +        localstart = localfolder.getfolder_startdate()
> +        remotestart = remotefolder.getfolder_startdate()
> +        startdate = localfolder.getstartdate()
> +        if (maxage != None) + (localstart != None) + (remotestart != None)\
> +            + (startdate != None) > 1:
> +            raise OfflineImapError("You can set at most one of the "
> +                "following: maxage, folder_startdate (for the local folder), "
> +                "folder_startdate (for the remote folder), startdate",
> +                OfflineImapError.ERROR.REPO), None, exc_info()[2]
> +            if (maxage != None or localstart or remotestart or startdate)\
> +                and quick:
> +                # IMAP quickchanged isn't compatible with options that
> +                # involve restricting the messagelist, since the "quick"
> +                # check can only retrieve a full list of UIDs in the folder.
> +                ui.warn("Quick syncs (-q) not supported in conjunction "
> +                    "with maxage, folder_startdate, or startdate; ignoring "
> +                    "-q.")
> +        if maxage != None:
> +            timelimit = time.gmtime(time.time() - 60*60*24*maxage)
> +            sanity_check(timelimit)
> +            cachemessagelists_by_date(localfolder, remotefolder, timelimit)
> +        elif startdate != None:
> +            timelimit = time.strptime(startdate, "%Y-%m-%d")
> +            sanity_check(timelimit)
> +            cachemessagelists_by_date(localfolder, remotefolder, timelimit)
> +        elif localstart != None:
> +            cachemessagelists_folder_startdate(remotefolder, localfolder,
> +                localstart)
> +            check_uid_validity(localfolder, remotefolder, statusfolder)
> +        elif remotestart != None:
> +            cachemessagelists_folder_startdate(localfolder, remotefolder,
> +                remotestart)
> +            check_uid_validity(localfolder, remotefolder, statusfolder)
> +        else:
> +            localfolder.cachemessagelist()
> +            if quick:
>                  if (not localfolder.quickchanged(statusfolder) and
>                      not remotefolder.quickchanged(statusfolder)):
>                      ui.skippingfolder(remotefolder)
>                      localrepos.restore_atime()
>                      return
> -
> -        # If either the local or the status folder has messages and
> -        # there is a UID validity problem, warn and abort.  If there are
> -        # no messages, UW IMAPd loses UIDVALIDITY.  But we don't really
> -        # need it if both local folders are empty.  So, in that case,
> -        # just save it off.
> -        if localfolder.getmessagecount() or statusfolder.getmessagecount():
> -            if not localfolder.check_uidvalidity():
> -                ui.validityproblem(localfolder)
> -                localrepos.restore_atime()
> -                return
> -            if not remotefolder.check_uidvalidity():
> -                ui.validityproblem(remotefolder)
> -                localrepos.restore_atime()
> -                return
> -        else:
> -            # Both folders empty, just save new UIDVALIDITY
> -            localfolder.save_uidvalidity()
> -            remotefolder.save_uidvalidity()
> -
> -        # Load remote folder.
> -        ui.loadmessagelist(remoterepos, remotefolder)
> -        if maxage != None:
> -            # local messagelist was applied maxage, already. Restrict sync for
> -            # messages with UIDs >= min_uid from this list.
> -            #
> -            # local messagelist might contain new messages (with uid's < 0).
> -            positive_uids = filter(
> -                lambda uid: uid > 0, localfolder.getmessageuidlist())
> -            if len(positive_uids) > 0:
> -                remotefolder.cachemessagelist(min_uid=min(positive_uids))
> -            else:
> -                # No messages with UID > 0 in range in localfolder.
> -                # maxage was applied with respect to local dates but
> -                # remote folder timezone might be different from local, so be
> -                # safe and make sure the range isn't bigger than in local.
> -                remotefolder.cachemessagelist(maxage=maxage - 1)
> -        else:
> +            check_uid_validity(localfolder, remotefolder, statusfolder)
>              remotefolder.cachemessagelist()
> -        ui.messagelistloaded(remoterepos, remotefolder,
> -                             remotefolder.getmessagecount())
>  
>          # Synchronize remote changes.
>          if not localrepos.getconfboolean('readonly', False):
> diff --git a/offlineimap/folder/Base.py b/offlineimap/folder/Base.py
> index 4af0361..4dbcebd 100644
> --- a/offlineimap/folder/Base.py
> +++ b/offlineimap/folder/Base.py
> @@ -306,6 +306,34 @@ class BaseFolder(object):
>          return self.config.getdefaultint("Account %s"%
>              self.accountname, "maxsize", None)
>  
> +    def getstartdate(self):
> +        return self.config.getdefault("Account %s"%
> +            self.accountname, "startdate", None)
> +
> +    def getfolder_startdate(self):
> +        """ Retrieve the value of the configuration option folder_startdate """
> +        return self.config.getdefault("Repository " + self.repository.name,
> +            'folder_startdate', None)

We are exposing any further use of this value to user craziness. Also,
we should return

  time.strptime(datestr, "%Y-%m-%d")

since that's what we want to work with.

> +
> +    def get_min_uid_file(self):
> +        startuiddir = os.path.join(self.config.getmetadatadir(),
> +            'Repository-' + self.repository.name, 'StartUID')
> +        if not os.path.exists(startuiddir):
> +            os.mkdir(startuiddir, 0o700)
> +        return os.path.join(startuiddir, self.getfolderbasename())
> +
> +    def get_min_uid(self):

Nitpick about the name:

  - get/set
  - read/write
  - save/?

retrieve?

> +        uidfile = self.get_min_uid_file()
> +        try:
> +            fd = open(uidfile, 'rt')
> +            min_uid = long(fd.readline().strip())
> +            fd.close()
> +            return min_uid
> +        except:
> +            raise IOError("Can't read %s. To start using folder_startdate, "\
> +                "folder must be empty"% uidfile)

s,folder must be empty,,

> +
> +
>      def savemessage(self, uid, content, flags, rtime):
>          """Writes a new message, with the specified uid.
>  
> diff --git a/offlineimap/folder/Gmail.py b/offlineimap/folder/Gmail.py
> index 93e8eee..7f239a3 100644
> --- a/offlineimap/folder/Gmail.py
> +++ b/offlineimap/folder/Gmail.py
> @@ -121,9 +121,9 @@ class GmailFolder(IMAPFolder):
>  
>      # TODO: merge this code with the parent's cachemessagelist:
>      # TODO: they have too much common logics.
> -    def cachemessagelist(self, maxage=None, min_uid=None):
> +    def cachemessagelist(self, min_date=None, min_uid=None):
>          if not self.synclabels:
> -            return super(GmailFolder, self).cachemessagelist(maxage=maxage,
> +            return super(GmailFolder, self).cachemessagelist(min_date=min_date,
>                  min_uid=min_uid)
>  
>          self.messagelist = {}
> diff --git a/offlineimap/folder/IMAP.py b/offlineimap/folder/IMAP.py
> index 25dd26d..cabf5b9 100644
> --- a/offlineimap/folder/IMAP.py
> +++ b/offlineimap/folder/IMAP.py
> @@ -18,6 +18,7 @@
>  import random
>  import binascii
>  import re
> +import os
>  import time
>  from sys import exc_info
>  
> @@ -79,6 +80,18 @@ class IMAPFolder(BaseFolder):
>      def waitforthread(self):
>          self.imapserver.connectionwait()
>  
> +    def getmaxage(self):
> +        if self.config.getdefaultint("Account %s"%
> +                self.accountname, "maxage", None):
> +            raise OfflineImapError("maxage is not supported on IMAP-IMAP sync",
> +                OfflineImapError.ERROR.REPO), None, exc_info()[2]

Aren't you forbidding any use of maxage? When did you check that local
repository is not a Maildir?

<...>


I might be missing other (more important) things. All those successive
incremental patches tend to make reviewing harder.

I believe this patch v7 should be squashed to patch v4 (its current
parent) for a fresh patch v8 which would supersede all previous patches
on next round. ,-)

-- 
Nicolas Sebrecht




More information about the OfflineIMAP-project mailing list