PARTIALLY REMOVING MAXAGE (was: [PATCH v4] make maxage use UIDs to avoid timezone issues)

Nicolas Sebrecht nicolas.s-dev at laposte.net
Thu Apr 2 08:50:21 UTC 2015


This is PATCH v8.

On Wed, Apr 01, 2015 at 05:11:08PM -0400, Janna Martl wrote:

> 1. When using maxage, local and remote messagelists are supposed to only
> contain messages from at most maxage days ago. But local and remote used
> different timezones to calculate what "maxage days ago" means, resulting
> in loss of mail.

s,loss of mail,mail removals on one side,

>                  Now, we ask the local folder for maxage days' worth of
> mail, find the lowest UID, and then ask the remote folder for all UID's
> starting with that lowest one.
> 
> 2. maxage was fundamentally wrong in the IMAP-IMAP case: it assumed that
> remote messages have UIDs in the same order as their local counterparts,
> which could be false, e.g. when messages are copied in quick succession.
> So, remove support for maxage in the IMAP-IMAP case.
> 
> 3. Add startdate option for IMAP-IMAP syncs: use messages from the given
> repository starting at startdate, and all messages from the other
> repository. In the first sync, the other repository must be empty.
> 
> 4. Allow maxage to be specified either as number of days to sync (as
> previously) or as a fixed date.
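
For reference, point 1 boils down to something like the following. This is only a minimal sketch in plain Python, not the folder classes from the patch: the names lowest_recent_uid and remote_selection and the {uid: timestamp} dict layout are made up for illustration.

```python
def lowest_recent_uid(local, cutoff):
    """Return the lowest positive UID among local messages newer than cutoff.

    local  -- dict {uid: epoch seconds}; a negative uid means the message
              has not been uploaded yet (hypothetical layout)
    cutoff -- epoch seconds; the date restriction is applied locally only
    """
    positive = [uid for uid, ts in local.items() if uid > 0 and ts >= cutoff]
    return min(positive) if positive else None


def remote_selection(remote_uids, min_uid):
    """Restrict the remote side by UID instead of by date."""
    return sorted(uid for uid in remote_uids if uid >= min_uid)


local = {10: 100, 11: 200, 12: 300, -1: 350}
min_uid = lowest_recent_uid(local, cutoff=150)
selected = remote_selection([9, 10, 11, 12, 13], min_uid)
```

Since the remote side is never asked a date question in this path, its timezone no longer matters. When no positive UID is within range, the patch falls back to a date search on the remote side; the sketch leaves that case to the caller by returning None.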
> ---
>  docs/offlineimap.txt               |  18 ++++-
>  offlineimap.conf                   |  46 ++++++++---
>  offlineimap/accounts.py            | 151 +++++++++++++++++++++++++++++--------
>  offlineimap/folder/Base.py         |  67 ++++++++++++++++
>  offlineimap/folder/Gmail.py        |   8 +-
>  offlineimap/folder/GmailMaildir.py |   4 +-
>  offlineimap/folder/IMAP.py         |  95 ++++++++++++-----------
>  offlineimap/folder/Maildir.py      |  82 +++++++++++++-------
>  8 files changed, 345 insertions(+), 126 deletions(-)
> 
> diff --git a/docs/offlineimap.txt b/docs/offlineimap.txt
> index 858fc0b..618f2ab 100644
> --- a/docs/offlineimap.txt
> +++ b/docs/offlineimap.txt
> @@ -135,7 +135,8 @@ Ignore any autorefresh setting in the configuration file.
>    Run only quick synchronizations.
>  +
>  Ignore any flag updates on IMAP servers. If a flag on the remote IMAP changes,
> -and we have the message locally, it will be left untouched in a quick run.
> +and we have the message locally, it will be left untouched in a quick run. This
> +option is ignored if maxage is set.
>  
>  
>  -u <UI>::
> @@ -400,8 +401,19 @@ If you then point your local mutt, or whatever MUA you use to `~/mail/`
>  as root, it should still recognize all folders.
>  
>  
> -Authors
> --------
> +* Edge cases with maxage causing too many messages to be synced.
> ++
> +All messages from at most maxage days ago (+/- a few hours, depending on
> +timezones) are synced, but there are cases in which older messages can also be
> +synced. This happens when a message's UID is significantly higher than those of
> +other messages with similar dates, e.g. when messages are added to the local
> +folder behind offlineimap's back, causing them to get assigned a new UID, or
> +when offlineimap first syncs a pre-existing Maildir. In the latter case, it
> +could appear as if a noticeable and random subset of old messages are synced.
> +
> +
> +Main authors
> +------------
>  
>    John Goerzen, Sebastian Spaetz, Eygene Ryabinkin, Nicolas Sebrecht.
>  
> diff --git a/offlineimap.conf b/offlineimap.conf
> index 5bc48a8..525e3f4 100644
> --- a/offlineimap.conf
> +++ b/offlineimap.conf
> @@ -260,6 +260,8 @@ remoterepository = RemoteExample
>  # This option stands in the [Account Test] section.
>  #
>  # OfflineImap can replace a number of full updates by quick synchronizations.
> +# This option is ignored if maxage or startdate are used.
> +#
>  # It only synchronizes a folder if
>  #
>  #   1) a Maildir folder has changed
> @@ -327,21 +329,26 @@ remoterepository = RemoteExample
>  
>  # This option stands in the [Account Test] section.
>  #
> -# When you are starting to sync an already existing account you can tell
> -# OfflineIMAP to sync messages from only the last x days.  When you do this,
> -# messages older than x days will be completely ignored.  This can be useful for
> -# importing existing accounts when you do not want to download large amounts of
> -# archive email.
> +# maxage enables you to sync only recent messages. There are two ways to specify
> +# what "recent" means: if maxage is given as an integer, then only messages from
> +# the last maxage days will be synced. If maxage is given as a date, then only
> +# messages later than that date will be synced.
> +#
> +# Messages older than the cutoff will not be synced, their flags will not be
> +# changed, they will not be deleted, etc. For OfflineIMAP it will be like these
> +# messages do not exist. This will perform an IMAP search in the case of IMAP or
> +# Gmail and therefore requires that the server support server side searching.
> +#
> +# Known edge cases are described in offlineimap(1).
>  #
> -# Messages older than maxage days will not be synced, their flags will not be
> -# changed, they will not be deleted, etc.  For OfflineIMAP it will be like these
> -# messages do not exist.  This will perform an IMAP search in the case of IMAP
> -# or Gmail and therefore requires that the server support server side searching.
> -# This will calculate the earliest day that would be included in the search and
> -# include all messages from that day until today. The maxage option expects an
> -# integer (for the number of days).
> +# maxage is allowed only when the local folder is of type Maildir. It can't be
> +# used with startdate.
> +#
> +# The maxage option expects an integer (for the number of days) or a date of the
> +# form yyyy-mm-dd.
>  #
>  #maxage = 3
> +#maxage = 2015-04-01
>  
>  
>  # This option stands in the [Account Test] section.
> @@ -448,6 +455,21 @@ localfolders = ~/Test
>  
>  # This option stands in the [Repository LocalExample] section.
>  #
> +# startdate syncs mails starting from a given date. It applies the date
> +# restriction to LocalExample only. The remote repository MUST be empty
> +# at the first sync where this option is used.
> +#
> +# Unlike maxage, this is supported for IMAP-IMAP sync.
> +#
> +# startdate can't be used with maxage.
> +#
> +# The startdate option expects a date in the format yyyy-mm-dd.
> +#
> +#startdate = 2015-04-01
> +
> +
> +# This option stands in the [Repository LocalExample] section.
> +#
>  # Some users may not want the atime (last access time) of folders to be
>  # modified by OfflineIMAP.  If 'restoreatime' is set to yes, OfflineIMAP
>  # will restore the atime of the "new" and "cur" folders in each maildir
> diff --git a/offlineimap/accounts.py b/offlineimap/accounts.py
> index cac4d88..b192eca 100644
> --- a/offlineimap/accounts.py
> +++ b/offlineimap/accounts.py
> @@ -17,10 +17,11 @@
>  from subprocess import Popen, PIPE
>  from threading import Event
>  import os
> +import time
>  from sys import exc_info
>  import traceback
>  
> -from offlineimap import mbnames, CustomConfig, OfflineImapError
> +from offlineimap import mbnames, CustomConfig, OfflineImapError, imaplibutil
>  from offlineimap import globals
>  from offlineimap.repository import Repository
>  from offlineimap.ui import getglobalui
> @@ -402,6 +403,87 @@ def syncfolder(account, remotefolder, quick):
>  
>      Filtered folders on the remote side will not invoke this function."""
>  
> +    def check_uid_validity(localfolder, remotefolder, statusfolder):
> +        # If either the local or the status folder has messages and
> +        # there is a UID validity problem, warn and abort.  If there are
> +        # no messages, UW IMAPd loses UIDVALIDITY.  But we don't really
> +        # need it if both local folders are empty.  So, in that case,
> +        # just save it off.
> +        if localfolder.getmessagecount() or statusfolder.getmessagecount():

Shouldn't it be:
        if localfolder.getmessagecount() > 0 and statusfolder.getmessagecount() > 0:
                                             ^^^
?

> +            if not localfolder.check_uidvalidity():
> +                ui.validityproblem(localfolder)
> +                localrepos.restore_atime()
> +                return
> +            if not remotefolder.check_uidvalidity():
> +                ui.validityproblem(remotefolder)
> +                localrepos.restore_atime()
> +                return
> +        else:
> +            # Both folders empty, just save new UIDVALIDITY
> +            localfolder.save_uidvalidity()
> +            remotefolder.save_uidvalidity()
> +
> +    def save_min_uid(folder, min_uid):
> +        uidfile = folder.get_min_uid_file()
> +        fd = open(uidfile, 'wt')
> +        fd.write(str(min_uid) + "\n")
> +        fd.close()
> +
> +    def cachemessagelists_by_date(localfolder, remotefolder, date):

    def cachemessagelists_upto_date(localfolder, remotefolder, date):
or
    def cachemessagelists_within_date(localfolder, remotefolder, date):

"by" reads as sorting or creating a dict with dates for keys.

> +        """ Returns messages with uid > min(uids of within-date
> +            messages)."""
> +
> +        localfolder.cachemessagelist(min_date=date)
> +        check_uid_validity(localfolder, remotefolder, statusfolder)
> +        # local messagelist had date restriction applied already. Restrict
> +        # sync to messages with UIDs >= min_uid from this list.
> +        #
> +        # local messagelist might contain new messages (with uid's < 0).
> +        positive_uids = filter(
> +            lambda uid: uid > 0, localfolder.getmessageuidlist())
> +        if len(positive_uids) > 0:
> +            remotefolder.cachemessagelist(min_uid=min(positive_uids))
> +        else:
> +            # No messages with UID > 0 in range in localfolder.
> +            # date restriction was applied with respect to local dates but
> +            # remote folder timezone might be different from local, so be
> +            # safe and make sure the range isn't bigger than in local.
> +            remotefolder.cachemessagelist(
> +                min_date=time.gmtime(time.mktime(date) + 24*60*60))
> +
> +    def cachemessagelists_startdate(new, partial, date):
> +        """ Retrieve messagelists when startdate has been set for
> +        the folder 'partial'.
> +
> +        Idea: suppose you want to clone the messages after date in one
> +        account (partial) to a new one (new). If new is empty, then copy
> +        messages in partial newer than date to new, and keep track of the
> +        min uid. On subsequent syncs, sync all the messages in new against
> +        those after that min uid in partial. This is a partial replacement
> +        for maxage in the IMAP-IMAP sync case, where maxage doesn't work:
> +        the UIDs of the messages in localfolder might not be in the same
> +        order as those of corresponding messages in remotefolder, so if L in
> +        local corresponds to R in remote, the ranges [L, ...] and [R, ...]
> +        might not correspond. But, if we're cloning a folder into a new one,
> +        [min_uid, ...] does correspond to [1, ...].
> +
> +        This is just for IMAP-IMAP. For Maildir-IMAP, use maxage instead.
> +        """
> +
> +        new.cachemessagelist()
> +        if not new.getmessageuidlist():

All those y.getmessageuidlist() checks should be in the form:

  len(y.getmessageuidlist()) == 0

to make it clear what we are checking.

That being said, this check looks wrong. We'd rather check for the
existence of a cached min_uid.

This is because we want to remove [min_uid:*] on partial if all messages
on new have been removed. Syncing with an empty new.getmessageuidlist()
against partial is a valid use case.
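
A rough sketch of the idea, keyed on the cached min_uid rather than on whether new is empty. The names read_cached_min_uid and plan_startdate_sync are hypothetical, and the returned tuples only describe which listing to request; none of this is in the patch.

```python
import os


def read_cached_min_uid(uidfile):
    """Return the cached min UID as an int, or None if there is no cache."""
    if not os.path.exists(uidfile):
        return None
    with open(uidfile, 'rt') as fd:
        return int(fd.readline().strip())


def plan_startdate_sync(uidfile, startdate):
    """Decide how the 'partial' folder should be listed."""
    min_uid = read_cached_min_uid(uidfile)
    if min_uid is None:
        # First sync: restrict partial by date; the caller then saves
        # the lowest UID it finds.
        return ('by_date', startdate)
    # Later syncs: restrict partial by UID, even when 'new' is empty,
    # so that removals on 'new' can propagate.
    return ('by_uid', min_uid)
```

With this shape, an emptied new folder no longer looks like a first sync.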

> +            partial.cachemessagelist(min_date=date)
> +            uids = partial.getmessageuidlist()
> +            if len(uids) > 0:
> +                min_uid = min(uids)
> +            else:
> +                min_uid = 1
> +            save_min_uid(partial, min_uid)
> +        else:
> +            min_uid = partial.retrieve_min_uid()
> +            partial.cachemessagelist(min_uid=min_uid)
> +
> +
>      remoterepos = account.remoterepos
>      localrepos = account.localrepos
>      statusrepos = account.statusrepos
> @@ -429,43 +511,46 @@ def syncfolder(account, remotefolder, quick):
>  
>          statusfolder.cachemessagelist()
>  
> -        if quick:
> -            if (not localfolder.quickchanged(statusfolder) and
> -                not remotefolder.quickchanged(statusfolder)):
> -                ui.skippingfolder(remotefolder)
> -                localrepos.restore_atime()
> -                return
>  
>          # Load local folder.
>          ui.syncingfolder(remoterepos, remotefolder, localrepos, localfolder)
> -        ui.loadmessagelist(localrepos, localfolder)
> -        localfolder.cachemessagelist()
> -        ui.messagelistloaded(localrepos, localfolder, localfolder.getmessagecount())
>  
> -        # If either the local or the status folder has messages and
> -        # there is a UID validity problem, warn and abort.  If there are
> -        # no messages, UW IMAPd loses UIDVALIDITY.  But we don't really
> -        # need it if both local folders are empty.  So, in that case,
> -        # just save it off.
> -        if localfolder.getmessagecount() or statusfolder.getmessagecount():
> -            if not localfolder.check_uidvalidity():
> -                ui.validityproblem(localfolder)
> -                localrepos.restore_atime()
> -                return
> -            if not remotefolder.check_uidvalidity():
> -                ui.validityproblem(remotefolder)
> -                localrepos.restore_atime()
> -                return
> +        # Retrieve messagelists, taking into account age-restriction
> +        # options
> +        maxage = localfolder.getmaxage()
> +        localstart = localfolder.getstartdate()
> +        remotestart = remotefolder.getstartdate()
> +        if (maxage != None) + (localstart != None) + (remotestart != None) > 1:
> +            raise OfflineImapError("You can set at most one of the "
> +                "following: maxage, startdate (for the local folder), "
> +                "startdate (for the remote folder)",
> +                OfflineImapError.ERROR.REPO), None, exc_info()[2]
> +        if (maxage != None or localstart or remotestart) and quick:
> +            # IMAP quickchanged isn't compatible with options that
> +            # involve restricting the messagelist, since the "quick"
> +            # check can only retrieve a full list of UIDs in the folder.
> +            ui.warn("Quick syncs (-q) not supported in conjunction "
> +                "with maxage or startdate; ignoring -q.")
> +        if maxage != None:
> +            cachemessagelists_by_date(localfolder, remotefolder, maxage)

Shouldn't we check_uid_validity() here, too?

> +        elif localstart != None:
> +            cachemessagelists_startdate(remotefolder, localfolder,
> +                localstart)
> +            check_uid_validity(localfolder, remotefolder, statusfolder)
> +        elif remotestart != None:
> +            cachemessagelists_startdate(localfolder, remotefolder,
> +                remotestart)
> +            check_uid_validity(localfolder, remotefolder, statusfolder)
>          else:
> -            # Both folders empty, just save new UIDVALIDITY
> -            localfolder.save_uidvalidity()
> -            remotefolder.save_uidvalidity()
> -
> -        # Load remote folder.
> -        ui.loadmessagelist(remoterepos, remotefolder)
> -        remotefolder.cachemessagelist()
> -        ui.messagelistloaded(remoterepos, remotefolder,
> -                             remotefolder.getmessagecount())
> +            localfolder.cachemessagelist()
> +            if quick:
> +                if (not localfolder.quickchanged(statusfolder) and
> +                    not remotefolder.quickchanged(statusfolder)):
> +                    ui.skippingfolder(remotefolder)
> +                    localrepos.restore_atime()
> +                    return
> +            check_uid_validity(localfolder, remotefolder, statusfolder)
> +            remotefolder.cachemessagelist()
>  
>          # Synchronize remote changes.
>          if not localrepos.getconfboolean('readonly', False):
> diff --git a/offlineimap/folder/Base.py b/offlineimap/folder/Base.py
> index 16b5819..e52bec8 100644
> --- a/offlineimap/folder/Base.py
> +++ b/offlineimap/folder/Base.py
> @@ -17,6 +17,7 @@
>  
>  import os.path
>  import re
> +import time
>  from sys import exc_info
>  
>  from offlineimap import threadutil, emailutil
> @@ -298,6 +299,72 @@ class BaseFolder(object):
>  
>          raise NotImplementedError
>  
> +    def getmaxage(self):
> +        """ maxage is allowed to be either an integer or a date of the
> +        form YYYY-mm-dd. This returns a time_struct. """

Good!

> +
> +        maxagestr = self.config.getdefault("Account %s"%
> +            self.accountname, "maxage", None)
> +        if not maxagestr:

    if maxagestr is None:

> +            return None
> +        # is it a number?
> +        try:
> +            maxage = int(maxagestr)

Handle maxage < 1, perhaps 2.
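
Something along these lines, as a standalone sketch. The function name parse_maxage and the optional now argument are assumptions made for illustration, and the lower bound of 1 is only one possible choice; whether it should be 1 or 2 is still open.

```python
import time


def parse_maxage(value, now=None):
    """Return a time.struct_time cutoff, or None when maxage is unset."""
    if value is None:
        return None
    if now is None:
        now = time.time()
    try:
        days = int(value)
    except ValueError:
        days = None
    if days is not None:
        if days < 1:
            # Assumed bound: reject zero and negative day counts.
            raise ValueError("maxage must be at least 1 day, got %d" % days)
        return time.gmtime(now - 60 * 60 * 24 * days)
    # Not an integer: expect yyyy-mm-dd. strptime raises ValueError on
    # anything else.
    return time.strptime(value, "%Y-%m-%d")
```

Keeping the integer test out of the same try block as the range check avoids mistaking a rejected day count for "not a number" and then trying to parse it as a date.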

> +            return time.gmtime(time.time() - 60*60*24*maxage)
> +        except ValueError:
> +            pass
> +        # is it a date string?
> +        try:
> +            date = time.strptime(maxagestr, "%Y-%m-%d")
> +            if date[0] < 1900:
> +                raise OfflineImapError("maxage led to year %d. "
> +                    "Abort syncing."% date[0],
> +                    OfflineImapError.ERROR.MESSAGE)
> +            return date
> +        except ValueError:
> +            raise OfflineImapError("invalid maxage value %s",
> +                OfflineImapError.ERROR.MESSAGE)
> +
> +    def getmaxsize(self):
> +        return self.config.getdefaultint("Account %s"%
> +            self.accountname, "maxsize", None)
> +
> +    def getstartdate(self):
> +        """ Retrieve the value of the configuration option startdate """
> +        datestr = self.config.getdefault("Repository " + self.repository.name,
> +            'startdate', None)
> +        try:
> +            if not datestr:
> +                return None
> +            date = time.strptime(datestr, "%Y-%m-%d")
> +            if date[0] < 1900:
> +                raise OfflineImapError("startdate led to year %d. "
> +                    "Abort syncing."% date[0],
> +                    OfflineImapError.ERROR.MESSAGE)
> +            return date
> +        except ValueError:
> +            raise OfflineImapError("invalid startdate value %s",
> +                OfflineImapError.ERROR.MESSAGE)
> +
> +    def get_min_uid_file(self):
> +        startuiddir = os.path.join(self.config.getmetadatadir(),
> +            'Repository-' + self.repository.name, 'StartUID')
> +        if not os.path.exists(startuiddir):
> +            os.mkdir(startuiddir, 0o700)
> +        return os.path.join(startuiddir, self.getfolderbasename())
> +
> +    def retrieve_min_uid(self):
> +        uidfile = self.get_min_uid_file()
> +        try:
> +            fd = open(uidfile, 'rt')
> +            min_uid = long(fd.readline().strip())
> +            fd.close()
> +            return min_uid
> +        except:
> +            raise IOError("Can't read %s. To start using startdate, "\
> +                "folder must be empty"% uidfile)
> +
> +
>      def savemessage(self, uid, content, flags, rtime):
>          """Writes a new message, with the specified uid.
>  
> diff --git a/offlineimap/folder/Gmail.py b/offlineimap/folder/Gmail.py
> index 1afbe47..354d544 100644
> --- a/offlineimap/folder/Gmail.py
> +++ b/offlineimap/folder/Gmail.py
> @@ -121,16 +121,18 @@ class GmailFolder(IMAPFolder):
>  
>      # TODO: merge this code with the parent's cachemessagelist:
>      # TODO: they have too much common logics.
> -    def cachemessagelist(self):
> +    def cachemessagelist(self, min_date=None, min_uid=None):
>          if not self.synclabels:
> -            return super(GmailFolder, self).cachemessagelist()
> +            return super(GmailFolder, self).cachemessagelist(min_date=min_date,
> +                min_uid=min_uid)
>  
>          self.messagelist = {}
>  
>          self.ui.collectingdata(None, self)
>          imapobj = self.imapserver.acquireconnection()
>          try:
> -            msgsToFetch = self._msgs_to_fetch(imapobj)
> +            msgsToFetch = self._msgs_to_fetch(imapobj, min_date=min_date, 
> +                min_uid=min_uid)
>              if not msgsToFetch:
>                  return # No messages to sync
>  
> diff --git a/offlineimap/folder/GmailMaildir.py b/offlineimap/folder/GmailMaildir.py
> index 894792d..0ae00bf 100644
> --- a/offlineimap/folder/GmailMaildir.py
> +++ b/offlineimap/folder/GmailMaildir.py
> @@ -64,9 +64,9 @@ class GmailMaildirFolder(MaildirFolder):
>                  'filename': '/no-dir/no-such-file/', 'mtime': 0}
>  
>  
> -    def cachemessagelist(self):
> +    def cachemessagelist(self, maxage=None, min_uid=None):
>          if self.ismessagelistempty():
> -            self.messagelist = self._scanfolder()
> +            self.messagelist = self._scanfolder(maxage=maxage, min_uid=min_uid)
>  
>          # Get mtimes
>          if self.synclabels:
> diff --git a/offlineimap/folder/IMAP.py b/offlineimap/folder/IMAP.py
> index 4b470a2..253ac97 100644
> --- a/offlineimap/folder/IMAP.py
> +++ b/offlineimap/folder/IMAP.py
> @@ -18,6 +18,7 @@
>  import random
>  import binascii
>  import re
> +import os
>  import time
>  from sys import exc_info
>  
> @@ -79,6 +80,12 @@ class IMAPFolder(BaseFolder):
>      def waitforthread(self):
>          self.imapserver.connectionwait()
>  
> +    def getmaxage(self):
> +        if self.config.getdefault("Account %s"%
> +                self.accountname, "maxage", None):
> +            raise OfflineImapError("maxage is not supported on IMAP-IMAP sync",
> +                OfflineImapError.ERROR.REPO), None, exc_info()[2]
> +
>      # Interface from BaseFolder
>      def getcopyinstancelimit(self):
>          return 'MSGCOPY_' + self.repository.getname()
> @@ -143,8 +150,7 @@ class IMAPFolder(BaseFolder):
>              return True
>          return False
>  
> -
> -    def _msgs_to_fetch(self, imapobj):
> +    def _msgs_to_fetch(self, imapobj, min_date=None, min_uid=None):
>          """Determines sequence numbers of messages to be fetched.
>  
>          Message sequence numbers (MSNs) are more easily compacted
> @@ -152,57 +158,55 @@ class IMAPFolder(BaseFolder):
>  
>          Arguments:
>          - imapobj: instance of IMAPlib
> +        - min_date (optional): a time_struct; only fetch messages newer than this
> +        - min_uid (optional): only fetch messages with UID >= min_uid
> +
> +        This function should be called with at MOST one of min_date OR
> +        min_uid set but not BOTH.
>  
>          Returns: range(s) for messages or None if no messages
>          are to be fetched."""
>  
> -        res_type, imapdata = imapobj.select(self.getfullname(), True, True)
> -        if imapdata == [None] or imapdata[0] == '0':
> -            # Empty folder, no need to populate message list
> -            return None
> +        def search(search_conditions):
> +            """Actually request the server with the specified conditions.
>  
> -        # By default examine all messages in this folder
> -        msgsToFetch = '1:*'
> -
> -        maxage = self.config.getdefaultint(
> -            "Account %s"% self.accountname, "maxage", -1)
> -        maxsize = self.config.getdefaultint(
> -            "Account %s"% self.accountname, "maxsize", -1)
> -
> -        # Build search condition
> -        if (maxage != -1) | (maxsize != -1):
> -            search_cond = "(";
> -
> -            if(maxage != -1):
> -                #find out what the oldest message is that we should look at
> -                oldest_struct = time.gmtime(time.time() - (60*60*24*maxage))
> -                if oldest_struct[0] < 1900:
> -                    raise OfflineImapError("maxage setting led to year %d. "
> -                        "Abort syncing."% oldest_struct[0],
> -                        OfflineImapError.ERROR.REPO)
> -                search_cond += "SINCE %02d-%s-%d"% (
> -                    oldest_struct[2],
> -                    MonthNames[oldest_struct[1]],
> -                    oldest_struct[0])
> -
> -            if(maxsize != -1):
> -                if(maxage != -1): # There are two conditions, add space
> -                    search_cond += " "
> -                search_cond += "SMALLER %d"% maxsize
> -
> -            search_cond += ")"
> -
> -            res_type, res_data = imapobj.search(None, search_cond)
> +            Returns: range(s) for messages or None if no messages
> +            are to be fetched."""
> +            res_type, res_data = imapobj.search(None, search_conditions)
>              if res_type != 'OK':
>                  raise OfflineImapError("SEARCH in folder [%s]%s failed. "
>                      "Search string was '%s'. Server responded '[%s] %s'"% (
>                      self.getrepository(), self, search_cond, res_type, res_data),
>                      OfflineImapError.ERROR.FOLDER)
> +            return res_data[0].split()
>  
> -            # Resulting MSN are separated by space, coalesce into ranges
> -            msgsToFetch = imaputil.uid_sequence(res_data[0].split())
> +        res_type, imapdata = imapobj.select(self.getfullname(), True, True)
> +        if imapdata == [None] or imapdata[0] == '0':
> +            # Empty folder, no need to populate message list.
> +            return None
>  
> -        return msgsToFetch
> +        conditions = []
> +        # 1. min_uid condition.
> +        if min_uid != None:
> +            conditions.append("UID %d:*"% min_uid)
> +        # 2. date condition.
> +        elif min_date != None:
> +            # Find out what the oldest message is that we should look at.
> +            conditions.append("SINCE %02d-%s-%d"% (
> +                min_date[2], MonthNames[min_date[1]], min_date[0]))
> +        # 3. maxsize condition.
> +        maxsize = self.getmaxsize()
> +        if maxsize != None:
> +            conditions.append("SMALLER %d"% maxsize)
> +
> +        if len(conditions) >= 1:
> +            # Build SEARCH command.
> +            search_cond = "(%s)"% ' '.join(conditions)
> +            search_result = search(search_cond)
> +            return imaputil.uid_sequence(search_result)
> +
> +        # By default consider all messages in this folder.
> +        return '1:*'
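
The condition building above can be exercised without a server. Below is a minimal standalone sketch of the same logic; build_search and the MONTHS table are illustrative names, not part of the patch, and MONTHS only stands in for the MonthNames list used above.

```python
import time

# English month abbreviations for IMAP dates; index 0 is unused.
MONTHS = [None, 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']


def build_search(min_date=None, min_uid=None, maxsize=None):
    """Return an IMAP SEARCH string, or None when there is no restriction."""
    conditions = []
    if min_uid is not None:
        # min_uid takes precedence over min_date, as in the patch.
        conditions.append("UID %d:*" % min_uid)
    elif min_date is not None:
        conditions.append("SINCE %02d-%s-%d" % (
            min_date[2], MONTHS[min_date[1]], min_date[0]))
    if maxsize is not None:
        conditions.append("SMALLER %d" % maxsize)
    if not conditions:
        return None
    return "(%s)" % " ".join(conditions)


example = build_search(min_date=time.strptime("2015-04-01", "%Y-%m-%d"),
                       maxsize=1000)
```

Returning None for "no restriction" lets the caller fall back to the default '1:*' range.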
>  
>      # Interface from BaseFolder
>      def msglist_item_initializer(self, uid):
> @@ -210,19 +214,21 @@ class IMAPFolder(BaseFolder):
>  
>  
>      # Interface from BaseFolder
> -    def cachemessagelist(self):
> +    def cachemessagelist(self, min_date=None, min_uid=None):
> +        self.ui.loadmessagelist(self.repository, self)
>          self.messagelist = {}
>  
>          imapobj = self.imapserver.acquireconnection()
>          try:
> -            msgsToFetch = self._msgs_to_fetch(imapobj)
> +            msgsToFetch = self._msgs_to_fetch(
> +                imapobj, min_date=min_date, min_uid=min_uid)
>              if not msgsToFetch:
>                  return # No messages to sync
>  
>              # Get the flags and UIDs for these. single-quotes prevent
>              # imaplib2 from quoting the sequence.
>              res_type, response = imapobj.fetch("'%s'"%
> -                msgsToFetch, '(FLAGS UID)')
> +                msgsToFetch, '(FLAGS UID INTERNALDATE)')
>              if res_type != 'OK':
>                  raise OfflineImapError("FETCHING UIDs in folder [%s]%s failed. "
>                      "Server responded '[%s] %s'"% (self.getrepository(), self,
> @@ -247,6 +253,7 @@ class IMAPFolder(BaseFolder):
>                  flags = imaputil.flagsimap2maildir(options['FLAGS'])
>                  rtime = imaplibutil.Internaldate2epoch(messagestr)
>                  self.messagelist[uid] = {'uid': uid, 'flags': flags, 'time': rtime}
> +        self.ui.messagelistloaded(self.repository, self, self.getmessagecount())
>  
>      def dropmessagelistcache(self):
>          self.messagelist = {}
> diff --git a/offlineimap/folder/Maildir.py b/offlineimap/folder/Maildir.py
> index 79c34a7..d400a3f 100644
> --- a/offlineimap/folder/Maildir.py
> +++ b/offlineimap/folder/Maildir.py
> @@ -91,25 +91,17 @@ class MaildirFolder(BaseFolder):
>          token."""
>          return 42
>  
> -    # Checks to see if the given message is within the maximum age according
> -    # to the maildir name which should begin with a timestamp
> -    def _iswithinmaxage(self, messagename, maxage):
> -        # In order to have the same behaviour as SINCE in an IMAP search
> -        # we must convert this to the oldest time and then strip off hrs/mins
> -        # from that day.
> -        oldest_time_utc = time.time() - (60*60*24*maxage)
> -        oldest_time_struct = time.gmtime(oldest_time_utc)
> -        oldest_time_today_seconds = ((oldest_time_struct[3] * 3600) \
> -            + (oldest_time_struct[4] * 60) \
> -            + oldest_time_struct[5])
> -        oldest_time_utc -= oldest_time_today_seconds
> +    def _iswithintime(self, messagename, date):
> +        """Check to see if the given message is newer than date (a
> +        time_struct) according to the maildir name which should begin
> +        with a timestamp."""
>  
>          timestampmatch = re_timestampmatch.search(messagename)
>          if not timestampmatch:
>              return True
>          timestampstr = timestampmatch.group()
>          timestamplong = long(timestampstr)
> -        if(timestamplong < oldest_time_utc):
> +        if(timestamplong < time.mktime(date)):
>              return False
>          else:
>              return True
> @@ -150,18 +142,21 @@ class MaildirFolder(BaseFolder):
>              flags = set((c for c in flagmatch.group(1) if not c.islower()))
>          return prefix, uid, fmd5, flags
>  
> -    def _scanfolder(self):
> +    def _scanfolder(self, min_date=None, min_uid=None):
>          """Cache the message list from a Maildir.
>  
> +        If min_date is set, this finds the min UID of all messages newer than
> +        min_date and uses it as the real cutoff for considering messages.
> +        This handles the edge cases where the date is much earlier than messages
> +        with similar UID's (e.g. the UID was reassigned much later).
> +
>          Maildir flags are: R (replied) S (seen) T (trashed) D (draft) F
>          (flagged).
>          :returns: dict that can be used as self.messagelist.
>          """
>  
> -        maxage = self.config.getdefaultint("Account " + self.accountname,
> -                                           "maxage", None)
> -        maxsize = self.config.getdefaultint("Account " + self.accountname,
> -                                            "maxsize", None)
> +        maxsize = self.getmaxsize()
> +
>          retval = {}
>          files = []
>          nouidcounter = -1          # Messages without UIDs get negative UIDs.
> @@ -170,12 +165,11 @@ class MaildirFolder(BaseFolder):
>              files.extend((dirannex, filename) for
>                           filename in os.listdir(fulldirname))
>  
> +        date_excludees = {}
>          for dirannex, filename in files:
>              # We store just dirannex and filename, ie 'cur/123...'
>              filepath = os.path.join(dirannex, filename)
> -            # Check maxage/maxsize if this message should be considered.
> -            if maxage and not self._iswithinmaxage(filename, maxage):
> -                continue
> +            # Check maxsize if this message should be considered.
>              if maxsize and (os.path.getsize(os.path.join(
>                          self.getfullname(), filepath)) > maxsize):
>                  continue
> @@ -192,16 +186,43 @@ class MaildirFolder(BaseFolder):
>                      nouidcounter -= 1
>                  else:
>                      uid = long(uidmatch.group(1))
> -            # 'filename' is 'dirannex/filename', e.g. cur/123,U=1,FMD5=1:2,S
> -            retval[uid] = self.msglist_item_initializer(uid)
> -            retval[uid]['flags'] = flags
> -            retval[uid]['filename'] = filepath
> +            if min_uid != None and uid > 0 and uid < min_uid:
> +                continue
> +            if min_date != None and not self._iswithintime(filename, min_date):
> +                # Keep track of messages outside of the time limit, because they
> +                # still might have UID > min(UIDs of within-min_date). We hit
> +                # this case for maxage if any message had a known/valid datetime
> +                # and was re-uploaded because the UID in the filename got lost
> +                # (e.g. local copy/move). On next sync, it was assigned a new
> +                # UID from the server and will be included in the SEARCH
> +                # condition. So, we must re-include them later in this method
> +                # in order to avoid inconsistent lists of messages.
> +                date_excludees[uid] = self.msglist_item_initializer(uid)
> +                date_excludees[uid]['flags'] = flags
> +                date_excludees[uid]['filename'] = filepath
> +            else:
> +                # 'filename' is 'dirannex/filename', e.g. cur/123,U=1,FMD5=1:2,S
> +                retval[uid] = self.msglist_item_initializer(uid)
> +                retval[uid]['flags'] = flags
> +                retval[uid]['filename'] = filepath
> +        if min_date != None:
> +            # Re-include messages with high enough uid's.
> +            positive_uids = filter(lambda uid: uid > 0, retval)
> +            if positive_uids:
> +                min_uid = min(positive_uids)
> +                for uid in date_excludees.keys():
> +                    if uid > min_uid:
> +                        # This message was originally excluded because of
> +                        # its date. It is re-included now because we want all
> +                        # messages with UID > min_uid.
> +                        retval[uid] = date_excludees[uid]
>          return retval
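
To check the re-inclusion step in isolation, here is a small sketch of the same rule on a plain {uid: timestamp} dict. The function name scan_with_min_date and the data layout are assumptions for illustration; the real method works on Maildir filenames.

```python
def scan_with_min_date(messages, cutoff):
    """Return the sorted UIDs kept for a Maildir-like listing.

    messages -- dict {uid: epoch seconds}; negative uids are new messages
    cutoff   -- epoch seconds; older messages are excluded at first
    """
    kept = set(uid for uid, ts in messages.items() if ts >= cutoff)
    excluded = set(messages) - kept
    positive = [uid for uid in kept if uid > 0]
    if positive:
        min_uid = min(positive)
        # Re-include old messages whose UID is above the cutoff UID, so
        # the list matches what a "UID min_uid:*" search returns remotely.
        kept.update(uid for uid in excluded if uid > min_uid)
    return sorted(kept)


result = scan_with_min_date({5: 100, 6: 500, 7: 120, 8: 600, -1: 50},
                            cutoff=400)
```

Here UID 7 is older than the cutoff but sits above the lowest in-range UID (6), so it is pulled back in; UID 5 and the old new message (-1) stay out.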
>  
>      # Interface from BaseFolder
>      def quickchanged(self, statusfolder):
> -        """Returns True if the Maildir has changed"""
> -        self.cachemessagelist()
> +        """Returns True if the Maildir has changed
> +
> +        Assumes cachemessagelist() has already been called """
>          # Folder has different uids than statusfolder => TRUE.
>          if sorted(self.getmessageuidlist()) != \
>                  sorted(statusfolder.getmessageuidlist()):
> @@ -218,9 +239,12 @@ class MaildirFolder(BaseFolder):
>          return {'flags': set(), 'filename': '/no-dir/no-such-file/'}
>  
>      # Interface from BaseFolder
> -    def cachemessagelist(self):
> +    def cachemessagelist(self, min_date=None, min_uid=None):
>          if self.ismessagelistempty():
> -            self.messagelist = self._scanfolder()
> +            self.ui.loadmessagelist(self.repository, self)
> +            self.messagelist = self._scanfolder(min_date=min_date,
> +                min_uid=min_uid)
> +            self.ui.messagelistloaded(self.repository, self, self.getmessagecount())
>  
>      # Interface from BaseFolder
>      def getmessagelist(self):
-- 
Nicolas Sebrecht


