PARTIALLY REMOVING MAXAGE (was: [PATCH v4] make maxage use UIDs to avoid timezone issues)

Janna Martl janna.martl109 at gmail.com
Wed Apr 1 22:11:08 BST 2015


1. When using maxage, local and remote messagelists are supposed to only
contain messages from at most maxage days ago. But local and remote used
different timezones to calculate what "maxage days ago" means, resulting
in loss of mail. Now, we ask the local folder for maxage days' worth of
mail, find the lowest UID, and then ask the remote folder for all UIDs
starting with that lowest one.

2. maxage was fundamentally wrong in the IMAP-IMAP case: it assumed that
remote messages have UIDs in the same order as their local counterparts,
which could be false, e.g. when messages are copied in quick succession.
So, remove support for maxage in the IMAP-IMAP case.

3. Add startdate option for IMAP-IMAP syncs: use messages from the given
repository starting at startdate, and all messages from the other
repository. In the first sync, the other repository must be empty.

4. Allow maxage to be specified either as number of days to sync (as
previously) or as a fixed date.
---
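Note on point 1: a minimal standalone sketch of the selection rule, not
the code in this patch. The names select_local_uids and
remote_search_condition are hypothetical, messages are modelled as a
plain dict of UID to epoch timestamp, and the sketch is written for
Python 3 although the patch itself targets Python 2.

```python
def select_local_uids(messages, cutoff):
    """Pick the local UIDs to sync (hypothetical helper).

    messages maps UID -> epoch seconds; negative UIDs stand for new local
    mail that has no server UID yet.
    Returns (selected_uids, min_uid); min_uid is None when no message with
    a positive UID falls within the cutoff.
    """
    within = {uid for uid, ts in messages.items() if ts >= cutoff}
    positive = [uid for uid in within if uid > 0]
    if not positive:
        return within, None
    min_uid = min(positive)
    # Re-include older messages whose UID lies above the cutoff UID, so the
    # local list matches what "UID min_uid:*" selects on the remote side.
    selected = within | {uid for uid in messages if uid > min_uid}
    return selected, min_uid


def remote_search_condition(min_uid):
    """IMAP SEARCH key selecting every message with UID >= min_uid."""
    return "UID %d:*" % min_uid
```

With UIDs 1 to 4 dated 100, 200, 50 and 300 and a cutoff of 150, the
cutoff UID is 2, and UID 3 is kept even though it is older than the
cutoff. That is the edge case described in docs/offlineimap.txt.
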
 docs/offlineimap.txt               |  18 ++++-
 offlineimap.conf                   |  46 ++++++++---
 offlineimap/accounts.py            | 151 +++++++++++++++++++++++++++++--------
 offlineimap/folder/Base.py         |  67 ++++++++++++++++
 offlineimap/folder/Gmail.py        |   8 +-
 offlineimap/folder/GmailMaildir.py |   4 +-
 offlineimap/folder/IMAP.py         |  95 ++++++++++++-----------
 offlineimap/folder/Maildir.py      |  82 +++++++++++++-------
 8 files changed, 345 insertions(+), 126 deletions(-)
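
Note on point 4: a rough sketch of how a maxage value could be turned
into a cutoff, assuming the two accepted forms are an integer number of
days or a yyyy-mm-dd date. parse_maxage and its "now" parameter are
hypothetical names used only for illustration (Python 3); the patch
itself implements this in BaseFolder.getmaxage() and raises
OfflineImapError instead of ValueError.

```python
import time


def parse_maxage(value, now=None):
    """Return a time.struct_time cutoff, or None when maxage is unset."""
    if not value:
        return None
    if now is None:
        now = time.time()
    # First interpretation: a number of days counted back from now (UTC).
    try:
        days = int(value)
    except ValueError:
        pass
    else:
        return time.gmtime(now - 60 * 60 * 24 * days)
    # Second interpretation: a fixed date of the form yyyy-mm-dd.
    try:
        return time.strptime(value, "%Y-%m-%d")
    except ValueError:
        raise ValueError("invalid maxage value %r" % value)
```

For example, parse_maxage("2015-04-01") yields a struct_time for
1 April 2015, while parse_maxage("3") yields the moment three days
before the current time.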

diff --git a/docs/offlineimap.txt b/docs/offlineimap.txt
index 858fc0b..618f2ab 100644
--- a/docs/offlineimap.txt
+++ b/docs/offlineimap.txt
@@ -135,7 +135,8 @@ Ignore any autorefresh setting in the configuration file.
   Run only quick synchronizations.
 +
 Ignore any flag updates on IMAP servers. If a flag on the remote IMAP changes,
-and we have the message locally, it will be left untouched in a quick run.
+and we have the message locally, it will be left untouched in a quick run. This
+option is ignored if maxage is set.
 
 
 -u <UI>::
@@ -400,8 +401,19 @@ If you then point your local mutt, or whatever MUA you use to `~/mail/`
 as root, it should still recognize all folders.
 
 
-Authors
--------
+* Edge cases with maxage causing too many messages to be synced.
++
+All messages from at most maxage days ago (+/- a few hours, depending on
+timezones) are synced, but there are cases in which older messages can also be
+synced. This happens when a message's UID is significantly higher than those of
+other messages with similar dates, e.g. when messages are added to the local
+folder behind offlineimap's back, causing them to get assigned a new UID, or
+when offlineimap first syncs a pre-existing Maildir. In the latter case, it
+could appear as if a noticeable and random subset of old messages are synced.
+
+
+Main authors
+------------
 
   John Goerzen, Sebastian Spaetz, Eygene Ryabinkin, Nicolas Sebrecht.
 
diff --git a/offlineimap.conf b/offlineimap.conf
index 5bc48a8..525e3f4 100644
--- a/offlineimap.conf
+++ b/offlineimap.conf
@@ -260,6 +260,8 @@ remoterepository = RemoteExample
 # This option stands in the [Account Test] section.
 #
 # OfflineImap can replace a number of full updates by quick synchronizations.
+# This option is ignored if maxage or startdate are used.
+#
 # It only synchronizes a folder if
 #
 #   1) a Maildir folder has changed
@@ -327,21 +329,26 @@ remoterepository = RemoteExample
 
 # This option stands in the [Account Test] section.
 #
-# When you are starting to sync an already existing account you can tell
-# OfflineIMAP to sync messages from only the last x days.  When you do this,
-# messages older than x days will be completely ignored.  This can be useful for
-# importing existing accounts when you do not want to download large amounts of
-# archive email.
+# maxage enables you to sync only recent messages. There are two ways to specify
+# what "recent" means: if maxage is given as an integer, then only messages from
+# the last maxage days will be synced. If maxage is given as a date, then only
+# messages later than that date will be synced.
+#
+# Messages older than the cutoff will not be synced, their flags will not be
+# changed, they will not be deleted, etc. For OfflineIMAP it will be like these
+# messages do not exist. This will perform an IMAP search in the case of IMAP or
+# Gmail and therefore requires that the server support server side searching.
+#
+# Known edge cases are described in offlineimap(1).
 #
-# Messages older than maxage days will not be synced, their flags will not be
-# changed, they will not be deleted, etc.  For OfflineIMAP it will be like these
-# messages do not exist.  This will perform an IMAP search in the case of IMAP
-# or Gmail and therefore requires that the server support server side searching.
-# This will calculate the earliest day that would be included in the search and
-# include all messages from that day until today. The maxage option expects an
-# integer (for the number of days).
+# maxage is allowed only when the local folder is of type Maildir. It can't be
+# used with startdate.
+#
+# The maxage option expects an integer (for the number of days) or a date of the
+# form yyyy-mm-dd.
 #
 #maxage = 3
+#maxage = 2015-04-01
 
 
 # This option stands in the [Account Test] section.
@@ -448,6 +455,21 @@ localfolders = ~/Test
 
 # This option stands in the [Repository LocalExample] section.
 #
+# startdate syncs mails starting from a given date. It applies the date
+# restriction to LocalExample only. The remote repository MUST be empty
+# at the first sync where this option is used.
+#
+# Unlike maxage, this is supported for IMAP-IMAP sync.
+#
+# startdate can't be used with maxage.
+#
+# The startdate option expects a date in the format yyyy-mm-dd.
+#
+#startdate = 2015-04-01
+
+
+# This option stands in the [Repository LocalExample] section.
+#
 # Some users may not want the atime (last access time) of folders to be
 # modified by OfflineIMAP.  If 'restoreatime' is set to yes, OfflineIMAP
 # will restore the atime of the "new" and "cur" folders in each maildir
diff --git a/offlineimap/accounts.py b/offlineimap/accounts.py
index cac4d88..b192eca 100644
--- a/offlineimap/accounts.py
+++ b/offlineimap/accounts.py
@@ -17,10 +17,11 @@
 from subprocess import Popen, PIPE
 from threading import Event
 import os
+import time
 from sys import exc_info
 import traceback
 
-from offlineimap import mbnames, CustomConfig, OfflineImapError
+from offlineimap import mbnames, CustomConfig, OfflineImapError, imaplibutil
 from offlineimap import globals
 from offlineimap.repository import Repository
 from offlineimap.ui import getglobalui
@@ -402,6 +403,87 @@ def syncfolder(account, remotefolder, quick):
 
     Filtered folders on the remote side will not invoke this function."""
 
+    def check_uid_validity(localfolder, remotefolder, statusfolder):
+        # If either the local or the status folder has messages and
+        # there is a UID validity problem, warn and abort.  If there are
+        # no messages, UW IMAPd loses UIDVALIDITY.  But we don't really
+        # need it if both local folders are empty.  So, in that case,
+        # just save it off.
+        if localfolder.getmessagecount() or statusfolder.getmessagecount():
+            if not localfolder.check_uidvalidity():
+                ui.validityproblem(localfolder)
+                localrepos.restore_atime()
+                return
+            if not remotefolder.check_uidvalidity():
+                ui.validityproblem(remotefolder)
+                localrepos.restore_atime()
+                return
+        else:
+            # Both folders empty, just save new UIDVALIDITY
+            localfolder.save_uidvalidity()
+            remotefolder.save_uidvalidity()
+
+    def save_min_uid(folder, min_uid):
+        uidfile = folder.get_min_uid_file()
+        fd = open(uidfile, 'wt')
+        fd.write(str(min_uid) + "\n")
+        fd.close()
+
+    def cachemessagelists_by_date(localfolder, remotefolder, date):
+        """ Cache messages with UID >= min(UIDs of within-date
+            messages) in both folders."""
+
+        localfolder.cachemessagelist(min_date=date)
+        check_uid_validity(localfolder, remotefolder, statusfolder)
+        # local messagelist had date restriction applied already. Restrict
+        # sync to messages with UIDs >= min_uid from this list.
+        #
+        # local messagelist might contain new messages (with uid's < 0).
+        positive_uids = filter(
+            lambda uid: uid > 0, localfolder.getmessageuidlist())
+        if len(positive_uids) > 0:
+            remotefolder.cachemessagelist(min_uid=min(positive_uids))
+        else:
+            # No messages with UID > 0 in range in localfolder.
+            # date restriction was applied with respect to local dates but
+            # remote folder timezone might be different from local, so be
+            # safe and make sure the range isn't bigger than in local.
+            remotefolder.cachemessagelist(
+                min_date=time.gmtime(time.mktime(date) + 24*60*60))
+
+    def cachemessagelists_startdate(new, partial, date):
+        """ Retrieve messagelists when startdate has been set for
+        the folder 'partial'.
+
+        Idea: suppose you want to clone the messages after date in one
+        account (partial) to a new one (new). If new is empty, then copy
+        messages in partial newer than date to new, and keep track of the
+        min uid. On subsequent syncs, sync all the messages in new against
+        those after that min uid in partial. This is a partial replacement
+        for maxage in the IMAP-IMAP sync case, where maxage doesn't work:
+        the UIDs of the messages in localfolder might not be in the same
+        order as those of corresponding messages in remotefolder, so if L in
+        local corresponds to R in remote, the ranges [L, ...] and [R, ...]
+        might not correspond. But, if we're cloning a folder into a new one,
+        [min_uid, ...] does correspond to [1, ...].
+
+        This is just for IMAP-IMAP. For Maildir-IMAP, use maxage instead.
+        """
+
+        new.cachemessagelist()
+        if not new.getmessageuidlist():
+            partial.cachemessagelist(min_date=date)
+            uids = partial.getmessageuidlist()
+            if len(uids) > 0:
+                min_uid = min(uids)
+            else:
+                min_uid = 1
+            save_min_uid(partial, min_uid)
+        else:
+            min_uid = partial.retrieve_min_uid()
+            partial.cachemessagelist(min_uid=min_uid)
+
+
     remoterepos = account.remoterepos
     localrepos = account.localrepos
     statusrepos = account.statusrepos
@@ -429,43 +511,46 @@ def syncfolder(account, remotefolder, quick):
 
         statusfolder.cachemessagelist()
 
-        if quick:
-            if (not localfolder.quickchanged(statusfolder) and
-                not remotefolder.quickchanged(statusfolder)):
-                ui.skippingfolder(remotefolder)
-                localrepos.restore_atime()
-                return
 
         # Load local folder.
         ui.syncingfolder(remoterepos, remotefolder, localrepos, localfolder)
-        ui.loadmessagelist(localrepos, localfolder)
-        localfolder.cachemessagelist()
-        ui.messagelistloaded(localrepos, localfolder, localfolder.getmessagecount())
 
-        # If either the local or the status folder has messages and
-        # there is a UID validity problem, warn and abort.  If there are
-        # no messages, UW IMAPd loses UIDVALIDITY.  But we don't really
-        # need it if both local folders are empty.  So, in that case,
-        # just save it off.
-        if localfolder.getmessagecount() or statusfolder.getmessagecount():
-            if not localfolder.check_uidvalidity():
-                ui.validityproblem(localfolder)
-                localrepos.restore_atime()
-                return
-            if not remotefolder.check_uidvalidity():
-                ui.validityproblem(remotefolder)
-                localrepos.restore_atime()
-                return
+        # Retrieve messagelists, taking into account age-restriction
+        # options
+        maxage = localfolder.getmaxage()
+        localstart = localfolder.getstartdate()
+        remotestart = remotefolder.getstartdate()
+        if (maxage != None) + (localstart != None) + (remotestart != None) > 1:
+            raise OfflineImapError("You can set at most one of the "
+                "following: maxage, startdate (for the local folder), "
+                "startdate (for the remote folder)",
+                OfflineImapError.ERROR.REPO), None, exc_info()[2]
+        if (maxage != None or localstart or remotestart) and quick:
+            # IMAP quickchanged isn't compatible with options that
+            # involve restricting the messagelist, since the "quick"
+            # check can only retrieve a full list of UIDs in the folder.
+            ui.warn("Quick syncs (-q) not supported in conjunction "
+                "with maxage or startdate; ignoring -q.")
+        if maxage != None:
+            cachemessagelists_by_date(localfolder, remotefolder, maxage)
+        elif localstart != None:
+            cachemessagelists_startdate(remotefolder, localfolder,
+                localstart)
+            check_uid_validity(localfolder, remotefolder, statusfolder)
+        elif remotestart != None:
+            cachemessagelists_startdate(localfolder, remotefolder,
+                remotestart)
+            check_uid_validity(localfolder, remotefolder, statusfolder)
         else:
-            # Both folders empty, just save new UIDVALIDITY
-            localfolder.save_uidvalidity()
-            remotefolder.save_uidvalidity()
-
-        # Load remote folder.
-        ui.loadmessagelist(remoterepos, remotefolder)
-        remotefolder.cachemessagelist()
-        ui.messagelistloaded(remoterepos, remotefolder,
-                             remotefolder.getmessagecount())
+            localfolder.cachemessagelist()
+            if quick:
+                if (not localfolder.quickchanged(statusfolder) and
+                    not remotefolder.quickchanged(statusfolder)):
+                    ui.skippingfolder(remotefolder)
+                    localrepos.restore_atime()
+                    return
+            check_uid_validity(localfolder, remotefolder, statusfolder)
+            remotefolder.cachemessagelist()
 
         # Synchronize remote changes.
         if not localrepos.getconfboolean('readonly', False):
diff --git a/offlineimap/folder/Base.py b/offlineimap/folder/Base.py
index 16b5819..e52bec8 100644
--- a/offlineimap/folder/Base.py
+++ b/offlineimap/folder/Base.py
@@ -17,6 +17,7 @@
 
 import os.path
 import re
+import time
 from sys import exc_info
 
 from offlineimap import threadutil, emailutil
@@ -298,6 +299,72 @@ class BaseFolder(object):
 
         raise NotImplementedError
 
+    def getmaxage(self):
+        """ maxage is allowed to be either an integer or a date of the
+        form YYYY-mm-dd. This returns a time_struct. """
+
+        maxagestr = self.config.getdefault("Account %s"%
+            self.accountname, "maxage", None)
+        if not maxagestr:
+            return None
+        # is it a number?
+        try:
+            maxage = int(maxagestr)
+            return time.gmtime(time.time() - 60*60*24*maxage)
+        except ValueError:
+            pass
+        # is it a date string?
+        try:
+            date = time.strptime(maxagestr, "%Y-%m-%d")
+            if date[0] < 1900:
+                raise OfflineImapError("maxage led to year %d. "
+                    "Abort syncing."% date[0],
+                    OfflineImapError.ERROR.MESSAGE)
+            return date
+        except ValueError:
+            raise OfflineImapError("invalid maxage value %s"% maxagestr,
+                OfflineImapError.ERROR.MESSAGE)
+
+    def getmaxsize(self):
+        return self.config.getdefaultint("Account %s"%
+            self.accountname, "maxsize", None)
+
+    def getstartdate(self):
+        """ Retrieve the value of the configuration option startdate """
+        datestr = self.config.getdefault("Repository " + self.repository.name,
+            'startdate', None)
+        try:
+            if not datestr:
+                return None
+            date = time.strptime(datestr, "%Y-%m-%d")
+            if date[0] < 1900:
+                raise OfflineImapError("startdate led to year %d. "
+                    "Abort syncing."% date[0],
+                    OfflineImapError.ERROR.MESSAGE)
+            return date
+        except ValueError:
+            raise OfflineImapError("invalid startdate value %s"% datestr,
+                OfflineImapError.ERROR.MESSAGE)
+
+    def get_min_uid_file(self):
+        startuiddir = os.path.join(self.config.getmetadatadir(),
+            'Repository-' + self.repository.name, 'StartUID')
+        if not os.path.exists(startuiddir):
+            os.mkdir(startuiddir, 0o700)
+        return os.path.join(startuiddir, self.getfolderbasename())
+
+    def retrieve_min_uid(self):
+        uidfile = self.get_min_uid_file()
+        try:
+            fd = open(uidfile, 'rt')
+            min_uid = long(fd.readline().strip())
+            fd.close()
+            return min_uid
+        except:
+            raise IOError("Can't read %s. To start using startdate, "\
+                "folder must be empty"% uidfile)
+
+
     def savemessage(self, uid, content, flags, rtime):
         """Writes a new message, with the specified uid.
 
diff --git a/offlineimap/folder/Gmail.py b/offlineimap/folder/Gmail.py
index 1afbe47..354d544 100644
--- a/offlineimap/folder/Gmail.py
+++ b/offlineimap/folder/Gmail.py
@@ -121,16 +121,18 @@ class GmailFolder(IMAPFolder):
 
     # TODO: merge this code with the parent's cachemessagelist:
     # TODO: they have too much common logics.
-    def cachemessagelist(self):
+    def cachemessagelist(self, min_date=None, min_uid=None):
         if not self.synclabels:
-            return super(GmailFolder, self).cachemessagelist()
+            return super(GmailFolder, self).cachemessagelist(min_date=min_date,
+                min_uid=min_uid)
 
         self.messagelist = {}
 
         self.ui.collectingdata(None, self)
         imapobj = self.imapserver.acquireconnection()
         try:
-            msgsToFetch = self._msgs_to_fetch(imapobj)
+            msgsToFetch = self._msgs_to_fetch(imapobj, min_date=min_date,
+                min_uid=min_uid)
             if not msgsToFetch:
                 return # No messages to sync
 
diff --git a/offlineimap/folder/GmailMaildir.py b/offlineimap/folder/GmailMaildir.py
index 894792d..0ae00bf 100644
--- a/offlineimap/folder/GmailMaildir.py
+++ b/offlineimap/folder/GmailMaildir.py
@@ -64,9 +64,9 @@ class GmailMaildirFolder(MaildirFolder):
                 'filename': '/no-dir/no-such-file/', 'mtime': 0}
 
 
-    def cachemessagelist(self):
+    def cachemessagelist(self, min_date=None, min_uid=None):
         if self.ismessagelistempty():
-            self.messagelist = self._scanfolder()
+            self.messagelist = self._scanfolder(min_date=min_date, min_uid=min_uid)
 
         # Get mtimes
         if self.synclabels:
diff --git a/offlineimap/folder/IMAP.py b/offlineimap/folder/IMAP.py
index 4b470a2..253ac97 100644
--- a/offlineimap/folder/IMAP.py
+++ b/offlineimap/folder/IMAP.py
@@ -18,6 +18,7 @@
 import random
 import binascii
 import re
+import os
 import time
 from sys import exc_info
 
@@ -79,6 +80,12 @@ class IMAPFolder(BaseFolder):
     def waitforthread(self):
         self.imapserver.connectionwait()
 
+    def getmaxage(self):
+        if self.config.getdefault("Account %s"%
+                self.accountname, "maxage", None):
+            raise OfflineImapError("maxage is not supported on IMAP-IMAP sync",
+                OfflineImapError.ERROR.REPO), None, exc_info()[2]
+
     # Interface from BaseFolder
     def getcopyinstancelimit(self):
         return 'MSGCOPY_' + self.repository.getname()
@@ -143,8 +150,7 @@ class IMAPFolder(BaseFolder):
             return True
         return False
 
-
-    def _msgs_to_fetch(self, imapobj):
+    def _msgs_to_fetch(self, imapobj, min_date=None, min_uid=None):
         """Determines sequence numbers of messages to be fetched.
 
         Message sequence numbers (MSNs) are more easily compacted
@@ -152,57 +158,55 @@ class IMAPFolder(BaseFolder):
 
         Arguments:
         - imapobj: instance of IMAPlib
+        - min_date (optional): a time_struct; only fetch messages newer than this
+        - min_uid (optional): only fetch messages with UID >= min_uid
+
+        This function should be called with at MOST one of min_date OR
+        min_uid set but not BOTH.
 
         Returns: range(s) for messages or None if no messages
         are to be fetched."""
 
-        res_type, imapdata = imapobj.select(self.getfullname(), True, True)
-        if imapdata == [None] or imapdata[0] == '0':
-            # Empty folder, no need to populate message list
-            return None
+        def search(search_conditions):
+            """Actually request the server with the specified conditions.
 
-        # By default examine all messages in this folder
-        msgsToFetch = '1:*'
-
-        maxage = self.config.getdefaultint(
-            "Account %s"% self.accountname, "maxage", -1)
-        maxsize = self.config.getdefaultint(
-            "Account %s"% self.accountname, "maxsize", -1)
-
-        # Build search condition
-        if (maxage != -1) | (maxsize != -1):
-            search_cond = "(";
-
-            if(maxage != -1):
-                #find out what the oldest message is that we should look at
-                oldest_struct = time.gmtime(time.time() - (60*60*24*maxage))
-                if oldest_struct[0] < 1900:
-                    raise OfflineImapError("maxage setting led to year %d. "
-                        "Abort syncing."% oldest_struct[0],
-                        OfflineImapError.ERROR.REPO)
-                search_cond += "SINCE %02d-%s-%d"% (
-                    oldest_struct[2],
-                    MonthNames[oldest_struct[1]],
-                    oldest_struct[0])
-
-            if(maxsize != -1):
-                if(maxage != -1): # There are two conditions, add space
-                    search_cond += " "
-                search_cond += "SMALLER %d"% maxsize
-
-            search_cond += ")"
-
-            res_type, res_data = imapobj.search(None, search_cond)
+            Returns: range(s) for messages or None if no messages
+            are to be fetched."""
+            res_type, res_data = imapobj.search(None, search_conditions)
             if res_type != 'OK':
                 raise OfflineImapError("SEARCH in folder [%s]%s failed. "
                     "Search string was '%s'. Server responded '[%s] %s'"% (
                     self.getrepository(), self, search_cond, res_type, res_data),
                     OfflineImapError.ERROR.FOLDER)
+            return res_data[0].split()
 
-            # Resulting MSN are separated by space, coalesce into ranges
-            msgsToFetch = imaputil.uid_sequence(res_data[0].split())
+        res_type, imapdata = imapobj.select(self.getfullname(), True, True)
+        if imapdata == [None] or imapdata[0] == '0':
+            # Empty folder, no need to populate message list.
+            return None
 
-        return msgsToFetch
+        conditions = []
+        # 1. min_uid condition.
+        if min_uid != None:
+            conditions.append("UID %d:*"% min_uid)
+        # 2. date condition.
+        elif min_date != None:
+            # Find out what the oldest message is that we should look at.
+            conditions.append("SINCE %02d-%s-%d"% (
+                min_date[2], MonthNames[min_date[1]], min_date[0]))
+        # 3. maxsize condition.
+        maxsize = self.getmaxsize()
+        if maxsize != None:
+            conditions.append("SMALLER %d"% maxsize)
+
+        if len(conditions) >= 1:
+            # Build SEARCH command.
+            search_cond = "(%s)"% ' '.join(conditions)
+            search_result = search(search_cond)
+            return imaputil.uid_sequence(search_result)
+
+        # By default consider all messages in this folder.
+        return '1:*'
 
     # Interface from BaseFolder
     def msglist_item_initializer(self, uid):
@@ -210,19 +214,21 @@ class IMAPFolder(BaseFolder):
 
 
     # Interface from BaseFolder
-    def cachemessagelist(self):
+    def cachemessagelist(self, min_date=None, min_uid=None):
+        self.ui.loadmessagelist(self.repository, self)
         self.messagelist = {}
 
         imapobj = self.imapserver.acquireconnection()
         try:
-            msgsToFetch = self._msgs_to_fetch(imapobj)
+            msgsToFetch = self._msgs_to_fetch(
+                imapobj, min_date=min_date, min_uid=min_uid)
             if not msgsToFetch:
                 return # No messages to sync
 
             # Get the flags and UIDs for these. single-quotes prevent
             # imaplib2 from quoting the sequence.
             res_type, response = imapobj.fetch("'%s'"%
-                msgsToFetch, '(FLAGS UID)')
+                msgsToFetch, '(FLAGS UID INTERNALDATE)')
             if res_type != 'OK':
                 raise OfflineImapError("FETCHING UIDs in folder [%s]%s failed. "
                     "Server responded '[%s] %s'"% (self.getrepository(), self,
@@ -247,6 +253,7 @@ class IMAPFolder(BaseFolder):
                 flags = imaputil.flagsimap2maildir(options['FLAGS'])
                 rtime = imaplibutil.Internaldate2epoch(messagestr)
                 self.messagelist[uid] = {'uid': uid, 'flags': flags, 'time': rtime}
+        self.ui.messagelistloaded(self.repository, self, self.getmessagecount())
 
     def dropmessagelistcache(self):
         self.messagelist = {}
diff --git a/offlineimap/folder/Maildir.py b/offlineimap/folder/Maildir.py
index 79c34a7..d400a3f 100644
--- a/offlineimap/folder/Maildir.py
+++ b/offlineimap/folder/Maildir.py
@@ -91,25 +91,17 @@ class MaildirFolder(BaseFolder):
         token."""
         return 42
 
-    # Checks to see if the given message is within the maximum age according
-    # to the maildir name which should begin with a timestamp
-    def _iswithinmaxage(self, messagename, maxage):
-        # In order to have the same behaviour as SINCE in an IMAP search
-        # we must convert this to the oldest time and then strip off hrs/mins
-        # from that day.
-        oldest_time_utc = time.time() - (60*60*24*maxage)
-        oldest_time_struct = time.gmtime(oldest_time_utc)
-        oldest_time_today_seconds = ((oldest_time_struct[3] * 3600) \
-            + (oldest_time_struct[4] * 60) \
-            + oldest_time_struct[5])
-        oldest_time_utc -= oldest_time_today_seconds
+    def _iswithintime(self, messagename, date):
+        """Check to see if the given message is newer than date (a
+        time_struct) according to the maildir name which should begin
+        with a timestamp."""
 
         timestampmatch = re_timestampmatch.search(messagename)
         if not timestampmatch:
             return True
         timestampstr = timestampmatch.group()
         timestamplong = long(timestampstr)
-        if(timestamplong < oldest_time_utc):
+        if(timestamplong < time.mktime(date)):
             return False
         else:
             return True
@@ -150,18 +142,21 @@ class MaildirFolder(BaseFolder):
             flags = set((c for c in flagmatch.group(1) if not c.islower()))
         return prefix, uid, fmd5, flags
 
-    def _scanfolder(self):
+    def _scanfolder(self, min_date=None, min_uid=None):
         """Cache the message list from a Maildir.
 
+        If min_date is set, this finds the min UID of all messages newer than
+        min_date and uses it as the real cutoff for considering messages.
+        This handles the edge cases where the date is much earlier than messages
+        with similar UIDs (e.g. the UID was reassigned much later).
+
         Maildir flags are: R (replied) S (seen) T (trashed) D (draft) F
         (flagged).
         :returns: dict that can be used as self.messagelist.
         """
 
-        maxage = self.config.getdefaultint("Account " + self.accountname,
-                                           "maxage", None)
-        maxsize = self.config.getdefaultint("Account " + self.accountname,
-                                            "maxsize", None)
+        maxsize = self.getmaxsize()
+
         retval = {}
         files = []
         nouidcounter = -1          # Messages without UIDs get negative UIDs.
@@ -170,12 +165,11 @@ class MaildirFolder(BaseFolder):
             files.extend((dirannex, filename) for
                          filename in os.listdir(fulldirname))
 
+        date_excludees = {}
         for dirannex, filename in files:
             # We store just dirannex and filename, ie 'cur/123...'
             filepath = os.path.join(dirannex, filename)
-            # Check maxage/maxsize if this message should be considered.
-            if maxage and not self._iswithinmaxage(filename, maxage):
-                continue
+            # Check maxsize if this message should be considered.
             if maxsize and (os.path.getsize(os.path.join(
                         self.getfullname(), filepath)) > maxsize):
                 continue
@@ -192,16 +186,43 @@ class MaildirFolder(BaseFolder):
                     nouidcounter -= 1
                 else:
                     uid = long(uidmatch.group(1))
-            # 'filename' is 'dirannex/filename', e.g. cur/123,U=1,FMD5=1:2,S
-            retval[uid] = self.msglist_item_initializer(uid)
-            retval[uid]['flags'] = flags
-            retval[uid]['filename'] = filepath
+            if min_uid != None and uid > 0 and uid < min_uid:
+                continue
+            if min_date != None and not self._iswithintime(filename, min_date):
+                # Keep track of messages outside of the time limit, because they
+                # still might have UID > min(UIDs of within-min_date). We hit
+                # this case for maxage if any message had a known/valid datetime
+                # and was re-uploaded because the UID in the filename got lost
+                # (e.g. local copy/move). On next sync, it was assigned a new
+                # UID from the server and will be included in the SEARCH
+                # condition. So, we must re-include them later in this method
+                # in order to avoid inconsistent lists of messages.
+                date_excludees[uid] = self.msglist_item_initializer(uid)
+                date_excludees[uid]['flags'] = flags
+                date_excludees[uid]['filename'] = filepath
+            else:
+                # 'filename' is 'dirannex/filename', e.g. cur/123,U=1,FMD5=1:2,S
+                retval[uid] = self.msglist_item_initializer(uid)
+                retval[uid]['flags'] = flags
+                retval[uid]['filename'] = filepath
+        if min_date != None:
+            # Re-include messages with high enough uid's.
+            positive_uids = filter(lambda uid: uid > 0, retval)
+            if positive_uids:
+                min_uid = min(positive_uids)
+                for uid in date_excludees.keys():
+                    if uid > min_uid:
+                        # This message was originally excluded because of
+                        # its date. It is re-included now because we want all
+                        # messages with UID > min_uid.
+                        retval[uid] = date_excludees[uid]
         return retval
 
     # Interface from BaseFolder
     def quickchanged(self, statusfolder):
-        """Returns True if the Maildir has changed"""
-        self.cachemessagelist()
+        """Returns True if the Maildir has changed
+
+        Assumes cachemessagelist() has already been called """
         # Folder has different uids than statusfolder => TRUE.
         if sorted(self.getmessageuidlist()) != \
                 sorted(statusfolder.getmessageuidlist()):
@@ -218,9 +239,12 @@ class MaildirFolder(BaseFolder):
         return {'flags': set(), 'filename': '/no-dir/no-such-file/'}
 
     # Interface from BaseFolder
-    def cachemessagelist(self):
+    def cachemessagelist(self, min_date=None, min_uid=None):
         if self.ismessagelistempty():
-            self.messagelist = self._scanfolder()
+            self.ui.loadmessagelist(self.repository, self)
+            self.messagelist = self._scanfolder(min_date=min_date,
+                min_uid=min_uid)
+            self.ui.messagelistloaded(self.repository, self, self.getmessagecount())
 
     # Interface from BaseFolder
     def getmessagelist(self):
-- 
2.3.5
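
Note on point 3: a small sketch of the startdate bookkeeping, under the
assumption that the minimum UID is kept in a one-line text file per
folder. startdate_min_uid and its arguments are hypothetical names, and
the sketch is written for Python 3; the patch does the equivalent in
cachemessagelists_startdate() together with the StartUID metadata
directory.

```python
import os
import tempfile


def save_min_uid(path, min_uid):
    """Persist the cutoff UID as a single line of text."""
    with open(path, "wt") as fd:
        fd.write("%d\n" % min_uid)


def retrieve_min_uid(path):
    """Read back the cutoff UID written by save_min_uid()."""
    with open(path, "rt") as fd:
        return int(fd.readline().strip())


def startdate_min_uid(path, new_uids, partial_uids_since_date):
    """Return the cutoff UID for the date-restricted ('partial') folder.

    new_uids: UIDs currently in the folder being populated.
    partial_uids_since_date: UIDs in the restricted folder on or after
    startdate (only consulted on the first sync).
    """
    if not new_uids:
        # First sync: derive the cutoff from the date and remember it.
        min_uid = min(partial_uids_since_date) if partial_uids_since_date else 1
        save_min_uid(path, min_uid)
        return min_uid
    # Later syncs: the date is no longer consulted, only the stored UID.
    return retrieve_min_uid(path)


if __name__ == "__main__":
    uidfile = os.path.join(tempfile.mkdtemp(), "INBOX")
    print(startdate_min_uid(uidfile, [], [42, 57, 40]))
    print(startdate_min_uid(uidfile, [1, 2, 3], [90]))
```

Storing the UID rather than re-evaluating the date on each run keeps the
range [min_uid, ...] in the restricted folder stable across syncs.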





