[PATCH v4] make maxage use UIDs to avoid timezone issues

Nicolas Sebrecht nicolas.s-dev at laposte.net
Wed Mar 25 14:32:00 UTC 2015


From: Janna Martl <janna.martl109 at gmail.com>

When using maxage, local and remote messagelists are supposed to only
contain messages from at most maxage days ago. But local and remote used
different timezones to calculate what "maxage days ago" means, resulting
in loss of mail. Now, we ask the local folder for maxage days' worth of
mail, find the lowest UID, and then ask the remote folder for all UID's
starting with that lowest one.

Signed-off-by: Janna Martl <janna.martl109 at gmail.com>
Improved-by: Nicolas Sebrecht <nicolas.s-dev at laposte.net>
Signed-off-by: Nicolas Sebrecht <nicolas.s-dev at laposte.net>
---

I forgot to include a floating commit in my previous patch, sorry.


NOTE: I played a bit with this patch and could find inconsistent messages.
Testing case: for an already synced mailbox I've added 'maxage = 2'.
Issue:
  At 1st sync, OfflineIMAP pretends to remove mails not within maxage on remote
  IMAP. Then, I disable maxage and the alleged deleted mails are synced back
  from IMAP to Maildir (no dups on Maildir).
Issue reproducible.


Interdiff:
  diff --git a/offlineimap/folder/IMAP.py b/offlineimap/folder/IMAP.py
  index f825104..25dd26d 100644
  --- a/offlineimap/folder/IMAP.py
  +++ b/offlineimap/folder/IMAP.py
  @@ -195,10 +195,10 @@ class IMAPFolder(BaseFolder):
                   raise OfflineImapError("maxage setting led to year %d. "
                       "Abort syncing."% oldest_struct[0],
                       OfflineImapError.ERROR.REPO)
  -            conditions.append("SINCE %02d-%s-%d"%
  +            conditions.append("SINCE %02d-%s-%d"% (
                   oldest_struct[2],
                   MonthNames[oldest_struct[1]],
  -                oldest_struct[0])
  +                oldest_struct[0]))
           # 3. maxsize condition.
           maxsize = self.getmaxsize()
           if maxsize != None:


 docs/offlineimap.txt               |  18 +++++-
 offlineimap.conf                   |   9 ++-
 offlineimap/accounts.py            |  41 ++++++++++---
 offlineimap/folder/Base.py         |   8 +++
 offlineimap/folder/Gmail.py        |   5 +-
 offlineimap/folder/GmailMaildir.py |   4 +-
 offlineimap/folder/IMAP.py         | 122 ++++++++++++++++++++++++-------------
 offlineimap/folder/Maildir.py      |  66 ++++++++++++++------
 8 files changed, 195 insertions(+), 78 deletions(-)

diff --git a/docs/offlineimap.txt b/docs/offlineimap.txt
index 858fc0b..618f2ab 100644
--- a/docs/offlineimap.txt
+++ b/docs/offlineimap.txt
@@ -135,7 +135,8 @@ Ignore any autorefresh setting in the configuration file.
   Run only quick synchronizations.
 +
 Ignore any flag updates on IMAP servers. If a flag on the remote IMAP changes,
-and we have the message locally, it will be left untouched in a quick run.
+and we have the message locally, it will be left untouched in a quick run. This
+option is ignored if maxage is set.
 
 
 -u <UI>::
@@ -400,8 +401,19 @@ If you then point your local mutt, or whatever MUA you use to `~/mail/`
 as root, it should still recognize all folders.
 
 
-Authors
--------
+* Edge cases with maxage causing too many messages to be synced.
++
+All messages from at most maxage days ago (+/- a few hours, depending on
+timezones) are synced, but there are cases in which older messages can also be
+synced. This happens when a message's UID is significantly higher than those of
+other messages with similar dates, e.g. when messages are added to the local
+folder behind offlineimap's back, causing them to get assigned a new UID, or
+when offlineimap first syncs a pre-existing Maildir. In the latter case, it
+could appear as if a noticeable and random subset of old messages are synced.
+
+
+Main authors
+------------
 
   John Goerzen, Sebastian Spaetz, Eygene Ryabinkin, Nicolas Sebrecht.
 
diff --git a/offlineimap.conf b/offlineimap.conf
index c447d7b..201135a 100644
--- a/offlineimap.conf
+++ b/offlineimap.conf
@@ -260,6 +260,8 @@ remoterepository = RemoteExample
 # This option stands in the [Account Test] section.
 #
 # OfflineImap can replace a number of full updates by quick synchronizations.
+# This option is ignored if maxage is used.
+#
 # It only synchronizes a folder if
 #
 #   1) a Maildir folder has changed
@@ -337,9 +339,10 @@ remoterepository = RemoteExample
 # changed, they will not be deleted, etc.  For OfflineIMAP it will be like these
 # messages do not exist.  This will perform an IMAP search in the case of IMAP
 # or Gmail and therefore requires that the server support server side searching.
-# This will calculate the earliest day that would be included in the search and
-# include all messages from that day until today. The maxage option expects an
-# integer (for the number of days).
+#
+# Known edge cases are described in the offlineimap(1).
+#
+# The maxage option expects an integer (for the number of days).
 #
 #maxage = 3
 
diff --git a/offlineimap/accounts.py b/offlineimap/accounts.py
index cac4d88..a7ad623 100644
--- a/offlineimap/accounts.py
+++ b/offlineimap/accounts.py
@@ -429,19 +429,28 @@ def syncfolder(account, remotefolder, quick):
 
         statusfolder.cachemessagelist()
 
-        if quick:
-            if (not localfolder.quickchanged(statusfolder) and
-                not remotefolder.quickchanged(statusfolder)):
-                ui.skippingfolder(remotefolder)
-                localrepos.restore_atime()
-                return
+        maxage = localfolder.getmaxage()
 
         # Load local folder.
         ui.syncingfolder(remoterepos, remotefolder, localrepos, localfolder)
         ui.loadmessagelist(localrepos, localfolder)
-        localfolder.cachemessagelist()
+        # For maxage, returns messages with uid > min(uids of within-maxage messages).
+        localfolder.cachemessagelist(maxage=maxage)
         ui.messagelistloaded(localrepos, localfolder, localfolder.getmessagecount())
 
+        if quick:
+            # IMAP quickchanged isn't compatible with maxage, since the "quick"
+            # check can only retrieve a full list of UIDs in the folder.
+            if maxage:
+                ui.warn("Quick syncs (-q) not supported in conjunction "\
+                    "with maxage; ignoring -q.")
+            else:
+                if (not localfolder.quickchanged(statusfolder) and
+                    not remotefolder.quickchanged(statusfolder)):
+                    ui.skippingfolder(remotefolder)
+                    localrepos.restore_atime()
+                    return
+
         # If either the local or the status folder has messages and
         # there is a UID validity problem, warn and abort.  If there are
         # no messages, UW IMAPd loses UIDVALIDITY.  But we don't really
@@ -463,7 +472,23 @@ def syncfolder(account, remotefolder, quick):
 
         # Load remote folder.
         ui.loadmessagelist(remoterepos, remotefolder)
-        remotefolder.cachemessagelist()
+        if maxage != None:
+            # local messagelist was applied maxage, already. Restrict sync for
+            # messages with UIDs >= min_uid from this list.
+            #
+            # local messagelist might contain new messages (with uid's < 0).
+            positive_uids = filter(
+                lambda uid: uid > 0, localfolder.getmessageuidlist())
+            if len(positive_uids) > 0:
+                remotefolder.cachemessagelist(min_uid=min(positive_uids))
+            else:
+                # No messages with UID > 0 in range in localfolder.
+                # maxage was applied with respect to local dates but
+                # remote folder timezone might be different from local, so be
+                # safe and make sure the range isn't bigger than in local.
+                remotefolder.cachemessagelist(maxage=maxage - 1)
+        else:
+            remotefolder.cachemessagelist()
         ui.messagelistloaded(remoterepos, remotefolder,
                              remotefolder.getmessagecount())
 
diff --git a/offlineimap/folder/Base.py b/offlineimap/folder/Base.py
index e3c80b7..123508c 100644
--- a/offlineimap/folder/Base.py
+++ b/offlineimap/folder/Base.py
@@ -298,6 +298,14 @@ class BaseFolder(object):
 
         raise NotImplementedError
 
+    def getmaxage(self):
+        return self.config.getdefaultint("Account %s"%
+            self.accountname, "maxage", None)
+
+    def getmaxsize(self):
+        return self.config.getdefaultint("Account %s"%
+            self.accountname, "maxsize", None)
+
     def savemessage(self, uid, content, flags, rtime):
         """Writes a new message, with the specified uid.
 
diff --git a/offlineimap/folder/Gmail.py b/offlineimap/folder/Gmail.py
index 1afbe47..93e8eee 100644
--- a/offlineimap/folder/Gmail.py
+++ b/offlineimap/folder/Gmail.py
@@ -121,9 +121,10 @@ class GmailFolder(IMAPFolder):
 
     # TODO: merge this code with the parent's cachemessagelist:
     # TODO: they have too much common logics.
-    def cachemessagelist(self):
+    def cachemessagelist(self, maxage=None, min_uid=None):
         if not self.synclabels:
-            return super(GmailFolder, self).cachemessagelist()
+            return super(GmailFolder, self).cachemessagelist(maxage=maxage,
+                min_uid=min_uid)
 
         self.messagelist = {}
 
diff --git a/offlineimap/folder/GmailMaildir.py b/offlineimap/folder/GmailMaildir.py
index 894792d..0ae00bf 100644
--- a/offlineimap/folder/GmailMaildir.py
+++ b/offlineimap/folder/GmailMaildir.py
@@ -64,9 +64,9 @@ class GmailMaildirFolder(MaildirFolder):
                 'filename': '/no-dir/no-such-file/', 'mtime': 0}
 
 
-    def cachemessagelist(self):
+    def cachemessagelist(self, maxage=None, min_uid=None):
         if self.ismessagelistempty():
-            self.messagelist = self._scanfolder()
+            self.messagelist = self._scanfolder(maxage=maxage, min_uid=min_uid)
 
         # Get mtimes
         if self.synclabels:
diff --git a/offlineimap/folder/IMAP.py b/offlineimap/folder/IMAP.py
index 4b470a2..25dd26d 100644
--- a/offlineimap/folder/IMAP.py
+++ b/offlineimap/folder/IMAP.py
@@ -144,7 +144,7 @@ class IMAPFolder(BaseFolder):
         return False
 
 
-    def _msgs_to_fetch(self, imapobj):
+    def _msgs_to_fetch(self, imapobj, maxage=None, min_uid=None):
         """Determines sequence numbers of messages to be fetched.
 
         Message sequence numbers (MSNs) are more easily compacted
@@ -152,57 +152,95 @@ class IMAPFolder(BaseFolder):
 
         Arguments:
         - imapobj: instance of IMAPlib
+        - maxage (optional): only fetch messages up to maxage days ago
+        - min_uid (optional): only fetch messages with UID >= min_uid
+
+        This function should be called with at MOST one of maxage OR
+        min_uid set but not BOTH.
 
         Returns: range(s) for messages or None if no messages
         are to be fetched."""
 
-        res_type, imapdata = imapobj.select(self.getfullname(), True, True)
-        if imapdata == [None] or imapdata[0] == '0':
-            # Empty folder, no need to populate message list
-            return None
+        def search(search_conditions):
+            """Actually request the server with the specified conditions.
 
-        # By default examine all messages in this folder
-        msgsToFetch = '1:*'
-
-        maxage = self.config.getdefaultint(
-            "Account %s"% self.accountname, "maxage", -1)
-        maxsize = self.config.getdefaultint(
-            "Account %s"% self.accountname, "maxsize", -1)
-
-        # Build search condition
-        if (maxage != -1) | (maxsize != -1):
-            search_cond = "(";
-
-            if(maxage != -1):
-                #find out what the oldest message is that we should look at
-                oldest_struct = time.gmtime(time.time() - (60*60*24*maxage))
-                if oldest_struct[0] < 1900:
-                    raise OfflineImapError("maxage setting led to year %d. "
-                        "Abort syncing."% oldest_struct[0],
-                        OfflineImapError.ERROR.REPO)
-                search_cond += "SINCE %02d-%s-%d"% (
-                    oldest_struct[2],
-                    MonthNames[oldest_struct[1]],
-                    oldest_struct[0])
-
-            if(maxsize != -1):
-                if(maxage != -1): # There are two conditions, add space
-                    search_cond += " "
-                search_cond += "SMALLER %d"% maxsize
-
-            search_cond += ")"
-
-            res_type, res_data = imapobj.search(None, search_cond)
+            Returns: range(s) for messages or None if no messages
+            are to be fetched."""
+            res_type, res_data = imapobj.search(None, search_conditions)
             if res_type != 'OK':
                 raise OfflineImapError("SEARCH in folder [%s]%s failed. "
                     "Search string was '%s'. Server responded '[%s] %s'"% (
                     self.getrepository(), self, search_cond, res_type, res_data),
                     OfflineImapError.ERROR.FOLDER)
+            return res_data[0].split()
 
+        res_type, imapdata = imapobj.select(self.getfullname(), True, True)
+        if imapdata == [None] or imapdata[0] == '0':
+            # Empty folder, no need to populate message list.
+            return None
+
+        conditions = []
+        # 1. min_uid condition.
+        if min_uid != None:
+            conditions.append("UID %d:*"% min_uid)
+        # 2. maxage condition.
+        elif maxage != None:
+            # Find out what the oldest message is that we should look at.
+            # FIXME: we are checking maxage validity way too late. Also, we are
+            # doing similar computing in MaildirFolder()._iswithinmaxage()
+            # WITHOUT this sanity check. We really want to work with
+            # oldest_struct as soon as possible.
+            oldest_struct = time.gmtime(time.time() - (60*60*24*maxage))
+            if oldest_struct[0] < 1900:
+                raise OfflineImapError("maxage setting led to year %d. "
+                    "Abort syncing."% oldest_struct[0],
+                    OfflineImapError.ERROR.REPO)
+            conditions.append("SINCE %02d-%s-%d"% (
+                oldest_struct[2],
+                MonthNames[oldest_struct[1]],
+                oldest_struct[0]))
+        # 3. maxsize condition.
+        maxsize = self.getmaxsize()
+        if maxsize != None:
+            conditions.append("SMALLER %d"% maxsize)
+
+        if len(conditions) > 1:
+            # Build SEARCH command.
+            if maxage == None:
+                search_cond = "(%s)"% ' '.join(conditions)
+                search_result = search(search_cond)
+            else:
+                # Get the messages within maxage is not enough. We want all
+                # messages with UID > min_uid from these within-maxage messages.
+                # We can't rely on maxage only to get the proper min_uid because
+                # the internal date used by the SINCE command might sightly
+                # diverge from the date and time the message was assigned its
+                # UID. Same logic as applied for the Maildir, we have to
+                # re-include some messages.
+                #
+                # Ordering by UID is the same as ordering by MSN, so we get the
+                # messages with MSN > min_msn of the within-maxage messages.
+                msg_seq_numbers = map(lambda s : int(s), search_result)
+                if len(msg_seq_numbers) < 1:
+                    return None # Nothing to sync.
+                min_msn = min(msg_seq_numbers)
+                # If no maxsize, can just ask for all messages with MSN > min_msn.
+                search_cond = "%d:*"% min_msn
+                if maxsize != None:
+                    # Restrict the range min_msn:* to those with acceptable size.
+                    # Single-quotes prevent imaplib2 from quoting the sequence.
+                    search_cond = "'%s (SMALLER %d)'"% (min_msn, maxsize)
+                # Having to make a second query sucks but this is only for
+                # IMAP/IMAP configurations with maxage enabled. We assume this
+                # is not so common. This time overhead should be acceptable
+                # regarding the benefits introduced by all the avoided sync of
+                # maxage.
+                search_result = search(search_cond)
             # Resulting MSN are separated by space, coalesce into ranges
-            msgsToFetch = imaputil.uid_sequence(res_data[0].split())
+            return imaputil.uid_sequence(search_result)
 
-        return msgsToFetch
+        # By default consider all messages in this folder.
+        return '1:*'
 
     # Interface from BaseFolder
     def msglist_item_initializer(self, uid):
@@ -210,19 +248,19 @@ class IMAPFolder(BaseFolder):
 
 
     # Interface from BaseFolder
-    def cachemessagelist(self):
+    def cachemessagelist(self, maxage=None, min_uid=None):
         self.messagelist = {}
 
         imapobj = self.imapserver.acquireconnection()
         try:
-            msgsToFetch = self._msgs_to_fetch(imapobj)
+            msgsToFetch = self._msgs_to_fetch(imapobj, maxage=maxage, min_uid=min_uid)
             if not msgsToFetch:
                 return # No messages to sync
 
             # Get the flags and UIDs for these. single-quotes prevent
             # imaplib2 from quoting the sequence.
             res_type, response = imapobj.fetch("'%s'"%
-                msgsToFetch, '(FLAGS UID)')
+                msgsToFetch, '(FLAGS UID INTERNALDATE)')
             if res_type != 'OK':
                 raise OfflineImapError("FETCHING UIDs in folder [%s]%s failed. "
                     "Server responded '[%s] %s'"% (self.getrepository(), self,
diff --git a/offlineimap/folder/Maildir.py b/offlineimap/folder/Maildir.py
index 6df8180..ddc66bf 100644
--- a/offlineimap/folder/Maildir.py
+++ b/offlineimap/folder/Maildir.py
@@ -94,7 +94,7 @@ class MaildirFolder(BaseFolder):
     # Checks to see if the given message is within the maximum age according
     # to the maildir name which should begin with a timestamp
     def _iswithinmaxage(self, messagename, maxage):
-        # In order to have the same behaviour as SINCE in an IMAP search
+        # In order to have similar behaviour as SINCE in an IMAP search
         # we must convert this to the oldest time and then strip off hrs/mins
         # from that day.
         oldest_time_utc = time.time() - (60*60*24*maxage)
@@ -150,18 +150,21 @@ class MaildirFolder(BaseFolder):
             flags = set((c for c in flagmatch.group(1) if not c.islower()))
         return prefix, uid, fmd5, flags
 
-    def _scanfolder(self):
+    def _scanfolder(self, maxage=None):
         """Cache the message list from a Maildir.
 
+        If using maxage, this finds the min UID of all messages within maxage
+        and use it as the real cursor up to where we go back in time. This handles
+        the edge cases where the date is much earlier than messages with similar
+        UID's (e.g. the UID was reassigned much later).
+
         Maildir flags are: R (replied) S (seen) T (trashed) D (draft) F
         (flagged).
         :returns: dict that can be used as self.messagelist.
         """
 
-        maxage = self.config.getdefaultint("Account " + self.accountname,
-                                           "maxage", None)
-        maxsize = self.config.getdefaultint("Account " + self.accountname,
-                                            "maxsize", None)
+        maxsize = self.getmaxsize()
+
         retval = {}
         files = []
         nouidcounter = -1          # Messages without UIDs get negative UIDs.
@@ -170,12 +173,11 @@ class MaildirFolder(BaseFolder):
             files.extend((dirannex, filename) for
                          filename in os.listdir(fulldirname))
 
+        maxage_excludees = {}
         for dirannex, filename in files:
             # We store just dirannex and filename, ie 'cur/123...'
             filepath = os.path.join(dirannex, filename)
-            # Check maxage/maxsize if this message should be considered.
-            if maxage and not self._iswithinmaxage(filename, maxage):
-                continue
+            # Check maxsize if this message should be considered.
             if maxsize and (os.path.getsize(os.path.join(
                         self.getfullname(), filepath)) > maxsize):
                 continue
@@ -192,16 +194,41 @@ class MaildirFolder(BaseFolder):
                     nouidcounter -= 1
                 else:
                     uid = long(uidmatch.group(1))
-            # 'filename' is 'dirannex/filename', e.g. cur/123,U=1,FMD5=1:2,S
-            retval[uid] = self.msglist_item_initializer(uid)
-            retval[uid]['flags'] = flags
-            retval[uid]['filename'] = filepath
+            if maxage != None and not self._iswithinmaxage(filename, maxage):
+                # Keep track of messages outside of maxage, because they still
+                # might have UID > min(UIDs of within-maxage). We hit this case
+                # if any message had a known/valid datetime and was re-uploaded
+                # because the UID in the filename got lost (e.g. local
+                # copy/move).  On next sync, it was assigned a new UID from the
+                # server and will be included in the SEARCH condition.  So, we
+                # must re-include them later in this method in order to avoid
+                # inconsistent lists of messages.
+                maxage_excludees[uid] = self.msglist_item_initializer(uid)
+                maxage_excludees[uid]['flags'] = flags
+                maxage_excludees[uid]['filename'] = filepath
+            else:
+                # 'filename' is 'dirannex/filename', e.g. cur/123,U=1,FMD5=1:2,S
+                retval[uid] = self.msglist_item_initializer(uid)
+                retval[uid]['flags'] = flags
+                retval[uid]['filename'] = filepath
+        if maxage != None:
+            # Re-include messages with high enough uid's.
+            positive_uids = filter(lambda uid: uid > 0, retval)
+            if positive_uids:
+                min_uid = min(positive_uids)
+                for uid in maxage_excludees.keys():
+                    if uid > min_uid:
+                        # This message was originally excluded because of
+                        # maxage. It is re-included now because we want all
+                        # messages with UID > min_uid.
+                        retval[uid] = maxage_excludees[uid]
         return retval
 
     # Interface from BaseFolder
     def quickchanged(self, statusfolder):
-        """Returns True if the Maildir has changed"""
-        self.cachemessagelist()
+        """Returns True if the Maildir has changed
+
+        Assumes cachemessagelist() has already been called """
         # Folder has different uids than statusfolder => TRUE.
         if sorted(self.getmessageuidlist()) != \
                 sorted(statusfolder.getmessageuidlist()):
@@ -218,9 +245,9 @@ class MaildirFolder(BaseFolder):
         return {'flags': set(), 'filename': '/no-dir/no-such-file/'}
 
     # Interface from BaseFolder
-    def cachemessagelist(self):
+    def cachemessagelist(self, maxage=None):
         if self.ismessagelistempty():
-            self.messagelist = self._scanfolder()
+            self.messagelist = self._scanfolder(maxage=maxage)
 
     # Interface from BaseFolder
     def getmessagelist(self):
@@ -445,7 +472,10 @@ class MaildirFolder(BaseFolder):
             os.unlink(filepath)
         except OSError:
             # Can't find the file -- maybe already deleted?
-            newmsglist = self._scanfolder()
+            maxage = self.getmaxage()
+            # get more than enough messages to be sure we captured the min UID
+            # expected on the other side
+            newmsglist = self._scanfolder(maxage=maxage + 1)
             if uid in newmsglist:       # Nope, try new filename.
                 filename = newmsglist[uid]['filename']
                 filepath = os.path.join(self.getfullname(), filename)
-- 
2.3.1




More information about the OfflineIMAP-project mailing list