[PATCH 1/1] change behavior of maxage to avoid timezone issues

Janna Martl janna.martl109 at gmail.com
Wed Mar 18 22:40:15 GMT 2015


The maxage feature was not taking into account the fact that IMAP SINCE queries
are with respect to the IMAP server's timezone, which we don't know. Instead of
ignoring mail that is maxage days ago, now we ignore mail that comes before
(with respect to UIDs) the first message that both local and remote, with their
different timezones, think is less than "maxage days ago".  For more details see
docs/doc-src/API.rst.

Signed-off-by: Janna Martl <janna.martl109 at gmail.com>
---
 docs/doc-src/API.rst               | 58 ++++++++++++++++++++++++++++
 offlineimap.conf                   | 20 +++++++---
 offlineimap/accounts.py            | 77 ++++++++++++++++++++++++++++++++++++--
 offlineimap/folder/Base.py         | 39 ++++++++++++-------
 offlineimap/folder/Gmail.py        |  4 +-
 offlineimap/folder/GmailMaildir.py |  8 ++--
 offlineimap/folder/IMAP.py         | 25 +++++++------
 offlineimap/folder/Maildir.py      | 22 +++++------
 offlineimap/imaplibutil.py         |  3 +-
 9 files changed, 201 insertions(+), 55 deletions(-)

diff --git a/docs/doc-src/API.rst b/docs/doc-src/API.rst
index de8b500..0c7cd49 100644
--- a/docs/doc-src/API.rst
+++ b/docs/doc-src/API.rst
@@ -84,3 +84,61 @@ prior to its first use.
 
       You can access the values of stored options using the usual
       syntax, offlineimap.globals.options.<option-name>
+
+
+A note about maxage
+===================
+When you are starting to sync an already existing account you can tell
+OfflineIMAP to only sync relatively recent messages; messages over a certain
+cutoff age will not be synced. The exact cutoff depends on the maxage
+configuration option, and on the times of the messages in the local and remote
+folders. The general strategy is as follows:
+
+syncfolder() in accounts.py makes calls::
+
+  localfolder.cachemessagelist(maxage)
+  ...
+  remotefolder.cachemessagelist(maxage) 
+
+which retrieve lists containing "maxage days" worth of messages from the local
+and remote folders.
+
+How does each folder determine what "maxage days ago" means? In
+folder/Maildir.py, _iswithinmaxage() checks to see if the timestamp at the
+beginning of the filename is younger than 00:00 on today's date (UTC) - maxage
+days. In folder/IMAP.py, _msgs_to_fetch() makes the IMAP query
+SINCE(yyyy-mm-dd), where yyyy-mm-dd is today's date (UTC) - maxage days.
+However, the IMAP server interprets yyyy-mm-dd with respect to its own timezone,
+which we don't know.
+
+So we have two different timezones, and this causes a problem. Example::
+
+                              A           E
+    (local)          |--------------|--------------->
+                   -48              0             
+                              A  B*    C*      D
+    (remote)  |--------------|--------------->
+              -48            0              
+
+where stars indicate messages not in the statusfolder (these are eligible to be
+copied but not deleted), and the numbers are times relative to "maxage days ago"
+in each folder's timezone. local.messagelist = [E] and remote.messagelist =
+[A,B*,C*,D] so, uncorrected, A would get deleted on the server.
+
+This is corrected in accounts.py, in maxage_messagelist_correction().
+max(min(local UIDs), min(remote UIDs)) is a UID that's after both zero points,
+so exclude unstarred messages after this: then B* and C* get copied, but A doesn't
+get deleted. If one of the messagelists is empty, instead fetch an extra 2 days
+of messages from this to catch and exclude messages like A. Two days is
+sufficient because the gap between the 0's is at worst 26 hours (UTC offsets run
+from -1200 to +1400).
+
+Upshot:
+
+* We never sync (delete, add a new message, sync flags) messages earlier than 2
+  + maxage days ago (worst case: 23:59 rounded down to 00:00 with an IMAP server
+  in timezone +1400).
+* We never look at messages earlier than 4 + maxage days ago.
+* In most cases, people will have enough messages that the cutoff point isn't
+  too far from 0 in the timezone with lower UTC offset.
+* New messages (newer than maxage - 1 days ago) always get synced.
diff --git a/offlineimap.conf b/offlineimap.conf
index 4c8e5d2..f5953cf 100644
--- a/offlineimap.conf
+++ b/offlineimap.conf
@@ -315,12 +315,20 @@ remoterepository = RemoteExample
 # This option stands in the [Account Test] section.
 #
 # When you are starting to sync an already existing account you can tell
-# OfflineIMAP to sync messages from only the last x days.  When you do this,
-# messages older than x days will be completely ignored.  This can be useful for
-# importing existing accounts when you do not want to download large amounts of
-# archive email.
-#
-# Messages older than maxage days will not be synced, their flags will not be
+# OfflineIMAP to only sync relatively recent messages; messages over a certain
+# cutoff age will not be synced. This can be useful for importing existing
+# accounts when you do not want to download large amounts of archive email. The
+# exact cutoff depends on the maxage configuration option, and on the times of
+# the messages in the local and remote folders. More details can be found in
+# docs/doc-src/API.rst, but the most important guarantees are:
+#   * in most scenarios, this cutoff is simply maxage days ago, +/- a few hours
+#   * offlineimap will never download or delete messages more than maxage + 2
+#   days ago
+#   * offlineimap will never fetch messages more than maxage + 4 days ago
+#   * offlineimap will always download messages that are new since the last sync
+#   (unless they are > maxage - 1 days old)
+#
+# Messages older than the cutoff will not be synced, their flags will not be
 # changed, they will not be deleted, etc.  For OfflineIMAP it will be like these
 # messages do not exist.  This will perform an IMAP search in the case of IMAP
 # or Gmail and therefore requires that the server support server side searching.
diff --git a/offlineimap/accounts.py b/offlineimap/accounts.py
index 62ed5c3..0fdb651 100644
--- a/offlineimap/accounts.py
+++ b/offlineimap/accounts.py
@@ -429,9 +429,12 @@ def syncfolder(account, remotefolder, quick):
 
         statusfolder.cachemessagelist()
 
+        maxage = localfolder.config.getdefaultint("Account %s"%
+            localfolder.accountname, "maxage", -1)
+
         if quick:
-            if (not localfolder.quickchanged(statusfolder) and
-                not remotefolder.quickchanged(statusfolder)):
+            if (not localfolder.quickchanged(statusfolder, maxage) and
+                not remotefolder.quickchanged(statusfolder, maxage)):
                 ui.skippingfolder(remotefolder)
                 localrepos.restore_atime()
                 return
@@ -439,7 +442,7 @@ def syncfolder(account, remotefolder, quick):
         # Load local folder.
         ui.syncingfolder(remoterepos, remotefolder, localrepos, localfolder)
         ui.loadmessagelist(localrepos, localfolder)
-        localfolder.cachemessagelist()
+        localfolder.cachemessagelist(maxage)
         ui.messagelistloaded(localrepos, localfolder, localfolder.getmessagecount())
 
         # If either the local or the status folder has messages and
@@ -463,10 +466,17 @@ def syncfolder(account, remotefolder, quick):
 
         # Load remote folder.
         ui.loadmessagelist(remoterepos, remotefolder)
-        remotefolder.cachemessagelist()
+        remotefolder.cachemessagelist(maxage)
         ui.messagelistloaded(remoterepos, remotefolder,
                              remotefolder.getmessagecount())
 
+        # For the maxage feature, we can't use the local and remote
+        # messagelists as is: even though they were both fetched using maxage,
+        # they might start at different times because of timezone issues.
+        if maxage != -1:
+            maxage_messagelist_correction(localfolder, remotefolder,
+                statusfolder, maxage)
+
         # Synchronize remote changes.
         if not localrepos.getconfboolean('readonly', False):
             ui.syncingmessages(remoterepos, remotefolder, localrepos, localfolder)
@@ -501,3 +511,62 @@ def syncfolder(account, remotefolder, quick):
         for folder in ["statusfolder", "localfolder", "remotefolder"]:
             if folder in locals():
                 locals()[folder].dropmessagelistcache()
+
+
+def maxage_messagelist_correction(localfolder, remotefolder, statusfolder, maxage):
+    """ Possibly shrink local or remote messagelist to avoid the following
+    problem: if remote is IMAP, then SINCE(yyyy-mm-dd) returns messages received
+    after 00:00 on yyyy-mm-dd with respect to the server's internal timezone,
+    which we don't know. This might look like:
+
+                              A        B E
+    (local)          |--------------|--------------->
+                   -48              0             
+                              A        B    C  D
+    (remote)  |--------------|--------------->
+              -48            0              
+
+    where indicated times are relative to "maxage days ago" in each folder's
+    timezone (UTC for local, unknown for remote).  local.messagelist = [B,E]
+    and remote.messagelist = [A,B,C,D] so, uncorrected, A would get deleted on
+    the server.
+    
+    Correction: max(min(local UIDs), min(remote UIDs)) is a UID that's after
+    both zero points, so restrict to messages after this. If one of the
+    messagelists is empty, instead fetch an extra 2 days of messages from this
+    to catch and exclude messages like A. This works because the gap between
+    the 0's is at worst 26 hours (-1200 to +1400).
+    """
+
+    def _exclude_invisible(reference, excludee, maxage):
+        """ Exclude messages from excludee that are too old to be seen by
+        reference.messagelist """
+        big_list = reference.scanfolder(maxage + 2)
+        for uid in excludee.messagelist.keys():
+            if uid in big_list.keys() and uid not in reference.getmessageuidlist():
+                # uid exists in reference but is too old to be seen by
+                # reference.messagelist; exclude it so it doesn't get deleted
+                del excludee.messagelist[uid]
+
+    def _exclude_low_uids(folder, statusfolder, lowest_uid):
+        """ Exclude UIDs lower than a cutoff except those eligible to be copied,
+        not deleted """
+        for uid in folder.messagelist.keys():
+            if uid < lowest_uid and uid in statusfolder.messagelist.keys():
+                del folder.messagelist[uid]
+
+    if not localfolder.messagelist and not remotefolder.messagelist:
+        ui = getglobalui()
+        ui.terminate()
+    if not localfolder.messagelist:
+        _exclude_invisible(localfolder, remotefolder, maxage)
+    if not remotefolder.messagelist:
+        _exclude_invisible(remotefolder, localfolder, maxage)
+    else:
+        lowest_local_uid = min(localfolder.getmessageuidlist())
+        lowest_remote_uid = min(remotefolder.getmessageuidlist())
+        lowest_acceptable_uid = max(lowest_local_uid, lowest_remote_uid)
+        if lowest_local_uid < lowest_remote_uid:
+            _exclude_low_uids(localfolder, statusfolder, lowest_acceptable_uid)
+        else:
+            _exclude_low_uids(remotefolder, statusfolder, lowest_acceptable_uid)
diff --git a/offlineimap/folder/Base.py b/offlineimap/folder/Base.py
index df2e654..ffb379b 100644
--- a/offlineimap/folder/Base.py
+++ b/offlineimap/folder/Base.py
@@ -112,7 +112,7 @@ class BaseFolder(object):
     # XXX: if user specifies 'quick' flag for folder that doesn't
     # XXX: support quick status queries, so one believes that quick
     # XXX: status checks will be done, but it won't really be so.
-    def quickchanged(self, statusfolder):
+    def quickchanged(self, statusfolder, maxage):
         """ Runs quick check for folder changes and returns changed
         status: True -- changed, False -- not changed.
 
@@ -695,11 +695,6 @@ class BaseFolder(object):
                 content = self.getmessage(uid)
                 rtime = emailutil.get_message_date(content, 'Date')
 
-            if uid > 0 and dstfolder.uidexists(uid):
-                # dst has message with that UID already, only update status
-                statusfolder.savemessage(uid, None, flags, rtime)
-                return
-
             # If any of the destinations actually stores the message body,
             # load it up.
             if dstfolder.storesmessages():
@@ -757,6 +752,18 @@ class BaseFolder(object):
 
         copylist = filter(lambda uid: not statusfolder.uidexists(uid),
             self.getmessageuidlist())
+
+        for uid in copylist:
+            if uid > 0 and dstfolder.uidexists(uid):
+                # dst has message with that UID already, only update status
+                flags = self.getmessageflags(uid)
+                rtime = self.getmessagetime(uid)
+                if dstfolder.utime_from_message:
+                    content = self.getmessage(uid)
+                    rtime = emailutil.get_message_date(content, 'Date')
+                statusfolder.savemessage(uid, None, flags, rtime)
+                copylist.remove(uid)
+
         num_to_copy = len(copylist)
         if num_to_copy and self.repository.account.dryrun:
             self.ui.info("[DRYRUN] Copy {0} messages from {1}[{2}] to {3}".format(
@@ -790,19 +797,23 @@ class BaseFolder(object):
         that were deleted in 'self'. Delete those from dstfolder and
         statusfolder.
 
-        This function checks and protects us from action in ryrun mode.
+        This function checks and protects us from action in dryrun mode.
         """
 
         deletelist = filter(lambda uid: uid >= 0 and not
             self.uidexists(uid), statusfolder.getmessageuidlist())
         if len(deletelist):
-            self.ui.deletingmessages(deletelist, [dstfolder])
-            if self.repository.account.dryrun:
-                return #don't delete messages in dry-run mode
-            # delete in statusfolder first to play safe. In case of abort, we
-            # won't lose message, we will just retransmit some unneccessary.
-            for folder in [statusfolder, dstfolder]:
-                folder.deletemessages(deletelist)
+            # delete messages from statusfolder that were either deleted by the
+            # user, or not being tracked because of maxage
+            statusfolder.deletemessages(deletelist)
+            deletelist = filter(lambda uid: dstfolder.uidexists(uid), deletelist)
+            if len(deletelist):
+                self.ui.deletingmessages(deletelist, [dstfolder])
+                if self.repository.account.dryrun:
+                    return #don't delete messages in dry-run mode
+                # delete in statusfolder first to play safe. In case of abort, we
+                # won't lose message, we will just retransmit some unneccessary.
+                dstfolder.deletemessages(deletelist)
 
     def __syncmessagesto_flags(self, dstfolder, statusfolder):
         """Pass 3: Flag synchronization.
diff --git a/offlineimap/folder/Gmail.py b/offlineimap/folder/Gmail.py
index 1afbe47..f2fc4ca 100644
--- a/offlineimap/folder/Gmail.py
+++ b/offlineimap/folder/Gmail.py
@@ -121,9 +121,9 @@ class GmailFolder(IMAPFolder):
 
     # TODO: merge this code with the parent's cachemessagelist:
     # TODO: they have too much common logics.
-    def cachemessagelist(self):
+    def cachemessagelist(self, maxage):
         if not self.synclabels:
-            return super(GmailFolder, self).cachemessagelist()
+            return super(GmailFolder, self).cachemessagelist(maxage)
 
         self.messagelist = {}
 
diff --git a/offlineimap/folder/GmailMaildir.py b/offlineimap/folder/GmailMaildir.py
index 894792d..25812ee 100644
--- a/offlineimap/folder/GmailMaildir.py
+++ b/offlineimap/folder/GmailMaildir.py
@@ -39,10 +39,10 @@ class GmailMaildirFolder(MaildirFolder):
         if self.synclabels:
             self.syncmessagesto_passes.append(('syncing labels', self.syncmessagesto_labels))
 
-    def quickchanged(self, statusfolder):
+    def quickchanged(self, statusfolder, maxage):
         """Returns True if the Maildir has changed. Checks uids, flags and mtimes"""
 
-        self.cachemessagelist()
+        self.cachemessagelist(maxage)
         # Folder has different uids than statusfolder => TRUE
         if sorted(self.getmessageuidlist()) != \
                 sorted(statusfolder.getmessageuidlist()):
@@ -64,9 +64,9 @@ class GmailMaildirFolder(MaildirFolder):
                 'filename': '/no-dir/no-such-file/', 'mtime': 0}
 
 
-    def cachemessagelist(self):
+    def cachemessagelist(self, maxage):
         if self.ismessagelistempty():
-            self.messagelist = self._scanfolder()
+            self.messagelist = self.scanfolder(maxage)
 
         # Get mtimes
         if self.synclabels:
diff --git a/offlineimap/folder/IMAP.py b/offlineimap/folder/IMAP.py
index ccc83b0..ac762e6 100644
--- a/offlineimap/folder/IMAP.py
+++ b/offlineimap/folder/IMAP.py
@@ -106,7 +106,7 @@ class IMAPFolder(BaseFolder):
             self.imapserver.releaseconnection(imapobj)
 
     # Interface from BaseFolder
-    def quickchanged(self, statusfolder):
+    def quickchanged(self, statusfolder, maxage):
         # An IMAP folder has definitely changed if the number of
         # messages or the UID of the last message have changed.  Otherwise
         # only flag changes could have occurred.
@@ -144,7 +144,7 @@ class IMAPFolder(BaseFolder):
         return False
 
 
-    def _msgs_to_fetch(self, imapobj):
+    def _msgs_to_fetch(self, imapobj, maxage):
         """Determines sequence numbers of messages to be fetched.
 
         Message sequence numbers (MSNs) are more easily compacted
@@ -164,8 +164,6 @@ class IMAPFolder(BaseFolder):
         # By default examine all messages in this folder
         msgsToFetch = '1:*'
 
-        maxage = self.config.getdefaultint(
-            "Account %s"% self.accountname, "maxage", -1)
         maxsize = self.config.getdefaultint(
             "Account %s"% self.accountname, "maxsize", -1)
 
@@ -210,19 +208,23 @@ class IMAPFolder(BaseFolder):
 
 
     # Interface from BaseFolder
-    def cachemessagelist(self):
-        self.messagelist = {}
+    def cachemessagelist(self, maxage):
+        if self.ismessagelistempty():
+            self.messagelist = self.scanfolder(maxage)
+
+    def scanfolder(self, maxage):
+        retval = {}
 
         imapobj = self.imapserver.acquireconnection()
         try:
-            msgsToFetch = self._msgs_to_fetch(imapobj)
+            msgsToFetch = self._msgs_to_fetch(imapobj, maxage)
             if not msgsToFetch:
                 return # No messages to sync
 
             # Get the flags and UIDs for these. single-quotes prevent
             # imaplib2 from quoting the sequence.
             res_type, response = imapobj.fetch("'%s'"%
-                msgsToFetch, '(FLAGS UID)')
+                msgsToFetch, '(FLAGS UID INTERNALDATE)')
             if res_type != 'OK':
                 raise OfflineImapError("FETCHING UIDs in folder [%s]%s failed. "
                     "Server responded '[%s] %s'"% (self.getrepository(), self,
@@ -243,10 +245,11 @@ class IMAPFolder(BaseFolder):
                                           minor = 1)
             else:
                 uid = long(options['UID'])
-                self.messagelist[uid] = self.msglist_item_initializer(uid)
+                retval[uid] = self.msglist_item_initializer(uid)
                 flags = imaputil.flagsimap2maildir(options['FLAGS'])
                 rtime = imaplibutil.Internaldate2epoch(messagestr)
-                self.messagelist[uid] = {'uid': uid, 'flags': flags, 'time': rtime}
+                retval[uid] = {'uid': uid, 'flags': flags, 'time': rtime}
+        return retval
 
     def dropmessagelistcache(self):
         self.messagelist = {}
@@ -836,8 +839,6 @@ class IMAPFolder(BaseFolder):
         self.__deletemessages_noconvert(uidlist)
 
     def __deletemessages_noconvert(self, uidlist):
-        # Weed out ones not in self.messagelist
-        uidlist = [uid for uid in uidlist if self.uidexists(uid)]
         if not len(uidlist):
             return
 
diff --git a/offlineimap/folder/Maildir.py b/offlineimap/folder/Maildir.py
index 77f7ebb..0e8f79c 100644
--- a/offlineimap/folder/Maildir.py
+++ b/offlineimap/folder/Maildir.py
@@ -150,7 +150,7 @@ class MaildirFolder(BaseFolder):
             flags = set((c for c in flagmatch.group(1) if not c.islower()))
         return prefix, uid, fmd5, flags
 
-    def _scanfolder(self):
+    def scanfolder(self, maxage):
         """Cache the message list from a Maildir.
 
         Maildir flags are: R (replied) S (seen) T (trashed) D (draft) F
@@ -158,8 +158,6 @@ class MaildirFolder(BaseFolder):
         :returns: dict that can be used as self.messagelist.
         """
 
-        maxage = self.config.getdefaultint("Account " + self.accountname,
-                                           "maxage", None)
         maxsize = self.config.getdefaultint("Account " + self.accountname,
                                             "maxsize", None)
         retval = {}
@@ -174,7 +172,7 @@ class MaildirFolder(BaseFolder):
             # We store just dirannex and filename, ie 'cur/123...'
             filepath = os.path.join(dirannex, filename)
             # Check maxage/maxsize if this message should be considered.
-            if maxage and not self._iswithinmaxage(filename, maxage):
+            if maxage != -1 and not self._iswithinmaxage(filename, maxage):
                 continue
             if maxsize and (os.path.getsize(os.path.join(
                         self.getfullname(), filepath)) > maxsize):
@@ -199,9 +197,9 @@ class MaildirFolder(BaseFolder):
         return retval
 
     # Interface from BaseFolder
-    def quickchanged(self, statusfolder):
+    def quickchanged(self, statusfolder, maxage):
         """Returns True if the Maildir has changed"""
-        self.cachemessagelist()
+        self.cachemessagelist(maxage)
         # Folder has different uids than statusfolder => TRUE.
         if sorted(self.getmessageuidlist()) != \
                 sorted(statusfolder.getmessageuidlist()):
@@ -218,9 +216,9 @@ class MaildirFolder(BaseFolder):
         return {'flags': set(), 'filename': '/no-dir/no-such-file/'}
 
     # Interface from BaseFolder
-    def cachemessagelist(self):
+    def cachemessagelist(self, maxage):
         if self.ismessagelistempty():
-            self.messagelist = self._scanfolder()
+            self.messagelist = self.scanfolder(maxage)
 
     # Interface from BaseFolder
     def getmessagelist(self):
@@ -439,16 +437,16 @@ class MaildirFolder(BaseFolder):
         :return: Nothing, or an Exception if UID but no corresponding file
                  found.
         """
-        if not self.uidexists(uid):
-            return
-
         filename = self.messagelist[uid]['filename']
         filepath = os.path.join(self.getfullname(), filename)
         try:
             os.unlink(filepath)
         except OSError:
             # Can't find the file -- maybe already deleted?
-            newmsglist = self._scanfolder()
+            maxage = localfolder.config.getdefaultint("Account %s"%
+                localfolder.accountname, "maxage", -1)
+
+            newmsglist = self.scanfolder(maxage)
             if uid in newmsglist:       # Nope, try new filename.
                 filename = newmsglist[uid]['filename']
                 filepath = os.path.join(self.getfullname(), filename)
diff --git a/offlineimap/imaplibutil.py b/offlineimap/imaplibutil.py
index 30aed9a..44fa6dc 100644
--- a/offlineimap/imaplibutil.py
+++ b/offlineimap/imaplibutil.py
@@ -19,6 +19,7 @@ import fcntl
 import time
 import subprocess
 from sys import exc_info
+from calendar import timegm
 import threading
 from hashlib import sha1
 import socket
@@ -245,4 +246,4 @@ def Internaldate2epoch(resp):
 
     tt = (year, mon, day, hour, min, sec, -1, -1, -1)
 
-    return time.mktime(tt)
+    return timegm(tt) - zone
-- 
2.3.3






More information about the OfflineIMAP-project mailing list