PARTIALLY REMOVING MAXAGE (was: [PATCH v4] make maxage use UIDs to avoid timezone issues)
Nicolas Sebrecht
nicolas.s-dev at laposte.net
Tue Mar 31 15:42:47 BST 2015
On Tue, Mar 31, 2015 at 04:53:29AM -0400, Janna Martl wrote:
> remove IMAP-IMAP support for maxage; add startdate, folder_startdate to compensate
>
> maxage was fundamentally wrong in the IMAP-IMAP case: it assumed that
> remote messages have UIDs in the same order as their local counterparts,
> which could be false, e.g. when messages are copied in quick succession.
> So, remove support for maxage in the IMAP-IMAP case.
>
> Instead, support a special case of date restriction in IMAP-IMAP syncs:
> suppose there's a pre-existing IMAP folder that you want to sync to a
> new, empty folder, but you only want to sync the messages after a
> certain date. This date can be specified by the configuration option
> folder_startdate. On all subsequent syncs, the messages in the original
> folder after folder_startdate, and all the messages in the new folder,
> are considered.
>
> Also add the option startdate, which is similar to maxage but involves
> syncing messages after a fixed date, instead of a fixed number of days
> ago. This is not supported in the IMAP-IMAP case for the same reasons as
> maxage.
Good idea.
I find the wording of the options to be confusing: folder_startdate
/sounds/ to be the same thing as startdate but at a folder level. At
least, that's what the "folder" prefix currently mean.
While still sticking with your proposal, it should be more something
like
- maxage -> maxage (new limitations)
- folder_startdate -> localstartdate
- startdate -> startdate
That being said, maxage and startdate are really the same thing as they
both have the same limitations. So, I would improve maxage rather than
inserting another option:
1. is integer: historical meaning.
2. is date format yyyy-mm-dd: fixed reference.
Then, we can make things less confusing:
- startdate -> maxage (2 possible forms, requires Maildir)
- folder_startdate -> startdate (1 form, other limitations)
Later comments are against the current proposal while I think the only 2
above options is better.
Side note: currently we have a lot of options that are applied to a
arbitrary level (current levels: general -> account -> repository,
mbnames). In the future, it would be nice to apply each option to the
lowest level it makes sense and fallback to the upper levels when not
set. Also, supporting per-folder options would be great but this is
another topic.
> ---
> offlineimap.conf | 43 +++++++++-
> offlineimap/accounts.py | 187 ++++++++++++++++++++++++++++++------------
> offlineimap/folder/Base.py | 28 +++++++
> offlineimap/folder/Gmail.py | 4 +-
> offlineimap/folder/IMAP.py | 79 ++++++------------
> offlineimap/folder/Maildir.py | 76 ++++++++---------
> 6 files changed, 267 insertions(+), 150 deletions(-)
>
> diff --git a/offlineimap.conf b/offlineimap.conf
> index af382fb..c7751b2 100644
> --- a/offlineimap.conf
> +++ b/offlineimap.conf
> @@ -260,7 +260,7 @@ remoterepository = RemoteExample
> # This option stands in the [Account Test] section.
> #
> # OfflineImap can replace a number of full updates by quick synchronizations.
> -# This option is ignored if maxage is used.
> +# This option is ignored if maxage, startdate, or folder_startdate are used.
> #
> # It only synchronizes a folder if
> #
> @@ -342,6 +342,9 @@ remoterepository = RemoteExample
> #
> # Known edge cases are described in the offlineimap(1).
> #
> +# maxage is not supported for syncing two IMAP (or Gmail) accounts, and
> +# may not be used in conjunction with startdate or folder_startdate.
# maxage is allowed only when local is of Maildir type. It can't be used
# with startdate or folder_startdate.
> +#
> # The maxage option expects an integer (for the number of days).
> #
> #maxage = 3
> @@ -349,6 +352,22 @@ remoterepository = RemoteExample
>
> # This option stands in the [Account Test] section.
> #
> +# When you are starting to sync an already existing account you can tell
> +# OfflineIMAP to only sync messages starting at a particular date. When you do
> +# this, messages older than that date will be completely ignored. This can be
> +# useful for importing existing accounts when you do not want to download large
> +# amounts of archive email.
I would change all the above because it's confusing. Just:
# The same as maxage but from a fixed date, instead.
> +#
> +# startdate is not supported for syncing two IMAP (or Gmail) accounts,
> +# and may not be used in conjunction with maxage or folder_startdate.
> +#
> +# The startdate option expects a date in the format yyyy-mm-dd.
> +#
> +#startdate = 2015-03-20
> +
> +
> +# This option stands in the [Account Test] section.
> +#
> # Maildir file format uses colon (:) separator between uniq name and info.
> # Unfortunatelly colon is not allowed character in windows file name. If you
> # enable maildir-windows-compatible option, OfflineIMAP will be able to store
> @@ -451,6 +470,28 @@ localfolders = ~/Test
>
> # This option stands in the [Repository LocalExample] section.
> #
> +# Use this option if LocalExample is a pre-existing account that you
> +# want to clone to a new, empty account, but you only want to clone
> +# messages later than a particular date. If you are using this option
> +# for the first time, RemoteExample must be empty. On all subsequent
> +# syncs (for which this option is enabled), offlineimap will consider
> +# the messages in LocalExample later than folder_startdate, and will
> +# consider all messages in RemoteExample. Older messages in LocalExample
> +# will be completely ignored.
> +#
> +# This option is mainly useful for IMAP-IMAP syncs, because maxage and
> +# startdate are not supported in that case.
Would change the description to just:
# startdate sync mails starting from the given date.
#
# It applies the date restriction to the local IMAP server only.
# Remote repository MUST be empty at first sync.
> +#
> +# folder_startdate may not be used in conjunction with maxage or
> +# startdate.
> +#
> +# The folder_startdate option expects a date in the format yyyy-mm-dd.
> +#
> +#folder_startdate = 2015-03-20
> +
> +
> +# This option stands in the [Repository LocalExample] section.
> +#
> # Some users may not want the atime (last access time) of folders to be
> # modified by OfflineIMAP. If 'restoreatime' is set to yes, OfflineIMAP
> # will restore the atime of the "new" and "cur" folders in each maildir
> diff --git a/offlineimap/accounts.py b/offlineimap/accounts.py
> index a7ad623..e4da366 100644
> --- a/offlineimap/accounts.py
> +++ b/offlineimap/accounts.py
> @@ -17,10 +17,11 @@
> from subprocess import Popen, PIPE
> from threading import Event
> import os
> +import time
> from sys import exc_info
> import traceback
>
> -from offlineimap import mbnames, CustomConfig, OfflineImapError
> +from offlineimap import mbnames, CustomConfig, OfflineImapError, imaplibutil
> from offlineimap import globals
> from offlineimap.repository import Repository
> from offlineimap.ui import getglobalui
> @@ -402,6 +403,96 @@ def syncfolder(account, remotefolder, quick):
>
> Filtered folders on the remote side will not invoke this function."""
>
> +
> + def sanity_check(date):
nitpick: s,sanity_check,is_date_sane,
> + if date[0] < 1900:
> + raise OfflineImapError("date restriction led to year %d. "
> + "Abort syncing."% min_date[0],
> + OfflineImapError.ERROR.REPO)
> +
> + def check_uid_validity(localfolder, remotefolder, statusfolder):
> + # If either the local or the status folder has messages and
> + # there is a UID validity problem, warn and abort. If there are
> + # no messages, UW IMAPd loses UIDVALIDITY. But we don't really
> + # need it if both local folders are empty. So, in that case,
> + # just save it off.
> + if localfolder.getmessagecount() or statusfolder.getmessagecount():
I'm confused about what it means. Are we looking for
getmessagecount() != 0, not being None, not raising an exception, whatever?
I know python allows being lazy and sometime Python experts encourage
such pratices. I consider this BAD. It just allows to insert more subtle
bugs, makes code reading harder and more subject to wrong interpretations.
> + if not localfolder.check_uidvalidity():
> + ui.validityproblem(localfolder)
> + localrepos.restore_atime()
> + return
> + if not remotefolder.check_uidvalidity():
> + ui.validityproblem(remotefolder)
> + localrepos.restore_atime()
> + return
> + else:
> + # Both folders empty, just save new UIDVALIDITY
> + localfolder.save_uidvalidity()
> + remotefolder.save_uidvalidity()
> +
> + def save_min_uid(folder, min_uid):
> + uidfile = folder.get_min_uid_file()
> + fd = open(uidfile, 'wt')
> + fd.write(str(min_uid) + "\n")
> + fd.close()
Knock knock! :-)
save_min_uid() method must belong to a folder class!
> +
> + def cachemessagelists_by_date(localfolder, remotefolder, date):
> + """ Returns messages with uid > min(uids of within-date
> + messages)."""
messages, date included)."""
> +
> + localfolder.cachemessagelist(min_date=date)
> + check_uid_validity(localfolder, remotefolder, statusfolder)
> + # local messagelist had date restriction applied already. Restrict
> + # sync to messages with UIDs >= min_uid from this list.
> + #
> + # local messagelist might contain new messages (with uid's < 0).
> + positive_uids = filter(
> + lambda uid: uid > 0, localfolder.getmessageuidlist())
> + if len(positive_uids) > 0:
> + remotefolder.cachemessagelist(min_uid=min(positive_uids))
> + else:
> + # No messages with UID > 0 in range in localfolder.
> + # date restriction was applied with respect to local dates but
> + # remote folder timezone might be different from local, so be
> + # safe and make sure the range isn't bigger than in local.
Adding 24h below makes the range bigger, though.
> + remotefolder.cachemessagelist(
> + min_date=time.gmtime(time.mktime(date) + 24*60*60))
> +
> + def cachemessagelists_folder_startdate(new, partial, datestr):
> + """ Retrieve messagelists when folder_startdate has been set for
> + the folder 'partial'.
> +
> + Idea: suppose you want to clone the messages after date in one
> + account (partial) to a new one (new). If new is empty, then copy
> + messages in partial newer than date to new, and keep track of the
> + min uid. On subsequent syncs, sync all the messages in new against
> + those after that min uid in partial. This is a partial replacement
> + for maxage in the IMAP-IMAP sync case, where maxage doesn't work:
> + the UIDs of the messages in localfolder might not be in the same
> + order as those of corresponding messages in remotefolder, so if L in
> + local corresponds to R in remote, the ranges [L, ...] and [R, ...]
> + might not correspond. But, if we're cloning a folder into a new one,
> + [min_uid, ...] does correspond to [1, ...].
> +
> + This is just for IMAP-IMAP. For Maildir-IMAP, use startdate instead.
> + """
> +
> + new.cachemessagelist()
> + if not new.getmessageuidlist():
Ditto, "not inst.getlist()" may mean too much things. Be precise about
what you expect.
> + # First sync
> + date = time.strptime(datestr, "%Y-%m-%d")
This might badly fail (see later).
> + partial.cachemessagelist(min_date=date)
> + uids = partial.getmessageuidlist()
> + if len(uids) > 0:
> + min_uid = min(uids)
> + else:
> + min_uid = 1
> + save_min_uid(partial, min_uid)
> + else:
> + min_uid = partial.get_min_uid()
> + partial.cachemessagelist(min_uid=min_uid)
> +
> +
> remoterepos = account.remoterepos
> localrepos = account.localrepos
> statusrepos = account.statusrepos
> @@ -429,68 +520,56 @@ def syncfolder(account, remotefolder, quick):
>
> statusfolder.cachemessagelist()
>
> - maxage = localfolder.getmaxage()
>
> # Load local folder.
> ui.syncingfolder(remoterepos, remotefolder, localrepos, localfolder)
> - ui.loadmessagelist(localrepos, localfolder)
> - # For maxage, returns messages with uid > min(uids of within-maxage messages).
> - localfolder.cachemessagelist(maxage=maxage)
> - ui.messagelistloaded(localrepos, localfolder, localfolder.getmessagecount())
> -
> - if quick:
> - # IMAP quickchanged isn't compatible with maxage, since the "quick"
> - # check can only retrieve a full list of UIDs in the folder.
> - if maxage:
> - ui.warn("Quick syncs (-q) not supported in conjunction "\
> - "with maxage; ignoring -q.")
> - else:
> +
> + #Retrieve messagelists, taking into account age-restriction
> + #options
Is there a reason for not putting a whitespace after '#' sometimes?
> + maxage = localfolder.getmaxage()
> + localstart = localfolder.getfolder_startdate()
> + remotestart = remotefolder.getfolder_startdate()
> + startdate = localfolder.getstartdate()
> + if (maxage != None) + (localstart != None) + (remotestart != None)\
> + + (startdate != None) > 1:
> + raise OfflineImapError("You can set at most one of the "
> + "following: maxage, folder_startdate (for the local folder), "
> + "folder_startdate (for the remote folder), startdate",
> + OfflineImapError.ERROR.REPO), None, exc_info()[2]
> + if (maxage != None or localstart or remotestart or startdate)\
> + and quick:
> + # IMAP quickchanged isn't compatible with options that
> + # involve restricting the messagelist, since the "quick"
> + # check can only retrieve a full list of UIDs in the folder.
> + ui.warn("Quick syncs (-q) not supported in conjunction "
> + "with maxage, folder_startdate, or startdate; ignoring "
> + "-q.")
> + if maxage != None:
> + timelimit = time.gmtime(time.time() - 60*60*24*maxage)
> + sanity_check(timelimit)
> + cachemessagelists_by_date(localfolder, remotefolder, timelimit)
> + elif startdate != None:
> + timelimit = time.strptime(startdate, "%Y-%m-%d")
> + sanity_check(timelimit)
> + cachemessagelists_by_date(localfolder, remotefolder, timelimit)
> + elif localstart != None:
> + cachemessagelists_folder_startdate(remotefolder, localfolder,
> + localstart)
> + check_uid_validity(localfolder, remotefolder, statusfolder)
> + elif remotestart != None:
> + cachemessagelists_folder_startdate(localfolder, remotefolder,
> + remotestart)
> + check_uid_validity(localfolder, remotefolder, statusfolder)
> + else:
> + localfolder.cachemessagelist()
> + if quick:
> if (not localfolder.quickchanged(statusfolder) and
> not remotefolder.quickchanged(statusfolder)):
> ui.skippingfolder(remotefolder)
> localrepos.restore_atime()
> return
> -
> - # If either the local or the status folder has messages and
> - # there is a UID validity problem, warn and abort. If there are
> - # no messages, UW IMAPd loses UIDVALIDITY. But we don't really
> - # need it if both local folders are empty. So, in that case,
> - # just save it off.
> - if localfolder.getmessagecount() or statusfolder.getmessagecount():
> - if not localfolder.check_uidvalidity():
> - ui.validityproblem(localfolder)
> - localrepos.restore_atime()
> - return
> - if not remotefolder.check_uidvalidity():
> - ui.validityproblem(remotefolder)
> - localrepos.restore_atime()
> - return
> - else:
> - # Both folders empty, just save new UIDVALIDITY
> - localfolder.save_uidvalidity()
> - remotefolder.save_uidvalidity()
> -
> - # Load remote folder.
> - ui.loadmessagelist(remoterepos, remotefolder)
> - if maxage != None:
> - # local messagelist was applied maxage, already. Restrict sync for
> - # messages with UIDs >= min_uid from this list.
> - #
> - # local messagelist might contain new messages (with uid's < 0).
> - positive_uids = filter(
> - lambda uid: uid > 0, localfolder.getmessageuidlist())
> - if len(positive_uids) > 0:
> - remotefolder.cachemessagelist(min_uid=min(positive_uids))
> - else:
> - # No messages with UID > 0 in range in localfolder.
> - # maxage was applied with respect to local dates but
> - # remote folder timezone might be different from local, so be
> - # safe and make sure the range isn't bigger than in local.
> - remotefolder.cachemessagelist(maxage=maxage - 1)
> - else:
> + check_uid_validity(localfolder, remotefolder, statusfolder)
> remotefolder.cachemessagelist()
> - ui.messagelistloaded(remoterepos, remotefolder,
> - remotefolder.getmessagecount())
>
> # Synchronize remote changes.
> if not localrepos.getconfboolean('readonly', False):
> diff --git a/offlineimap/folder/Base.py b/offlineimap/folder/Base.py
> index 4af0361..4dbcebd 100644
> --- a/offlineimap/folder/Base.py
> +++ b/offlineimap/folder/Base.py
> @@ -306,6 +306,34 @@ class BaseFolder(object):
> return self.config.getdefaultint("Account %s"%
> self.accountname, "maxsize", None)
>
> + def getstartdate(self):
> + return self.config.getdefault("Account %s"%
> + self.accountname, "startdate", None)
> +
> + def getfolder_startdate(self):
> + """ Retrieve the value of the configuration option folder_startdate """
> + return self.config.getdefault("Repository " + self.repository.name,
> + 'folder_startdate', None)
We are exposing any further use of this value to user craziness. Also,
we should return
time.strptime(datestr, "%Y-%m-%d")
since that's what we want to work with.
> +
> + def get_min_uid_file(self):
> + startuiddir = os.path.join(self.config.getmetadatadir(),
> + 'Repository-' + self.repository.name, 'StartUID')
> + if not os.path.exists(startuiddir):
> + os.mkdir(startuiddir, 0o700)
> + return os.path.join(startuiddir, self.getfolderbasename())
> +
> + def get_min_uid(self):
Nitpick about the name:
- get/set
- read/write
- save/?
retrieve?
> + uidfile = self.get_min_uid_file()
> + try:
> + fd = open(uidfile, 'rt')
> + min_uid = long(fd.readline().strip())
> + fd.close()
> + return min_uid
> + except:
> + raise IOError("Can't read %s. To start using folder_startdate, "\
> + "folder must be empty"% uidfile)
s,folder must be empty,,
> +
> +
> def savemessage(self, uid, content, flags, rtime):
> """Writes a new message, with the specified uid.
>
> diff --git a/offlineimap/folder/Gmail.py b/offlineimap/folder/Gmail.py
> index 93e8eee..7f239a3 100644
> --- a/offlineimap/folder/Gmail.py
> +++ b/offlineimap/folder/Gmail.py
> @@ -121,9 +121,9 @@ class GmailFolder(IMAPFolder):
>
> # TODO: merge this code with the parent's cachemessagelist:
> # TODO: they have too much common logics.
> - def cachemessagelist(self, maxage=None, min_uid=None):
> + def cachemessagelist(self, min_date=None, min_uid=None):
> if not self.synclabels:
> - return super(GmailFolder, self).cachemessagelist(maxage=maxage,
> + return super(GmailFolder, self).cachemessagelist(min_date=min_date,
> min_uid=min_uid)
>
> self.messagelist = {}
> diff --git a/offlineimap/folder/IMAP.py b/offlineimap/folder/IMAP.py
> index 25dd26d..cabf5b9 100644
> --- a/offlineimap/folder/IMAP.py
> +++ b/offlineimap/folder/IMAP.py
> @@ -18,6 +18,7 @@
> import random
> import binascii
> import re
> +import os
> import time
> from sys import exc_info
>
> @@ -79,6 +80,18 @@ class IMAPFolder(BaseFolder):
> def waitforthread(self):
> self.imapserver.connectionwait()
>
> + def getmaxage(self):
> + if self.config.getdefaultint("Account %s"%
> + self.accountname, "maxage", None):
> + raise OfflineImapError("maxage is not supported on IMAP-IMAP sync",
> + OfflineImapError.ERROR.REPO), None, exc_info()[2]
Aren't you forbidding any use of maxage? When did you check that local
repository is not a Maildir?
<...>
I might be missing other (more important) things. All those successive
incremental patches tend to make reviewing harder.
I believe this patch v7 should be squashed to patch v4 (its current
parent) for a fresh patch v8 which would supersede all previous patches
on next round. ,-)
--
Nicolas Sebrecht
More information about the OfflineIMAP-project
mailing list