[supersedes] Re: maxage causes loss of local email
Janna Martl
janna.martl109 at gmail.com
Fri Mar 13 07:46:44 GMT 2015
What about IMAP-to-IMAP sync? This is strictly harder than the
Maildir-to-IMAP case: if we have a Maildir, we know the internal
timezone, and it's less expensive to fetch messages.
> for uid in local_messageslist:
>     if uid in server_messageslist:
>         lowest_common_uid = uid
>         break
> ...
>
> if lowest_common_uid != None:
>     for mlist in [local_messageslist, server_messageslist]:
>         for uid in mlist:
>             if uid > lowest_common_uid:
>                 continue
>             del mlist[uid]
In this situation
E X
|--------------|--------------->
-24 0 24
D F G H I X
|--------------|--------------->
-24 0 24
we end up ignoring everything before X, which is wrong because G, H,
E, I should be deleted. In a previous incarnation of this, you only
looked for lowest_common_uid < "0" (where "0" is with respect to the
Maildir's timezone, which we know), which avoids this problem but
doesn't generalize to the IMAP/IMAP case where we don't know where 0
is on either list. INTERNALDATEs would be mighty helpful...
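To make the failure concrete, here is a minimal Python sketch of the quoted pruning step applied to two lists shaped like the timelines above. The UID numbers and the helper names `lowest_common_uid` and `prune_up_to` are invented for illustration; the only thing that matters is that X is the sole UID present on both sides.

```python
def lowest_common_uid(list_a, list_b):
    """Return the smallest UID present in both lists, or None."""
    common = set(list_a) & set(list_b)
    return min(common) if common else None


def prune_up_to(messagelist, cutoff_uid):
    """Mirror the quoted loop: drop every UID that is not above the cutoff."""
    # Iterate over a copy of the keys so that deleting is safe.
    for uid in list(messagelist):
        if uid > cutoff_uid:
            continue
        del messagelist[uid]


# Hypothetical UIDs; the letters match the timelines above.
list1 = {5: "E", 8: "X"}
list2 = {1: "D", 2: "F", 3: "G", 4: "H", 6: "I", 8: "X"}

cutoff = lowest_common_uid(list1, list2)
for mlist in (list1, list2):
    prune_up_to(mlist, cutoff)

# E, G, H and I have been dropped from consideration, so the sync never
# gets a chance to notice that each of them exists on one side only.
print(cutoff, list1, list2)
```

Since X is the first UID both sides agree on, everything at or below it is ignored, including the one-sided messages that ought to have been propagated as deletions.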
> # TODO: make a new local scan and a new fetch for UIDs within
> # (maxage + (maxage_safety_interval * maxage_helper)),
> # look for common UID again:
> #   - if found: proceed with it, print a hint of what was
> #     the working maxage_safety_interval/maxage.
> #   - else: recurse.
The maximum timezone difference (call this tzgap) is 26 hours (-1200
to +1400). A lot of this complexity would go away if we just assumed
that tzgap < 2 days, for example. Suppose we start by doing two
SINCE(maxage + 1) queries, and don't find a lowest_common_uid.
How far back do we have to worry about? Here I've put A in the worst
possible place, i.e. a distance of 24 hours + tzgap < 3 days from 0. We don't
have to worry about things further out (e.g. B) because we want to do
the usual deletion procedure on the two (-24, now) messagelists after
having excluded problematic things like A, and B just doesn't enter
into that picture.
E
B A |--------------|--------------->
-24 0 24
B A D F G H I
|--------------|--------------->
-24 0 24
|______24______|_____tzgap_________|
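The arithmetic behind the "+ 3" can be checked quickly in Python. This is only a sketch of the bound described above; the variable names are made up for illustration.

```python
from datetime import timedelta

# The two extreme UTC offsets mentioned above: -1200 and +1400.
westmost_offset = timedelta(hours=-12)
eastmost_offset = timedelta(hours=14)

# Largest possible disagreement between two servers' local clocks.
tzgap = eastmost_offset - westmost_offset

# Worst case for A: one full day before "0", plus the timezone gap.
worst_case = timedelta(hours=24) + tzgap

# 50 hours is under 3 days, so a window widened by 3 days always covers A.
assert tzgap == timedelta(hours=26)
assert worst_case < timedelta(days=3)
print(tzgap, worst_case)
```

Under the assumption that no server is off by more than a real timezone offset, a SINCE(maxage + 3) query is therefore wide enough to catch anything like A.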
Some pseudocode:
# from before
messagelist1_small = do SINCE(maxage + 1) query on server1
messagelist2_small = do SINCE(maxage + 1) query on server2

# now do the first attempt at finding a lowest_common_uid

if lowest_common_uid == None:
    messagelist1_big = do SINCE(maxage + 3) query on server1
    messagelist2_big = do SINCE(maxage + 3) query on server2
    for uid in messagelist1_small:
        if uid is in messagelist2_big and uid is not in messagelist2_small:
            # found A
            remove uid from messagelist1_small
    for uid in messagelist2_small:
        if uid is in messagelist1_big and uid is not in messagelist1_small:
            # found A
            remove uid from messagelist2_small
    # now do the usual deletion procedure using messagelist1_small and
    # messagelist2_small
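For anyone who wants to play with this, here is a rough runnable version of the exclusion step in Python. It assumes the four query results have already been fetched and reduced to plain sets of UIDs; the function name `exclude_window_mismatches` and the UID values are hypothetical, and it builds new sets instead of removing items while iterating.

```python
def exclude_window_mismatches(small1, small2, big1, big2):
    """Drop UIDs that one side's small window caught but the other's missed.

    Each argument is a set of UIDs. small1/small2 stand for the
    SINCE(maxage + 1) results, big1/big2 for the SINCE(maxage + 3) results.
    Returns new (small1, small2) sets without the "A"-type messages.
    """
    # In small1, but server2 only sees it through its bigger window.
    stray1 = {uid for uid in small1 if uid in big2 and uid not in small2}
    # In small2, but server1 only sees it through its bigger window.
    stray2 = {uid for uid in small2 if uid in big1 and uid not in small1}
    return small1 - stray1, small2 - stray2


# Hypothetical example: UID 10 plays the role of A. Server2 reports it in
# its small window, server1 only in its big one.
small1, big1 = {30}, {10, 30}
small2, big2 = {10, 30, 40}, {10, 30, 40}

clean1, clean2 = exclude_window_mismatches(small1, small2, big1, big2)
print(clean1, clean2)
```

UID 40 survives on purpose: server1 does not report it even in the big window, so it is a real one-sided message that the usual deletion procedure should handle.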
Also, I'm still kind of confused about exactly which edge cases you want
this to take care of.
* Do you want to avoid touching IMAP INTERNALDATEs entirely, or are
there specific things you don't want to do with them?
* Earlier you mentioned timezone gaps of > 1 day being problematic,
but I think the above idea deals with this. It just doesn't deal with
arbitrary timezone gaps. Do you specifically want to treat the case of
a server that's responding to SINCE(yesterday) queries with messages
that are a week old? (And if the server is that broken, can we trust
it to give out strictly increasing UIDs?)
-- J.M.