maxage causes loss of local email

Fri Mar 13 10:48:32 UTC 2015

On Fri, Mar 13, 2015 at 03:59:29AM -0400, Janna Martl wrote:

> What about IMAP-to-IMAP sync? This is strictly harder than the
> Maildir-to-IMAP case: if we have a Maildir, we know the internal
> timezone,

I'd rather say, if we have a Maildir, we know the rtimes. The rtimes
might be screwed-up by wrong timezones.

>           and it's less expensive to fetch messages.

I'm not 100% sure that holds for all use cases (e.g. with big maxages
and a Maildir on a slow device).

> we end up ignoring everything before X, which is wrong because G, H,
> E, I should be deleted. In a previous incarnation of this, you only
> looked for lowest_common_uid < "0" (where "0" is with respect to the
> Maildir's timezone, which we know), which avoids this problem but
> doesn't generalize to the IMAP/IMAP case where we don't know where 0
> is on either list. Internaldates would be mighty helpful.....

You're right that this does not fit for IMAP/IMAP.

> The maximum timezone difference (call this tzgap) is 26 hours (-1200
> to +1400).

How did you get those values?
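
If they are simply the difference between the two extreme offsets you
quote, the arithmetic can be checked like this. This is only a sketch;
reading tzgap as "east offset minus west offset" is my assumption, and
the variable names are made up for the example:

```python
from datetime import timedelta, timezone

# Offsets taken from the quoted text: -1200 and +1400.
west = timezone(timedelta(hours=-12))
east = timezone(timedelta(hours=14))

# Assumed definition of tzgap: difference between the two fixed
# offsets, expressed in hours.
tzgap_hours = (east.utcoffset(None) - west.utcoffset(None)) / timedelta(hours=1)
print(tzgap_hours)
```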

>            A lot of this complexity would go away if we just assumed
> that tzgap < 2 days, for example. Suppose we start by doing two
> SINCE(maxage + 1) queries, and don't find a lowest_common_uid.
> How far back do we have to worry about? Here I've put A in the worst
> possible place, i.e. a distance of 24 + tzgap < 3 from 0. We don't
> have to worry about things further out (e.g. B) because we want to do
> the usual deletion procedure on the two (-24, now) messagelists after
> having excluded problematic things like A, and B just doesn't enter
> into that picture.
> 
>       A                                                      E         
>                          |--------------|--------------->
>                        -24              0             24
> 
>       A                D     F    G H I                
>      |--------------|--------------->
>      -24            0              24
> 
>      |______24______|_____tzgap_________|
> 
> Some pseudocode:
> 
> # from before
> messagelist1_small = do SINCE(maxage + 1) query on server1
> messagelist2_small = do SINCE(maxage + 1) query on server2
> 
> # now do the first attempt at finding a lowest_common_uid
> 
> if lowest_common_uid == None:
>    messagelist1_big = do SINCE(maxage + 3) query on server1
>    messagelist2_big = do SINCE(maxage + 3) query on server2
>    for uid in messagelist1_small:
>        if uid is in messagelist2_big and uid is not in messagelist2_small:
>            # found A
>            remove uid from messagelist1_small
>    for uid in messagelist2_small:
>        if uid is in messagelist1_big and uid is not in messagelist1_small:
>            # found A
>            remove uid from messagelist2_small
> # now do the usual deletion procedure using messagelist1_small and
> # messagelist2_small

Yes. Having thought about this more: please do take this approach with
small and big message lists. It is the proper fix for the maxage issue
you raised.
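
To make sure we mean the same thing, here is how I read your pseudocode,
as a minimal Python sketch. It assumes the four lists are already
fetched and that UIDs are directly comparable between both sides (in
practice they would have to be mapped first). The function name and the
sample UIDs are hypothetical:

```python
def prune_out_of_window(small1, big1, small2, big2):
    """Return both small lists without the stray messages like "A".

    A stray message is in one side's small list but shows up on the
    other side only in the wider (big) list.
    """
    stray1 = {uid for uid in small1 if uid in big2 and uid not in small2}
    stray2 = {uid for uid in small2 if uid in big1 and uid not in small1}
    return small1 - stray1, small2 - stray2


# UID 10 plays the role of A: inside the small window on side 1,
# but only visible in the big window on side 2.
small1, big1 = {10, 20, 21}, {10, 20, 21}
small2, big2 = {20, 21}, {10, 20, 21}

pruned1, pruned2 = prune_out_of_window(small1, big1, small2, big2)
print(sorted(pruned1), sorted(pruned2))
```

The usual deletion procedure would then run on the two pruned lists
only.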

In fact, I was going a bit further by thinking about extending the
maxage feature. What's the purpose of maxage? In real life it depends
heavily on what the user needs. Some want to restrict bandwidth, others
try to get only the most relevant mails, etc.

But extending the maxage feature is another topic. That's why I think
your last approach is good. And once it is done, implementing an
extended maxage will become EASY.

What I like about the "lowest common ancestor in range X" is that we
will be able to do much more with little code. I think that "sync mails
back up to a common ancestor" is in fact better for some use cases, for
example when no sync was done for a long time (where a strict maxage
becomes a problem).
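
A rough sketch of what I have in mind with the common ancestor, in
Python. This is not existing code: the function names are hypothetical,
UIDs are assumed to be comparable on both sides, and "lowest common
ancestor" is taken here to mean the lowest UID present in both lists:

```python
def lowest_common_uid(uids1, uids2):
    """Lowest UID present on both sides, or None if there is none."""
    common = set(uids1) & set(uids2)
    return min(common) if common else None


def uids_back_to_ancestor(uids1, uids2):
    """Keep only the UIDs at or above the common ancestor on each side.

    Returns None when no ancestor exists in the given range, so the
    caller knows it has to look further back (the edge case).
    """
    anchor = lowest_common_uid(uids1, uids2)
    if anchor is None:
        return None
    keep1 = {uid for uid in uids1 if uid >= anchor}
    keep2 = {uid for uid in uids2 if uid >= anchor}
    return keep1, keep2


print(uids_back_to_ancestor({3, 5, 8, 9}, {5, 8, 11}))
print(uids_back_to_ancestor({1, 2}, {3, 4}))
```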

> Also, I'm still kind of confused exactly what edge cases you want this
> to take care of.

I dislike making multiple fetches on every sync. Multiple fetches are
acceptable when handling edge cases but not for the normal case.

What I care about most:
- speed for the normal case (worst offender: multiple fetches)
- the algorithm should be as reliable as possible:
  - working everywhere: IMAP servers have different implementations
  - coping with all kinds of crap that can happen and screw up the
    logic (crazy dates, timezones, etc.)
  - preventing failures by avoiding date parsing (things like that are
    usually somewhat hackish and tend to break easily)

FMPOV, your last implementation logic matches those prerequisites and
properly fixes the issue.

Have fun! ,-)

-- 
Nicolas Sebrecht