[supersedes] Re: maxage causes loss of local email

Nicolas Sebrecht nicolas.s-dev at laposte.net
Thu Mar 12 11:15:54 GMT 2015


[ Supersedes my previous mail with
  Message-ID: <20150312094001.GA2235 at vidovic.ultras.lan> ]

On Thu, Mar 12, 2015 at 01:00:28AM -0400, Janna Martl wrote:

> Oh! I thought you were talking about a slightly different strategy,
> hence the confusion. However, I'm worried about the following:
> 
> (server)
>    A                B  C      
>  |--------------|---------------|
> -24             0              24
> 
> (local)
>    A                                   
>           |--------------|---------------|
>          -24             0              24
> 
> Now local_lowest_uid will stay undefined, the remote messagelist will
> be [A,B,C], and the local messagelist will be []. This will lead to A
> (and B and C) being deleted on the server, which is bad because A
> actually existed locally.

Yeah, I woke up this morning with a fix of this issue in mind. ,-)

I was right initially that we have to care about not having common UID.
In this case, it would be nice to process UIDs for range [(maxage)..lastest].

> The obvious way to fix this is to check the internal date, and not
> delete messages on the server with internaldate < maxage. But I
> thought we were trying to avoid using internaldate, and I'm not sure
> why this would be better than what I was trying to do before.
> 
> Alternatively, we could get (maxage + 2) days of local messages,
> exclude remote messages that fall on the (maxage + 2) local list but
> not the (maxage + 1) local list, then reduce the local list to maxage
> + 1, and continue with the above procedure. Essentially this is
> forcing it to look like Case 1, not Case 2.

Yes and no.

<before>
No because this doesn't fix the issue. This just move the problem back
to maxage -2.
<now>
I was wrong about why it's wrong. You tend to assume that unexpected
deletion can only happen one way. We have to keep in mind that unsynced
times/timzones:
- can have both positive and negative gaps;
- might cause unexpected deletions on BOTH sides.
</>


Yes, because it's a good idea to change of maxage_safety_interval. The
good maxage_safety_interval depends on the rate that mails are coming.
We don't know about that but the user does and we will help him to get
the correct value. So let's add a new mandatory configuration option for
maxage.

Now, here is my new version of the logic. Let's check, 2nd round.


CASE ONE (local time is oldest)
--------

(server)
  I  J     L        B  C      E  F    X
                    |--------------|---------------|
                   -24             0             24

(local)
  I     K     M  A     C   D     F    X
          |--------------|---------------|
          -24             0             24

We want to sync [X].
Syncing [D, E, F, X] is still acceptable.



CASE TWO (server time is oldest)
--------

(server)
  I  J     L        B  C      E  F    X
 |--------------|---------------|
-24             0             24

(local)
  I     K     M  A     C   D     F    X
          |--------------|---------------|
          -24             0             24

We want to sync [D, E, F, X].

So in both case, we should stop at UIDs > C.

Now comes the processing logic.

---------------
maxage_helper = 1
local_messageslist  # ONE SCAN  for messages in range
                    # [(maxage + (maxage_safety_interval * maxage_helper))..lastest]
server_messageslist # ONE FETCH for messages in range
                    # [(maxage + (maxage_safety_interval * maxage_helper))..lastest]

lowest_common_uid = None
if WhateverObject.is_first_sync():
    # This is our first sync, we can't have common UID.
    lowest_common_uid = 0

else:
    # This check just avoids unnecessary computing.
    if len(server_messageslist) > 1:
      # Assumed UID ordered, lowest to biggest.
      # DIFF: we now process ALL known UIDs.
<now>
      for uid in local_messageslist:
          if uid in server_messageslist:
              lowest_common_uid = uid
              break
<before>
      for m in local_messageslist:
          if uid in server_messageslist:
              lowest_common_uid = local_messageslist[uid]
              break
</>

<now>
    if lowest_common_uid == None:
        # (maxage + (maxage_safety_interval * maxage_helper)) is too low.
        self.ui.warn("could not find a common UID.")
<before>
    if maxage_helper > 1:
        self.ui.info("had to go up to 'maxage_safety_interval = %d' to find "
            "a common UID."% (maxage_safety_interval * maxage_helper))
    if lowest_common_uid == None:
</>
        if not self.conf.get("maxage_safety_interval_helper_enabled", False):
            raise MaxageException("cannot find common UID, maxage_safety_interval "
                "appears to be too low, it has to be increased.")
        else:
            # Trying to figure a better maxage by ourself.
            self.ui.info("could not find common UID within interval "
                "(%i + (%i * %i)), making a new pass (max = 13)."%
                (maxage, maxage_safety_interval, maxage_helper))
            maxage_helper *= 2
<now>
            # TODO: make a new local scan and a new fetch for UIDs within
            # (maxage + (maxage_safety_interval * maxage_helper)),
            # look for common UID again:
            # - if found: proceed with it, print a hint of what was
            # the working maxage_safety_interval/maxage.
            # - else: recurse.
            if maxage_helper > 13:
                # TODO: rescan local with range [(maxage)..latest]
                # and sync.
                # Recurse with a function call or by raising a dedicated
                # exception.
                raise MaxageHelperException("%d"% maxage_helper)
<before>
            # TODO: make a new local scan and a new fetch for UIDs within
            # (maxage + (maxage_safety_interval * maxage_helper)),
            # look for common UID again:
            # - if found: proceed with it, print hint to what would have
            # been the correct maxage_safety_interval.
            # - else: recurse.
            if maxage_helper > 13:
                lowest_common_uid = 0
                # TODO: rescan local with range [(maxage)..latest]
                # and sync.
                raise MaxageException("cannot find common UID, going back "
                    "to more than %d. "% (.../compute/...))
</>

    # Clean the lists for messages we should not sync.
    if lowest_common_uid != None:
        for mlist in [local_messageslist, server_messageslist]:
            for uid in mlist:
                if uid > lowest_common_uid:
                    continue
                del mlist[uid]

CASE THREE
----------

Same as cases one and two but without common UID.
There is no case three we should worry: we always know what to do.

<all following updated>

NOTE: I'm not sure we need any new configuration option. Having default
values
- maxage_safety_interval = 1
- maxage_helper = 1
are pretty good, if not the best at all.

I like this logic because we are avoiding ALL kind of timezone issues,
including the most crazy like having a big time gap being plain wrong
(far in the future or in the past).


Also, can extend the maxage feature when its value does not make any
sense (or won't sync anything). This alternative is to simply print a
hint and STOP by default. I think we could rewrite the above logic with
the following configuration options:

    # When no mail is found within maxage interval, we will try to find
    # the latest common UID and print a hint of a better maxage value.
    #
    # If False, don't do anything. You are in a cul-de-sac, you'll have to
    # increase maxage manually in such a case.
    #
    # Default is False.
    #
    #maxage_helper = False

    # If maxage_helper is True, also sync mails since latest common UID.
    #
    # Default is False.
    #
    #maxage_extend = False

I didn't wrote this processing logic so we can discuss all the
possibilities with very detailed implementation options in mind.


Possible fixes recap
--------------------

1. "Lazy or strict". Fetch remote once more with (maxage_helper * 2):
   - If common UID, sync.
   - If not, come back to maxage and sync.

2. "Improved".
   2a. "Improved expose raw helpers". Expose configuration options
       'maxage_safety_interval' and 'maxage_safety_interval_helper_enabled'.
       We'll have to programatically decide what to do for edge cases.

   2b. "Improved expose smart helpers". Expose configuration options
       'maxage_helper' and 'maxage_extend'.
       Definetly the smartest and the most user-oriented since it helps
       tuning maxage.
       Implies to work with latest_common_UID rather than lowest_common_uid
       within the recurse loop.


I think we should avoid 2a. I have no strong opinion between 1 and 2b.

I tend to think that allowing the user to improve the maxage value is a
good thing with 2b since it doesn't require long/complex code addition
compared to 1.

What do you think?

-- 
Nicolas Sebrecht




More information about the OfflineIMAP-project mailing list