Offlineimap duplicating email redux

Sebastian Spaeth Sebastian at SSpaeth.de
Thu Jan 20 10:34:28 GMT 2011


On Wed, 19 Jan 2011 23:02:13 -0500, "Edward Z. Yang" <ezyang at MIT.EDU> wrote:
> Six years ago, a user wrote in to report that OfflineIMAP was
> downloading then reuploading files to the server:
> 
>     http://www.mail-archive.com/debian-user@lists.debian.org/msg131430.html
> 
> I would like to state that I've seen this behavior, and I did
> some more investigative work.  I think the following code in OfflineImap
> is wrong:

> In particular, notice that we only test dest_messagelist (presumably
> LocalStatus) when determining whether or not to copymessageto, which
> then unconditionally resaves the message.  So this is consistent with
> the behavior I (and the other user) saw that if LocalStatus is wrong,
> you are so out of luck: OfflineIMAP will duplicate your messages into
> kingdom come.  Experimentally, this seems to be the case:

I believe your analysis is correct; I have come to the same conclusion,
with one small difference:

I don't believe this code is "wrong"; I believe this is the intended
behavior. (But I agree with you that it should not be the intended
behavior.)

But I find the syncing strategy very complex anyway and think we can
simplify it. Just let me understand your analysis a bit better first:

This is how it is currently done IMHO (accounts.py:syncfolder()):

1) If LocalStatus exists:
  1b) delete messages with a positive UID on LOCAL if they are not on REMOTE
  1c) sync LOCAL folder to REMOTE folder (based on LOCAL vs LOCALSTATUS info)
2) sync REMOTE folder to LOCAL folder
3) sync LOCAL folder to LOCALSTATUS db (IMHO usually a NoOp)
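
To make the steps above concrete, here is a rough toy model. It is not
the code from accounts.py: folders are plain dicts mapping UID to
message body, LocalStatus is a set of UIDs (or None when it is missing),
and the name sync_folder_current is made up for illustration. Flags and
the deletion half of step 1c are left out.

```python
def sync_folder_current(local, remote, status):
    """Toy model of steps 1-3; returns the new (local, remote, status)."""
    local, remote = dict(local), dict(remote)
    if status is not None:  # 1) LocalStatus exists
        # 1b) drop LOCAL messages with a positive UID that REMOTE lacks
        for uid in [u for u in local if u > 0 and u not in remote]:
            del local[uid]
        # 1c) upload LOCAL messages that LocalStatus does not list; the
        #     server assigns a fresh UID and the local copy is renamed
        for uid in [u for u in local if u not in status]:
            new_uid = max([0, *local, *remote]) + 1
            remote[new_uid] = local.pop(uid)
            local[new_uid] = remote[new_uid]
    # 2) download REMOTE messages that LOCAL lacks
    for uid, body in remote.items():
        local.setdefault(uid, body)
    # 3) record what LOCAL now holds in LocalStatus
    return local, remote, set(local)


# Missing LocalStatus, empty LOCAL: everything is downloaded, nothing uploaded.
local, remote, status = sync_folder_current({}, {1: "a", 2: "b"}, None)
print(sorted(local), sorted(remote), sorted(status))
```

In this model a missing LocalStatus skips step 1 entirely, so only the
download in step 2 runs.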

Is your observation about a missing or a corrupt LocalStatus? (The
syncing strategy differs for those, as you can see above.)

Are you saying that when LocalStatus is _missing_ for some reason, we
first download a mail and then reupload it? That shouldn't happen. In
the case that it exists but is corrupt, we will upload a locally
existing message to REMOTE, as we have it on LOCAL but LocalStatus
thinks it is not on REMOTE. I think this will lead to it getting a new
UID on REMOTE; the local message will be renamed to reflect the new UID,
and the message is now on REMOTE twice. On syncing from REMOTE to LOCAL,
we will have to redownload the old message, as we indeed no longer have
it locally under its old UID (it has been renamed to the new UID).
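
The corrupt-LocalStatus case can be traced with a few lines of toy
Python. This is only an illustration of the sequence described above,
not OfflineIMAP code: dicts map UID to body, the status is a set of
UIDs, and upload_then_download is a hypothetical name.

```python
def upload_then_download(local, remote, status):
    """Upload what LocalStatus does not list, then sync REMOTE to LOCAL."""
    local, remote = dict(local), dict(remote)
    for uid in [u for u in local if u not in status]:
        new_uid = max([0, *local, *remote]) + 1  # REMOTE hands out a fresh UID
        remote[new_uid] = local.pop(uid)         # upload ...
        local[new_uid] = remote[new_uid]         # ... and rename the local copy
    for uid, body in remote.items():             # the old UID is now missing locally
        local.setdefault(uid, body)
    return local, remote


# The same mail is on both sides, but LocalStatus has lost track of it.
local, remote = upload_then_download({7: "hello"}, {7: "hello"}, status=set())
print(sorted(remote.items()))  # [(7, 'hello'), (8, 'hello')]
print(sorted(local.items()))   # [(7, 'hello'), (8, 'hello')]
```

One message goes in, and two copies come out on each side.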

Would you agree with the above analysis? I believe the code can and
should be simplified; the syncing strategies are anything but obvious,
and all those "dests and applyto[0]" are less than intuitive (if you
haven't written the code yourself).

For the record, I disagree that the above syncing strategy is the right
one. :-) Step 1b above in particular seems superfluous and potentially
very dangerous.

I think we should be doing this:

1) sync from REMOTE to LOCAL based on info from REMOTE and LOCAL.
   LOCALSTATUS only helps to understand whether a message has been added
   on one side or removed on the other in case of doubt. If LOCALSTATUS
   is corrupted, we will redownload everything missing on LOCAL but keep
   messages that already existed on LOCAL.

2) sync LOCAL to REMOTE based on info from LOCAL and REMOTE, and update
   LOCALSTATUS. Same logic as above.

> What's more concerning, and what I believe has caused me to lose some email
> permanently, is that the deletion algorithm appears to do about the same thing.
> So I frequently notice OfflineIMAP deleting /huge/ amounts of mail from my
> Maildir that I *know* I did not delete (I actually wouldn't have noticed if
> Sup wasn't realizing the files were going missing and let me know about it.)
> 
> Right now, I've disabled deletion from Maildir and uploading to IMAP, which
> seems to be working OK as a stopgap. But please, let's fix these algorithms!

First of all, syncing works per folder. Moving a mail between folders
on the IMAP side will delete it locally and redownload it into the new
folder (not easily detectable or fixable). This happens when my server
filter shoves my INBOX files into a year-based ARCHIVE folder after a
while. Seeing these "deletes" is harmless, as they really are moves
between folders. Could that be what you are seeing?
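
To illustrate why a move looks like a delete, here is a small sketch
that compares each folder on its own. The function remote_changes and
the sets of UIDs are invented for the example. Since IMAP UIDs are
per-mailbox, the moved mail shows up under a different UID in ARCHIVE.

```python
def remote_changes(before, after):
    """Per-folder view of a REMOTE change: UIDs that vanished or appeared."""
    report = {}
    for folder in sorted(set(before) | set(after)):
        old = before.get(folder, set())
        new = after.get(folder, set())
        report[folder] = {
            "delete": sorted(old - new),    # would be deleted locally
            "download": sorted(new - old),  # would be downloaded
        }
    return report


before = {"INBOX": {1, 2}, "ARCHIVE": set()}
after = {"INBOX": {2}, "ARCHIVE": {40}}  # the server filter moved one mail
print(remote_changes(before, after))
```

Nothing in the per-folder comparison ties the vanished UID in INBOX to
the new one in ARCHIVE.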

Having said that, I would love to see us work out an improved and
simplified (and more transparent) syncing strategy. I have a branch
lying around that started working on that, and I could post it for
review if I find it again.

We must not lose emails (priority 1), but we should also not duplicate
them in case our LOCALSTATUS gets corrupted.

Sebastian