Best alternative to maildir?

Bart Schouten lists at xenhideout.nl
Fri Sep 19 19:22:19 BST 2014


Hi,

I'll tell you a very short story. Believe it or not, I lost important
emails because the backup tool I used (IMAPSize) stores its backups in a
maildir format. Having chosen that tool, I became careless and made some
bad choices, which resulted in Thunderbird (most likely) corrupting its
mail store without my knowing it and deleting almost all of May and June
(save for one message), half of March, and further messages from July.
This year.

And because I was careless and didn't notice, I deleted the backups I had 
before and Thunderbird obviously also deleted all of that from my IMAP 
store.

Which is just a remote host that hopefully and likely will have a backup 
since this only happened 2 weeks ago.

So, perhaps weird to consider or hard to understand but the fact was that 
I immediately disliked the fact that while IMAPSize is capable of 
converting to mbox format, it doesn't use that, instead burdening the 
filesystem with thousands of files (in my case 12k files). I know it's 
pretty common for Linux or Unix systems to do that thing. I recently 
downloaded a source archive for my Synology NAS and it literally consisted 
of 500k files and something like 27k directories. And probably 400k of 
those files were smaller than 5kB. Thousands upon thousands of little tiny 
header files and makefiles and whatnot. Windows 7 Explorer completely 
choked on it and even reading the archive in WinRAR proved fruitless. I 
just had to extract it straight away.

But I proceeded anyway because IMAPSize seemed like the perfect tool and 
actually it is pretty awesome except that it is not. And only because of 
maildir.

Storing so many files in a filesystem instead of a packed archive the way 
for example Blizzard games have done since forever is completely 
pointless. At least to me it is. It makes your archive extremely 
vulnerable to all kinds of corruption. To name just something: Windows 7 
has a bug where it changes the modification date of the .eml file if it 
ever accesses it. This also happens on a file copy. And then, when you 
sync those files, horror ensues. So now you're suddenly dependent on and 
vulnerable to the workings of the operating system and its perfect 
handling of whatever file you can have. Whereas if you stored those emails 
in e.g. mbox format, none of that would really matter. Even if the OS 
would start messing with the mbox (the way some virus scanners are 
reported to do, apparently...) it would probably be trivial to solve 
since you don't have hundreds of IMAP folders in your repository. You just 
have to maintain a few larger files and backups of those.
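
As a rough sketch of the mbox route: the snippet below folds one maildir
folder into a single mbox file using Python's standard mailbox module. It
is not IMAPSize's converter, the function name is made up for
illustration, and it assumes a current Python 3 and that nothing else is
writing to either location while it runs.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message from a maildir folder into one mbox file.

    Hypothetical helper for illustration. No locking is done, so this
    assumes no mail client is touching either location at the same time.
    Returns the number of messages copied.
    """
    src = mailbox.Maildir(maildir_path, create=False)
    dst = mailbox.mbox(mbox_path)  # created if it does not exist yet
    count = 0
    try:
        for msg in src:
            dst.add(msg)  # appended to the end of the mbox file
            count += 1
        dst.flush()  # write pending changes to disk
    finally:
        dst.close()
        src.close()
    return count
```

Calling maildir_to_mbox("backup/INBOX", "INBOX.mbox") (both paths are
placeholders) would leave one file per folder to back up instead of
thousands.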

Storing all your thousands of emails in individual files is like not 
having a home with a front door, but rather living on a grass field of 20x 
40 meters and distributing all your possessions along that field. It 
doesn't contain anything and containment is important. On the level of the 
filesystem, those 20-60k files you might have in your mail are not 
organized at all, they are all just inodes in the inode blocks and 
whatever distribution and possible dispersion they might have along your 
actual disk.

When I first heard of maildir (on the OfflineIMAP front page) (or actually 
just moments before somewhere else) I didn't even know what it was, only 
that it wasn't any good. That was my immediate knowledge. My intuitive 
knowing. And learning what it was, and is, it became immediately clear 
that it was the very same thing as that which caused me to lose those 
emails.

Because I started storing my emails as individual files, in my deeper 
mind, so to speak, they started to lose the sense of being one integral 
collection. They became those dispersed objects on the grass fields. And 
it should come as no surprise that if you organize your possessions that 
way, you will end up losing parts of them and you won't even notice. How 
could you notice? You'd have to scan the entire field or area to know if 
everything is still there.
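
One way to do that scan mechanically is to count messages per month and
look for gaps. The sketch below does this for an mbox file with Python's
standard library. The function name is invented for illustration, and it
assumes the Date header is a fair stand-in for when a message belongs.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_tz


def messages_per_month(mbox_path):
    """Return a Counter mapping 'YYYY-MM' to the number of messages.

    Hypothetical helper for illustration. Messages whose Date header is
    missing or unparseable are counted under the key 'unknown'.
    """
    counts = Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            parsed = parsedate_tz(str(msg.get("Date", "")))
            if parsed:
                key = "%04d-%02d" % (parsed[0], parsed[1])
            else:
                key = "unknown"
            counts[key] += 1
    finally:
        box.close()
    return counts
```

Printing the sorted keys of the result after each backup would make a
month that suddenly drops to zero or one message hard to miss.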

And all of this is just psychological. It caused me to become inattentive 
and lose my grasp on the containment of my integral datastore of emails. 
And the direct physical result of that was losing entire months of email 
due to a Thunderbird bug, most likely.

Heh, I was so happy (or thought I was) with that IMAPSize program that I 
immediately made a donation to that developer/person/guy. But the (paypal) 
money was never accepted, and even though PayPal itself doesn't offer any 
ways or tools of getting that money back, I just reversed the payment 
through my bank. I need to correct this mistake or several important deals 
with some sellers of certain goods will fall flat on their faces. Without 
the emails that have led up to this point in time, and this point in the 
interactions, I can never complete the deals and the interactions and they 
are just going to ignore any requests I make of them.

So, mentally and spiritually speaking, I'm just going to reverse 
everything I've done and remove my investment in that IMAPSize program, to 
begin with. But naturally I still need a different solution and I know
it is going to be OfflineIMAP. When I see the code, and the 
commit(s), my immediate impression is that the developers are inspired.

I will need to get it running on my Synology NAS. But being Python, it 
probably won't be so hard. I haven't done much in Python up to this date, 
just some GTK2 tryout thing for turning an old MS-DOS Qbasic thing of mine 
into something a little more modern ;-).

And it's not like I have a shortage of projects... right now I have at 
least 4 programming things I'm working on. But one of them is going to 
involve Python at some point anyway. One is just PHP hacking of a wiki 
solution. One is a little Java server application that I want to marry to 
Python as well with Java doing the structural fool-proof and resilient 
'platform' execution while Python would do a form of scripting that I need 
for it. Not sure how to do that yet. There are probably libs that I could 
use to have a regular Python interpreter provide a form of 'embedded' 
scripting environment within the server. Preferably not through any 
command line or inter-process communication but a real compiled library.

And then there is another thing I want to do for the Synology platform 
involving probably regular C no matter how much I hate it... something 
that will have to use ncurses and in fact I now see that I will have to 
use Perl for that. Python would not be robust enough on a systems level.

The Synology has a bit of IMAP functionality available but it is not more 
than this:

cyrus-imapd - 2.2.12-15 - The Carnegie Mellon University Cyrus IMAP Server
cyrus-imapd-devel - 2.2.12-15 - The Carnegie Mellon University Cyrus IMAP Server
cyrus-imapd-doc - 2.2.12-15 - The Carnegie Mellon University Cyrus IMAP Server
imap - 2007a1-1 - University of Washington IMAP package
imap-libs - 2007a1-1 - University of Washington IMAP package
php-imap - 5.2.17-2 - imap extension for php
up-imapproxy - 1.2.5-1 - proxies IMAP transactions between an IMAP client and an IMAP server

But at this point, I definitely don't want to run an IMAP server myself, 
AND I don't want to use any folder-based mail storage.

I see your program uses SQLite in some way. SQLite would not be a bad way 
to store email I would think! Not sure what you use it for. I'm versed 
well enough in SQL to make something work and I have a little bit of 
experience with these self-contained databases.
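
To make the idea concrete, here is a minimal sketch of a single-file
SQLite mail store using Python's standard sqlite3 module. It says nothing
about how OfflineIMAP itself uses SQLite. The table layout and function
names are invented for illustration, and it assumes a current Python 3
with SQLite 3.

```python
import sqlite3
from email import message_from_bytes


def open_store(db_path):
    """Open (or create) the store. The schema is a made-up example."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " folder TEXT NOT NULL,"
        " message_id TEXT,"
        " date_header TEXT,"
        " raw BLOB NOT NULL,"
        " UNIQUE (folder, message_id))"
    )
    return conn


def _header(msg, name):
    """Return a header as a plain string, or None if it is absent."""
    value = msg[name]
    return None if value is None else str(value)


def store_message(conn, folder, raw_bytes):
    """Store one raw RFC 822 message; return True if a row was added.

    A message whose (folder, Message-ID) pair is already present is
    skipped. Messages without a Message-ID are always inserted, because
    SQLite treats NULLs as distinct in a UNIQUE constraint.
    """
    msg = message_from_bytes(raw_bytes)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO messages"
            " (folder, message_id, date_header, raw) VALUES (?, ?, ?, ?)",
            (folder, _header(msg, "Message-ID"), _header(msg, "Date"),
             raw_bytes),
        )
    return cur.rowcount == 1
```

The raw message is kept untouched as a BLOB, so the headers in the other
columns are only an index and the original bytes can always be exported
again.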

So, before I drift off into another day of being so tired I can't do 
anything....

What are your thoughts on what I want to do?

Here is the grep for the package list of "sqlite" on my NAS, excluding 
perl:

py24-sqlite - 2.4.1-1 - pysqlite is an interface to the SQLite database server for Python. It aims to be fully compliant with Python database API versi
sqlite - 3.8.1-1 - SQLite is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine.
sqlite2 - 2.8.17-3 - SQLite is a small C library that implements a self-contained, embeddable, zero-configuration SQL database engine.

I will need to immediately stop using this IMAPSize or run into more 
mistakes of my own regarding my email....

That means I'll just need to depend on manipulating Thunderbird in some 
way. It does have a form of "local folders" that you can use independently 
of IMAP and that you can use to move local storage into its IMAP 
mirroring. That should be enough for now. (The reason I had to use that 
IMAPSize is because Thunderbird keeps all deleted messages in its store 
and then when you compact it, it ruins your mbox files. But it kinda bit 
me in the arse because I lost all those emails anyway. My Thunderbird 
store was 1.2 GB while my actual email was only about 500 MB. Which is a 
bit problematic for backing up that data to remote hosts. Without an 
external tool I had no way to safely compact it, but I could just have 
started out with a clean install and then just download everything 
anew....). Not the first time my email client has ruined my data. When I used 
"Opera M2" which was an older mail client of/inside the Opera browser, it 
also corrupted my email at which point I decided to move to IMAP only.

So I'm a bit sensitive to opportunities of data corruption. And for that 
reason maildir is simply not an option. It encourages stupidity. I can't 
have that anymore now.

I'm just wondering what you could tell me about the prospects of getting 
this to run on my NAS and having a persistent sqlite store for managing 
the downloaded email data. I saw there is a version 2 and a version 3, 
which kinda reminds me of GTK+. As well as Python itself. But Python 2 is 
still pervasive and GTK3 is pretty sucky. I don't mind putting in some 
work. And I don't mind becoming well-versed in Python either.

Regards,

Bart Schouten, NL.

[UPDATES]

I wrote this in the early hours of yesterday morning but apparently forgot
to send it, postponing it again instead. I have since learned that my email
host
doesn't have a backup older than a few days. I did undelete a deleted 
System Volume Restore (win7) that I'm now trying to feed to the reader 
tools without much success thus far. It contains at least a string that 
should only be present in those emails. That I need. Grepping this huge 
BLOB in text mode does produce some usable output. Perhaps it will 
suffice, I'm not sure. But I can at least get some text I need out of it.
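
The same kind of search can be sketched in Python: find every occurrence
of a known byte string in a large binary file and keep some surrounding
bytes for inspection. This is only an illustration, not a recovery tool.
The function name is made up, and it assumes the file is non-empty and
small enough to be memory-mapped on the machine at hand.

```python
import mmap


def find_in_blob(path, needle, context=200):
    """Return a list of (offset, surrounding_bytes) for each hit.

    Hypothetical helper for illustration. `needle` must be bytes;
    `context` is how many bytes to keep on each side of a match.
    """
    hits = []
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                start = max(0, pos - context)
                end = min(len(mm), pos + len(needle) + context)
                hits.append((pos, mm[start:end]))
                pos = mm.find(needle, pos + 1)
    return hits
```

Each snippet can then be made readable with
snippet.decode("utf-8", errors="replace"), which keeps the text and marks
the binary noise around it.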

Also, I see you're using Python 2.6. I have an aversion, I now realize, 
to 3+. My deep impression is that they have been making some very bad 
"improvements" that were unnecessary and make the code uglier instead of 
prettier. Lots of companies, organisations and individuals are doing that 
these days. It seems to be the onset of the last days of our current era. 
People have started making things worse instead of better. This is true for 
GTK3, WordPress 4.0, the WordPress "Jetpack" plugin, which evolved (so to 
speak) from all of the separate WordPress.com services (Akismet, Stats), 
it is true for the European "IBAN" banking system, it is true for IPv6, it 
is true for many features that are being introduced these days (like 
"generics" in Java and in Facebook's PHP version that they call Hack), it 
is true for Blu-ray Disc, in terms of the video formats, that no one 
really needs all that badly, it is even more true for the new 4K format 
that they want to sell and push down everyone's throats. Not to mention 
3D-TV. It is true for the children's playgrounds that are being designed 
these days. Solid, decent and fun is not good enough anymore, it has to be 
something fancy new. Fabrics are being improved and the fabrics are 
useless. Video game companies and rendering engine designers are pushing 
for the wrong advancements. Trying to create more and more 'realism' even 
though it makes games less and less attractive. Who ever decided that 
realism is what gamers want?? One of the most popular and commercially 
successful games ever (World of Warcraft) is not realistic at all, being 
very cartoony and fantasy-like. Older games, in eras where all this 
computing power was not available, are in general much better games.

World of Warcraft itself was a much better game before its developer 
started pushing for all kinds of "improvements" that kept the masses 
slightly more happy while actually ruining the entire game. There is no 
end to the amount of "improvements" that people make that actually make 
things worse.

This is the basic premise of human life anyway, to improve things that 
didn't need improvement and thereby ruin everything. Most essentially the 
way that the human wants to behave, which has been termed sinful and 
guilty and primitive and uncivilised. And so we improved on that and 
created war, hunger, rape, and illness. And now, in the final days of this 
era, people and governments alike have started improving things that were 
completely pointless to want to improve. Companies are reinventing 
themselves and their brands for no reason. Assets are being acquired for 
no purpose. Human behaviours that have always been termed "okay" are now 
suddenly not good enough anymore. Soon we'll all (except me) be chipped so 
that government can keep improving our behaviours.

And what is the worst improvement of today, that I mention? It is actually 
SQLite version 3. It's incredible, and I haven't much looked into it yet, 
and it *seems* like it has become better (binary data no longer encoded as 
text, etcetera) (and UTF8 support?) but actually the improvements are bad 
which is why version 2 is still around.

Same for GTK3, same for Python 3. Not sure what's going on but....

People are abandoning en masse their own feelings and their own ideas of 
what is good and beautiful and make changes that nobody really wants for 
no other reason than wanting to make changes... not knowing what to do 
with themselves otherwise. Because current culture is at its end. The 
current direction has ceased going anywhere. People still embedded in that 
mindset do not recognize that the road has stopped climbing and is now 
steeply falling down into the abyss. Still they venture on along this road, not 
having any alternative in their minds. And so they start obliterating what 
was good in order to make it better. And failing miserably.

The human has really really become self-destructive now. The ego is 
reaching maximum orbit. And so the real alternative becomes not only 
wishful, but mandatory. We had it back in the day:

http://en.wikipedia.org/wiki/RealPlayer#Real_Alternative

"In November 2011 RealNetworks' case against Edskes was dismissed and 
RealNetworks was ordered to pay him €48,000 in damages. Details of 
the case and judgement have been published."

But RealPlayer itself died a very painful death ;-).

I myself will probably use Python 2.7. And I will use SQLite 2.8, but I 
might have to recompile it with utf8 support so the routines don't treat 
all utf8 text as single-byte characters. If that succeeds, I will 
definitely try to repackage it for the repository I use. It is a Marvell 
ARM cpu so I will probably need to cross-compile it on some Linux 
installation, but perhaps my webhost will do.
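
To show what "treating UTF-8 text as single-byte characters" means in
practice, here is a small Python 3 illustration. It demonstrates the
general effect only and makes no claim about what SQLite 2.8 does
internally. The function names are invented for the example.

```python
def byte_and_char_length(text):
    """Return (length in UTF-8 bytes, length in characters)."""
    raw = text.encode("utf-8")
    return len(raw), len(text)


def misread_as_single_byte(text):
    """Show how UTF-8 text looks when each byte is read as one
    Latin-1 character, which is the single-byte mistake."""
    return text.encode("utf-8").decode("latin-1")


# "caf\u00e9" is the four-character word ending in e-acute.
print(byte_and_char_length("caf\u00e9"))
print(misread_as_single_byte("caf\u00e9"))
```

Any routine that counts, truncates or compares by bytes will get
lengths and substrings wrong for such text, which is why a UTF-8 aware
build matters for mail with non-ASCII content.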

I notice the big activity for this list was in 2011-2012, and has become 
extremely quiet these days. That is not a problem to me, I can probably 
figure everything out myself, but I just need a few pointers to get the 
right.. perspective on how to approach this :).

