Best alternative to maildir?
Bart Schouten
lists at xenhideout.nl
Fri Sep 19 19:22:19 BST 2014
Hi,
I'll tell you a very short story. Believe it or not, I lost important
emails because the backup tool I used (IMAPSize) stores its backups in a
maildir format. As a result of choosing that tool I became careless and
made some bad choices, which ended with Thunderbird (most likely)
corrupting its mail store without my knowing it and deleting almost all of
May and June (save for one message), half of March, and more messages
from July. This year.
And because I was careless and didn't notice, I deleted the backups I had
before and Thunderbird obviously also deleted all of that from my IMAP
store.
Which is just a remote host that hopefully and likely will have a backup
since this only happened 2 weeks ago.
So, perhaps weird to consider or hard to understand but the fact was that
I immediately disliked the fact that while IMAPSize is capable of
converting to mbox format, it doesn't use that, instead burdening the
filesystem with thousands of files (in my case 12k files). I know it's
pretty common for Linux or Unix systems to do that thing. I recently
downloaded a source archive for my Synology NAS and it literally consisted
of 500k files and something like 27k directories. And probably 400k of
those files were smaller than 5kB. Thousands upon thousands of little tiny
header files and makefiles and whatnot. Windows 7 Explorer completely
choked on it and even reading the archive in WinRAR proved fruitless. I
just had to extract it straight away.
But I proceeded anyway because IMAPSize seemed like the perfect tool and
actually it is pretty awesome except that it is not. And only because of
maildir.
Storing so many files in a filesystem instead of a packed archive the way
for example Blizzard games have done since forever is completely
pointless. At least to me it is. It makes your archive extremely
vulnerable to all kinds of corruption. To name just something: Windows 7
has a bug where it changes the modification date of the .eml file if it
ever accesses it. This also happens on a file copy. And then, when you
sync those files, horror ensues. So now you're suddenly dependent on and
vulnerable to the workings of the operating system and its perfect
handling of whatever file you can have. Whereas if you stored those emails
in e.g. mbox format, none of that would really matter. Even if the OS
started messing with the mbox (the way some virus scanners reportedly
do, apparently...) it would probably be trivial to solve
since you don't have hundreds of IMAP folders in your repository. You just
have to maintain a few larger files and backups of those.
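
The maildir-to-mbox step described above can be sketched with Python's standard `mailbox` module. This is only a minimal sketch under assumptions: the function name `maildir_to_mbox` and both paths are hypothetical, the source is assumed to be a true maildir (with `cur`, `new` and `tmp` subdirectories), which may not match what IMAPSize actually writes, and it is written for Python 3.

```python
import mailbox


def maildir_to_mbox(maildir_path, mbox_path):
    """Copy every message from a maildir folder into one mbox file.

    Returns the number of messages copied. The maildir is only read,
    never modified.
    """
    # create=False: fail loudly if the maildir does not exist
    source = mailbox.Maildir(maildir_path, create=False)
    target = mailbox.mbox(mbox_path, create=True)
    copied = 0
    target.lock()
    try:
        for message in source:
            # add() appends the message to the end of the mbox
            target.add(message)
            copied += 1
    finally:
        # close() flushes pending writes and releases the lock
        target.close()
        source.close()
    return copied
```

Comparing the returned count with the number of files in the maildir gives a cheap check that nothing was skipped before the single mbox file is backed up.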
Storing all your thousands of emails in individual files is like not
having a home with a front door, but rather living on a grass field of
20x40 meters and distributing all your possessions along that field. It
doesn't contain anything and containment is important. On the level of the
filesystem, those 20-60k files you might have in your mail are not
organized at all, they are all just inodes in the inode blocks and
whatever distribution and possible dispersion they might have along your
actual disk.
When I first heard of maildir (on the OfflineIMAP front page) (or actually
just moments before somewhere else) I didn't even know what it was, only
that it wasn't any good. That was my immediate knowledge. My intuitive
knowing. And learning what it was, and is, it became immediately clear
that it was the very same thing as that which caused me to lose those
emails.
Because I started storing my emails as individual files, in my deeper
mind, so to speak, they started to lose the sense of being one integral
collection. They became those dispersed objects on the grass fields. And
it should come as no surprise that if you organize your possessions that
way, you will end up losing parts of them and you won't even notice. How
could you notice? You'd have to scan the entire field or area to know if
everything is still there.
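
Scanning the whole field does not have to be done by eye. As a rough sketch of one way to notice a missing month, the following tallies messages per month in an mbox file using Python's standard library. The function name `messages_per_month` and the "undated" bucket are assumptions made for illustration, and it relies on `email.utils.parsedate_to_datetime`, which needs Python 3.3 or later.

```python
import collections
import email.utils
import mailbox


def messages_per_month(mbox_path):
    """Return a Counter mapping 'YYYY-MM' to the number of messages
    whose Date header falls in that month."""
    counts = collections.Counter()
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            raw_date = message["Date"]
            try:
                if raw_date is None:
                    raise ValueError("no Date header")
                sent = email.utils.parsedate_to_datetime(str(raw_date))
                key = sent.strftime("%Y-%m")
            except (TypeError, ValueError):
                # missing or unparseable dates go into their own bucket
                key = "undated"
            counts[key] += 1
    finally:
        box.close()
    return counts
```

Running this before and after a compact or sync and comparing the two results would show a month dropping to zero or near zero.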
And all of this is just psychological. It caused me to become inattentive
and lose my grasp on the containment of my integral datastore of emails.
And the direct physical result of that was losing entire months of email
due to a Thunderbird bug, most likely.
Heh, I was so happy (or thought I was) with that IMAPSize program that I
immediately made a donation to that developer/person/guy. But the (PayPal)
money was never accepted, and even though PayPal itself doesn't offer any
ways or tools of getting that money back, I just reversed the payment
through my bank. I need to correct this mistake or several important deals
with some sellers of certain goods will fall flat on their faces. Without
the emails that have led up to this point in time, and this point in the
interactions, I can never complete the deals and the interactions and they
are just going to ignore any requests I make of them.
So, mentally and spiritually speaking, I'm just going to reverse
everything I've done and remove my investment in that IMAPSize program, to
begin with. But naturally I still need a different and another solution
and I know it is going to be OfflineIMAP. When I see the code, and the
commit(s), my immediate impression is that the developers are inspired.
I will need to get it running on my Synology NAS. But being Python, it
probably won't be so hard. I haven't done much in Python up to this date,
just some GTK2 tryout thing for turning an old MS-DOS Qbasic thing of mine
into something a little more modern ;-).
And it's not like I have a shortage of projects... right now I have at
least 4 programming things I'm working on. But one of them is going to
involve Python at some point anyway. One is just PHP hacking of a wiki
solution. One is a little Java server application that I want to marry to
Python as well with Java doing the structural fool-proof and resilient
'platform' execution while Python would do a form of scripting that I need
for it. Not sure how to do that yet. There are probably libs that I could
use to have a regular Python interpreter provide a form of 'embedded'
scripting environment within the server. Preferably not through any
command line or inter-process communication but a real compiled library.
And then there is another thing I want to do for the Synology platform
involving probably regular C no matter how much I hate it... something
that will have to use ncurses and in fact I now see that I will have to
use Perl for that. Python would not be robust enough on a systems level.
The Synology has a bit of IMAP functionality available but it is not more
than this:
cyrus-imapd - 2.2.12-15 - The Carnegie Mellon University Cyrus IMAP Server
cyrus-imapd-devel - 2.2.12-15 - The Carnegie Mellon University Cyrus IMAP
    Server
cyrus-imapd-doc - 2.2.12-15 - The Carnegie Mellon University Cyrus IMAP
    Server
imap - 2007a1-1 - University of Washington IMAP package
imap-libs - 2007a1-1 - University of Washington IMAP package
php-imap - 5.2.17-2 - imap extension for php
up-imapproxy - 1.2.5-1 - proxies IMAP transactions between an IMAP client
    and an IMAP server
But at this point, I definitely don't want to run an IMAP server myself,
AND I don't want to use any folder-based mail storage.
I see your program uses SQLite in some way. SQLite would not be a bad way
to store email I would think! Not sure what you use it for. I'm versed
well enough in SQL to make something work and I have a little bit of
experience with these self-contained databases.
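
To make the idea of a SQLite mail store concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not a description of what OfflineIMAP itself does with SQLite. The table layout, the names `open_store`, `store_message` and `count_messages`, and the choice of a SHA-256 digest to skip duplicates are all assumptions made for illustration, and the code targets Python 3.

```python
import email
import hashlib
import sqlite3

# One row per message; the raw bytes are kept untouched in a BLOB and
# (folder, sha256) prevents the same message being stored twice.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    folder  TEXT NOT NULL,
    sha256  TEXT NOT NULL,
    subject TEXT,
    sent    TEXT,
    raw     BLOB NOT NULL,
    PRIMARY KEY (folder, sha256)
)
"""


def open_store(path):
    """Open (or create) the single-file store and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def _header(message, name):
    """Return a header as plain text, or None if it is absent."""
    value = message[name]
    return None if value is None else str(value)


def store_message(conn, folder, raw_bytes):
    """Insert one raw message; return True if new, False if already stored."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    parsed = email.message_from_bytes(raw_bytes)
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO messages "
            "(folder, sha256, subject, sent, raw) VALUES (?, ?, ?, ?, ?)",
            (folder, digest, _header(parsed, "Subject"),
             _header(parsed, "Date"), raw_bytes),
        )
    return cursor.rowcount == 1


def count_messages(conn, folder):
    """Number of stored messages in one folder."""
    row = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE folder = ?", (folder,)
    ).fetchone()
    return row[0]
```

The whole store is then one file to back up, and `count_messages` gives a quick per-folder sanity check after each sync.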
So, before I drift off into another day of being so tired I can't do
anything....
What are your thoughts on what I want to do?
Here is the grep for the package list of "sqlite" on my NAS, excluding
perl:
py24-sqlite - 2.4.1-1 - pysqlite is an interface to the SQLite database
    server for Python. It aims to be fully compliant with Python database API
    versi
sqlite - 3.8.1-1 - SQLite is a small C library that implements a
    self-contained, embeddable, zero-configuration SQL database engine.
sqlite2 - 2.8.17-3 - SQLite is a small C library that implements a
    self-contained, embeddable, zero-configuration SQL database engine.
I will need to immediately stop using this IMAPSize or run into more
mistakes of my own regarding my email....
That means I'll just need to depend on manipulating Thunderbird in some
way. It does have a form of "local folders" that you can use independently
of IMAP and that you can use to move local storage into its IMAP
mirroring. That should be enough for now. (The reason I had to use that
IMAPSize is because Thunderbird keeps all deleted messages in its store
and then when you compact it, it ruins your mbox files. But it kinda bit
me in the arse because I lost all those emails anyway. My Thunderbird
store was 1.2 GB while my actual email was only about 500 MB. Which is a
bit problematic for backing up that data to remote hosts. Without an
external tool I had no way to safely compact it, but I could just have
started out with a clean install and then just download everything
anew....). Not the first time my email client ruins my data. When I used
"Opera M2" which was an older mail client of/inside the Opera browser, it
also corrupted my email at which point I decided to move to IMAP only.
So I'm a bit sensitive to opportunities of data corruption. And for that
reason maildir is by far not an option. It encourages stupidity. I can't
have that anymore now.
I'm just wondering what you could tell me about the prospects of getting
this to run on my NAS and having a persistent sqlite store for managing
the downloaded email data. I saw there is a version 2 and a version 3,
which kinda reminds me of GTK+. As well as python itself. But Python 2 is
still pervasive and GTK3 is pretty sucky. I don't mind putting in some
work. And I don't mind becoming well-versed in Python either.
Regards,
Bart Schouten, NL.
[UPDATES]
I wrote this yesterday deep morning but apparently forgot to send it,
instead postponing it again. I have since learned that my email host
doesn't have a backup older than a few days. I did undelete a deleted
System Volume Restore (win7) that I'm now trying to feed to the reader
tools without much success thus far. It contains at least a string that
should only be present in those emails. That I need. Grepping this huge
BLOB in text mode does produce some usable output. Perhaps it will
suffice, I'm not sure. But I can at least get some text I need out of it.
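
The kind of search described here, looking for a known string inside one large binary file and pulling out the surrounding text, can be sketched in Python as follows. The function name `find_contexts` and its `context` and `limit` parameters are hypothetical. It memory-maps the file, which is an assumption that may not hold for a very large file on a 32-bit system.

```python
import mmap


def find_contexts(path, needle, context=200, limit=20):
    """Yield (offset, snippet) for occurrences of needle (bytes) in a file.

    snippet is the text around the match, with undecodable bytes
    replaced, so it can be read or printed safely.
    """
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as data:
            found = 0
            position = data.find(needle)
            while position != -1 and found < limit:
                start = max(0, position - context)
                end = min(len(data), position + len(needle) + context)
                snippet = data[start:end].decode("utf-8", errors="replace")
                yield position, snippet
                found += 1
                # continue searching just past the current match
                position = data.find(needle, position + 1)
```

A call such as `list(find_contexts("restore.bin", b"some known phrase"))` (file name and phrase are placeholders) returns the byte offsets, which can then be used to cut larger pieces out of the file.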
Also, I see you're using Python 2.6. I have an aversion, I now realize,
against 3+. My deep impression is that they have been making some very bad
"improvements" that were unnecessary and make the code uglier instead of
prettier. Lots of companies, organisations and individuals are doing that
these days. It seems to be the onset of the last days of our current era.
People have started making things worse instead of better. This is true for
GTK3, WordPress 4.0, the WordPress "Jetpack" plugin, which evolved (so to
speak) from all of the separate WordPress.com services (Akismet, Stats),
it is true for the European "IBAN" banking system, it is true for IPv6, it
is true for many features that are being introduced these days (like
"generics" in Java and in Facebook's PHP version that they call Hack), it
is true for Blu-ray disk, in terms of the video formats, that no one
really needs all that badly, it is even more true for the new 4K format
that they want to sell and push down everyone's throats. Not to mention
3D-TV. It is true for the children's playgrounds that are being designed
these days. Solid, decent and fun is not good enough anymore, it has to be
something fancy new. Fabrics are being improved and the fabrics are
useless. Video game companies and rendering engine designers are pushing
for the wrong advancements. Trying to create more and more 'realism' even
though it makes games less and less attractive. Who ever decided that
realism is what gamers want?? One of the most popular and commercially
successful games ever (World of Warcraft) is not realistic at all, being
very cartoony and fantasy-like. Older games, in eras where all this
computing power was not available, are in general much better games.
World of Warcraft itself was a much better game before its developer
started pushing for all kinds of "improvements" that kept the masses
slightly more happy while actually ruining the entire game. There is no
end to the amount of "improvements" that people make that actually make
things worse.
This is the basic premise of human life anyway, to improve things that
didn't need improvement and thereby ruin everything. Most essentially the
way that the human wants to behave, which has been termed sinful and
guilty and primitive and uncivilised. And so we improved on that and
created war, hunger, rape, and illness. And now, in the final days of this
era, people and governments alike have started improving things that were
completely pointless to want to improve. Companies are reinventing
themselves and their brands for no reason. Assets are being acquired for
no purpose. Human behaviours that have always been termed "okay" are now
suddenly not good enough anymore. Soon we'll all (except me) be chipped so
that government can keep improving our behaviours.
And what is the worst improvement of today, that I mention? It is actually
SQLite version 3. It's incredible, and I haven't much looked into it yet,
and it *seems* like it has become better (binary data no longer encoded as
text, etcetera) (and UTF8 support?) but actually the improvements are bad
which is why version 2 is still around.
Same for GTK3, same for Python 3. Not sure what's going on but....
People are abandoning en masse their own feelings and their own ideas of
what is good and beautiful and make changes that nobody really wants for
no other reason than wanting to make changes... not knowing what to do
with themselves otherwise. Because current culture is at its end. The
current direction has ceased going anywhere. People still embedded in that
mindset do not recognize that the road has stopped climbing and is now
steeply falling down into the abyss. Still they venture on along this road not
having any alternative in their minds. And so they start obliterating what
was good in order to make it better. And failing miserably.
The human has really really become self-destructive now. The ego is
reaching maximum orbit. And so the real alternative becomes not only
wishful, but mandatory. We had it back in the day:
http://en.wikipedia.org/wiki/RealPlayer#Real_Alternative
"In November 2011 RealNetworks' case against Edskes was dismissed and
RealNetworks was ordered to pay him €48,000 in damages. Details of
the case and judgement have been published."
But RealPlayer itself died a very painful death ;-).
I myself will probably use Python 2.7. And I will use SQLite 2.8, but I
might have to recompile it with UTF-8 support so the routines don't treat
all UTF-8 text as single-byte characters. If that succeeds, I will
definitely try to repackage it for the repository I use. It is a Marvell
ARM CPU so I will probably need to cross-compile it on some Linux
installation, but perhaps my webhost will do.
I notice the big activity for this list was in 2011-2012, and has become
extremely quiet these days. That is not a problem to me, I can probably
figure everything out myself, but I just need a few pointers to get the
right.. perspective on how to approach this :).