Best alternative to maildir?

Bart Schouten lists at xenhideout.nl
Fri Sep 19 22:28:13 BST 2014


I was not asking for such remarks.

I was asking for feedback on a creative proposal.

That is all. These remarks are useless. I have already made up my mind, and 
you cannot decide what works best for me and what doesn't.

Regards,...


ps. a filesystem was not designed to "handle a large number of files". It 
was designed to /handle files/ no matter their number. If you create an 
ext3fs with 20 inodes, it will NOT handle a large number of files. The 
number of files a fs can or will handle is up to the programmer, designer 
or configurator. It is pretty pointless to go and increase your file 
number "just because you can". Such remarks are utterly pointless and 
idiotic and not something I would expect from this list/program. You are 
now treating it as a "we just always do it this way, no reason really". 
It's like saying I need to write a 2 meg text file and you say "well, 
split it in 100 files, the filesystem was designed for that!".

What nonsense. Give someone else your crap, not me. And if no one here can 
help me, I'll just do it on my own. I have no need for wise-guys who claim 
to know better (and even best) how ANOTHER person should be doing his 
projects. I was not asking or begging for your programming time 
whatsoever, just a few simple perspectives on how best to do what I want 
to do.

"I want to build a house."
"You shouldn't build a house, houses are stupid."

VERY USEFUL.

ps2. "If you are worried about updating metadata too often, enable noatime 
in fstab." kinda proves how moronic you are being here. You are now 
tunnel-visioning onto a parameter of a freakin filesystem or mount option 
whereas any decent design should be filesystem-agnostic except perhaps for 
obvious requirements (as could be) such as symlink support. So you are now 
basically suggesting to me performance alleviations in case your superb 
design starts overloading the (metadata) system of that fs. THAT PROVES 
that your design is flawed. Or you wouldn't immediately start suggesting 
workarounds to the problems it may cause!!!!

And then you say "It all works just fine on unix." when you JUST TOLD ME 
how to adjust freaking PARAMETERS in case it DOESN'T "just work fine"....

DOH?

My god.. And your complaints about lack of interoperability obviously 
wouldn't matter if you were to port a software to a failing OS like MS 
Windows? That completely CHOKED (in Windows Explorer) on a new folder 
worth 6000 .eml files. "Lock you into one toolset". Well, tell you what, I 
have never used any other program that stored email as thousands of .eml 
files. Nor have I ever wanted to. From the perspective of a Windows user, 
it is nonsensical at best and utterly, utterly dumb at worst. The word I 
want to use is not even in English. DEBIEL. Having folders with 6k+ files 
makes them completely unmanageable in a manual way. Sure, shell scripts. 
Fine. Everyone should be dependent on his or her own ability to write 
fail-proof shell-scripts just in case he or she wants to use freaking 
EMAIL. You are a commercial mind my friend. You should design the next OS, 
and sell it to consumers, you will do excellent!

Not really.

There is a good reason why no commercial (let's say, really popular) 
email client would ever store email as individual files. 99.5% of users 
would complain about it. People who go into Thunderbird see a neat list 
of files that coincide with their IMAP folders. They are not burdened with 
the details of what is inside those files, because they have a tool for 
that: It's called their email program (Thunderbird itself).

When you write software (libraries) you also make reasonable judgements 
about how you will distribute your containment. On the one extreme, you 
could put every subroutine or function into its own file and then include 
that. On the other extreme, you could not compartmentalise at all and dump 
everything into a single monolithic file. It is clear any design needs to 
find the middle ground.

Thunderbird stores it as some form of mbox (not really sure if it is what 
they call "mbox"), but with a simple plugin (it should be standard of 
course) it is easy to export as .eml files. This plugin is also only 
capable of importing .eml files, it seems. So where is my lack of 
interoperability now? I use Thunderbird, it doesn't store as .eml, and I 
am still capable of interoperating with anything I want.

Unless of course I want 30 different tools to operate on my email 
collection at once. That would spell "safety". Having these files all out 
in the open would invite anything and everyone to mess with it. Any badly 
configured tool could start deleting stuff and I wouldn't notice. If it 
started deleting "mbox" files I would be SURE to notice. What about the 
filenames of those emails. How are you going to make that meaningful? I 
can think of no meaningful naming scheme whatsoever, since the filenames 
themselves are not relevant to an email program. Are you now going to 
encode metadata in the name??? That would introduce MORE opportunities for 
disaster. Now the filesystem has to, for example, be able to deal with 
e.g. Japanese kanji characters and the like, AND your own OS has to be 
able to handle it as well.
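
For what it is worth, Maildir does encode some metadata in the name: whatever follows ":2," is a list of single-letter flags. A minimal sketch of reading that back, assuming a made-up filename (the function name and the example name are illustrative, not taken from any tool):

```python
# Sketch: split a Maildir filename into its unique part and its flags.
# The filename used at the bottom is a made-up example.
MAILDIR_FLAGS = {
    "P": "passed",
    "R": "replied",
    "S": "seen",
    "T": "trashed",
    "D": "draft",
    "F": "flagged",
}

def parse_maildir_name(filename):
    """Return (unique_part, set_of_flag_names) for a Maildir filename."""
    unique, sep, info = filename.partition(":2,")
    if not sep:
        # No info suffix at all, as for messages still sitting in new/.
        return filename, set()
    # Unknown letters are ignored rather than treated as errors.
    return unique, {MAILDIR_FLAGS[c] for c in info if c in MAILDIR_FLAGS}

unique, flags = parse_maildir_name("1411162093.M1P2.myhost:2,RS")
print(unique, sorted(flags))
```

Note that the colon separator is itself not a legal filename character on Windows, which is one example of the filesystem dependency described above.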

You are creating requirements and dependencies on perfect operation left 
and right. And you call this a good design??

You also did not really grasp the meaning of my design goals. I am not 
intending to create a backup store without any containment (do you not 
realize the complete contradiction in those words? Backup? No-containment? 
Safe? Dispersed?) just because "other tools" will be able to operate on my 
"backup". I HAVE IMAP FOR THAT.

My backups get TUCKED AWAY inside even encrypted rsyncable tarballs. Why 
on EARTH do I want other tools to be able to operate on that? In general a 
backup wants to be restored in its entirety and when that is not the case, 
the backup tool itself (in this case OfflineIMAP) can arrange for its 
individual extraction.

With email at least, I rarely make manual mistakes because we have Trash 
folders that provide a two-level delete. So my backup is meant for 
safekeeping, not for operating on. In the rare event that I do need to 
operate on it, it really won't be hard, I can promise you that.

In short, the filesystem is NO place for storing email metadata 
whatsoever. All metadata should be done by the application.

The "interoperable" system I and we have is called IMAP. It is a 
standardized protocol which has a huge variety of tools to work with. 
Although it seems to be lacking in terms of backup abilities and 
possibilities and alternatives, which is why I am here.

This is my "medium" through which I move to other tools. I don't need 
another standardized protocol (such as the way I store my email locally) 
to operate with "other tools". LEAST of all on my BACKUPS.

Introducing another common medium is pointless to begin with. At least for 
my every purpose, it is. Completely. I have been using email for a million 
years, so to speak, just like everyone else here, and I have never needed 
any other "tools" to work on my email store. And failing proper backups, 
even one failing tool that syncs with some IMAP store is enough to ruin 
your data. Even for a mail spool, I would not depend on individual files. 
You are putting too much emphasis and too much reliance and too much 
dependence on the filesystem handling your operational tasks when it is 
not even designed for that. It is abusing your fs.

So you are claiming it was designed for that. I am claiming you are 
abusing it. There we have our difference in perspective.

"It all works just fine" or "It works for me" are more of those 
meaningless remarks. Whether it "works just fine" for you or for some 
system you claim to be expert on, does not mean anything at all. What does 
"it works" mean. What goals, standards and requirements are you relating 
that to when you say that? One could have very low goals and then it would 
"work just fine" and another could have very high goals and it would fall 
apart.

So these are subjective, relative statements with unexplained premises. 
What works for you may not work for another. You fail to look outside your 
narrow set of requirements.

If I thought the tools available would do the job, I would not have 
written my initial email. So basically you are thinking I am a moron, 
which is why I am calling you one straight out. If the tools did the job I 
need them to do, I would not have written anything and I would have just 
used them. PERIOD. My writing proves to you that the tools do not do the 
job I want them to do. YOU KNOW THAT. But you are just offended because 
you are apparently kind of ego-invested in these tools being so awesome.

And if someone comes out and says "these tools are not very awesome for 
me" you apparently feel insulted because I seem to insult your choices 
(or those of others you agree with). And then you start projecting YOUR 
emotional needs onto ME. But I'm not dissing your personal choices. If you 
think these tools are right for you, fine. They are not right for me. But 
you can't live with that, apparently.

And so I am setting out to improve them. For my personal purposes. And why 
should I not have the right to do that? It is open source, probably, 
apparently. Every other tool I need as well.

Everything is available: python, sqlite, a linux system, and offlineimap 
itself. I do not even need more than that to completely achieve my current 
goals. Why are you trying to think for me what would work best for me? You 
are obviously not thinking in a creative direction, but in a reactive 
pattern trying to dissuade me from proceeding as I am doing.

One thing I *could* use is the ability to more easily sync stuff I am 
currently working on. Saving this email only on my server is not good 
enough for me as I write. So I am currently at least manually copying it 
from my postponed-msgs folder each time but I am doing it so often that it 
becomes an annoying thing to do. Presently I *could* use that IMAPSize to 
sync my ENTIRE mailbox after postponing it (in Alpine). But ideally I 
would use something else (like OfflineIMAP) with a script to just sync 
that postponed-msgs and (currently) also the "drafts" folder on the press 
of a button. Again, that doesn't require anything more than what I am 
currently proposing.
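
A rough sketch of such a "button", assuming OfflineIMAP's command-line options -o (run once), -a (account) and -f (comma-separated folder list). The account name "myaccount" is a placeholder for whatever is defined in ~/.offlineimaprc, and the function names are made up for this example:

```python
import subprocess

def build_sync_command(account, folders):
    """Build an offlineimap command line that syncs only the given folders, once."""
    return ["offlineimap", "-o", "-a", account, "-f", ",".join(folders)]

def sync_folders(account, folders):
    """Run the sync and return offlineimap's exit status."""
    return subprocess.call(build_sync_command(account, folders))

if __name__ == "__main__":
    # "myaccount" is a placeholder account name (an assumption).
    cmd = build_sync_command("myaccount", ["postponed-msgs", "drafts"])
    # Only print the command here; call sync_folders() to actually run it.
    print(" ".join(cmd))
```

Binding sync_folders("myaccount", ["postponed-msgs", "drafts"]) to a key or a launcher would give the one-press sync of just those two folders.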

When I am writing something bigger or more important, I also make 
intermediate printouts in case disaster strikes. I do not flirt with 
disaster. Never. Except... after I started using IMAPSize. Coincidence? We 
both know I am not speaking bull.

For very important things, I sometimes store it in 4 locations at once as 
I am writing or working on it. And now you are blaming my workflow for 
lack of safety? Sure I lack the tools presently to automate it. I have a 
sync script setup to a USB stick and that works fine, I don't have it 
loaded now. If I need a third location for this text of this email, I 
would mount it and start syncing that. Currently I need a sync from IMAP 
to local files more than anything else. But in principle, individual 
drafts of individual emails are not a meaningful distinction or 
separation. If I could sync the entire folder at once, that would be 
exactly what I need. Why then store as individual files? What purpose does 
it have?

The only purpose it has is for lazy programmers who use the filesystem to 
provide for operational support they would otherwise need to encode into a 
different data structure in some individual (set of) files. Just imagine a 
database storing every row of every table in a separate individual file! 
It would be madness!!! For one, it would immensely clog up the 
filesystem's (inode) tables. The number of inodes in an ext-fs is fixed at 
creation and cannot be increased afterwards, short of growing the 
filesystem itself. So now you are marrying the number of emails you have to 
a prior design choice regarding the number of inodes you want to have.
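
To see how much of that inode budget is already spent, something like the following should do on a unix system. It is only a sketch: the function name is made up, and it assumes the filesystem reports meaningful inode counts (filesystems that allocate inodes dynamically may report 0):

```python
import os

def inode_usage(path="/"):
    """Return total, free and used inode counts for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_files   # inodes the filesystem has in total
    free = st.f_ffree    # inodes not yet allocated
    return {"total": total, "free": free, "used": total - free}

usage = inode_usage("/")
print(usage)
```

With one file per message, every stored email costs one inode from that total, whatever its size.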

And if you had thousands of users with thousands of emails each, or more, 
......

...you could easily see you would run into problems of scalability right 
away.

The only reason you can say mbox files are more prone to corruption is 
because you are now depending on the functioning of the filesystem proper 
which has seen much greater development than anything any individual tool 
could probably allow or mandate. So you are abusing something that is just 
well-tested and fully-developed even though it is not the proper place for 
that functionality. And then you say that that is safer. It's only safer 
if you mangle your own mbox files, or allow other tools to do so.

Any reasonable "folder" implementation should have indices as PART of the 
container itself. And then your tool should have a sublayer that handles 
ALL access into individual "sections" such as would be, for example, 
emails. If that sublayer is developed well enough, there should be no 
issues more than in a filesystem that is equally well-developed. SQLite 
should be such a "sublayer". Perhaps it is not perfectly ideal but it 
should do the job for now...
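
A minimal sketch of what such a sublayer could look like with Python's sqlite3 module. The table layout, column names and function names are assumptions made for illustration, not an existing OfflineIMAP backend:

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a single-file mail container with its index inside it."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " folder TEXT NOT NULL,"
        " uid INTEGER NOT NULL,"
        " flags TEXT NOT NULL DEFAULT '',"
        " raw BLOB NOT NULL,"
        " PRIMARY KEY (folder, uid))"
    )
    return db

def put_message(db, folder, uid, raw, flags=""):
    """Insert or overwrite one message inside a transaction."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO messages (folder, uid, flags, raw)"
            " VALUES (?, ?, ?, ?)",
            (folder, uid, flags, raw),
        )

def get_message(db, folder, uid):
    """Return the raw message bytes, or None if there is no such message."""
    row = db.execute(
        "SELECT raw FROM messages WHERE folder = ? AND uid = ?", (folder, uid)
    ).fetchone()
    return None if row is None else row[0]

db = open_store()  # in-memory here; pass a filename for a real container
put_message(db, "postponed-msgs", 1, b"Subject: draft\r\n\r\nbody\r\n", flags="D")
print(get_message(db, "postponed-msgs", 1))
```

All access goes through these three functions, so the rest of the tool never touches the container directly; the primary key on (folder, uid) is the index, and it lives in the same file as the messages.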

Right? Maybe I'm wrong about that... I was a bit hesitant about SQLite for 
this from the beginning.... Any regular database stores indices separate 
from the data itself, I believe. That may seem common sense, but it is 
not, because you have now created a dependency that is, from the level of 
the filesystem, external. Thunderbird just flags deleted emails as 
"removed" in its index and/or folder file while not actually removing the 
email itself so as to save on rewriting costs, which is not a bad 
approach. It then offers to "compact" the folder once in a while, which, 
again in principle, is not all that bad. What I would personally do, is 
create an index at the beginning capable of storing an X amount of emails 
(indices to emails). The last block would then be a pointer to the next 
index-block further down. This is much like the ext-fs inode blocks. In 
this way, if you grab parts of a large inbox file you can still make sense 
of them as long as they contain these index-blocks which, of course, are 
also easy to rebuild given uncorrupted data-segments. I would put X at 
500. If every email requires about 1k worth of meta-data, you'd get a 
block of 500k. But something like that needs a bit of development of 
course. Ideally you would not have to write such a library yourself....
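
To make the index-block idea concrete, here is a rough sketch under stated assumptions: each index entry holds only an (offset, length) pair rather than 1k of metadata, the last field of every index block points to the next index block (0 meaning none), and the demo uses a capacity of 4 instead of 500 so the chaining is visible. All names and the on-disk layout are invented for this example:

```python
import io
import struct

ENTRY = struct.Struct("<QQ")   # (offset, length) of one message
NEXT = struct.Struct("<Q")     # offset of the next index block, 0 = none

def block_size(capacity):
    return capacity * ENTRY.size + NEXT.size

def new_index_block(f, capacity):
    """Append an empty (all-zero) index block at end of file; return its offset."""
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    f.write(b"\0" * block_size(capacity))
    return pos

def read_block(f, pos, capacity):
    """Return (entries, next_pointer) for the index block at pos."""
    f.seek(pos)
    buf = f.read(block_size(capacity))
    entries = [ENTRY.unpack_from(buf, i * ENTRY.size) for i in range(capacity)]
    (nxt,) = NEXT.unpack_from(buf, capacity * ENTRY.size)
    return entries, nxt

def append_message(f, data, capacity):
    """Store data at end of file and record it in the first free index slot."""
    pos = 0
    while True:  # walk the chain to the last index block
        entries, nxt = read_block(f, pos, capacity)
        if nxt == 0:
            break
        pos = nxt
    # Offset 0 always holds the first index block, so offset 0 means "free".
    used = sum(1 for off, _ in entries if off != 0)
    if used == capacity:
        # Last block is full: chain a fresh index block behind it.
        new_pos = new_index_block(f, capacity)
        f.seek(pos + capacity * ENTRY.size)
        f.write(NEXT.pack(new_pos))
        pos, used = new_pos, 0
    f.seek(0, io.SEEK_END)
    data_pos = f.tell()
    f.write(data)
    f.seek(pos + used * ENTRY.size)
    f.write(ENTRY.pack(data_pos, len(data)))

def read_messages(f, capacity):
    """Walk the chain of index blocks and return every stored message in order."""
    out, pos = [], 0
    while True:
        entries, nxt = read_block(f, pos, capacity)
        for off, length in entries:
            if off != 0:
                f.seek(off)
                out.append(f.read(length))
        if nxt == 0:
            return out
        pos = nxt

box = io.BytesIO()             # stands in for one mailbox file
new_index_block(box, capacity=4)
for i in range(10):
    append_message(box, b"message %d" % i, capacity=4)
messages = read_messages(box, capacity=4)
print(len(messages))
```

With X at 500 and roughly 1k of metadata per email, each index block would be about 500k, as estimated above; with only offset and length per entry it would be 8 kB. Marking a message deleted and compacting later are left out of this sketch.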

Regards, and BYE.

B.




On Fri, 19 Sep 2014, Zak Smith wrote:

> Maildir is a robust and standardized way for multiple programs to
> (simultaneously) access mail messages on (mostly) unix systems.  There
> is nothing wrong or unstable about it.
>
> You are projecting emotions about your loss of data.  You should blame
> some combination of Thunderbird, IMAPSize, your workflow and/or your
> mistakes, and lack of viable backups for your data loss.
>
> There is nothing wrong with having a large number of files. That's
> what a filesystem is designed to do.  If you are worried about
> updating metadata too often, enable noatime in fstab.
>
> Using the trivial example, mbox files are much less reliable with
> regard to data loss or corruption compared to maildir.  Any other more
> consolidated method (i.e. database files) will dramatically reduce
> interoperability and then lock you into one tool set.
>
> In the case of some messages becoming corrupted and/or deleted in even
> a huge mailbox, restoring the damaged messages from a viable backup
> can be done with a simple shell script.  It all works just fine on
> unix.
>
>
>
>
> On Fri, Sep 19, 2014 at 08:22:19PM +0200, Bart Schouten wrote:
>> Hi,
>>
>> I'll tell you a very short story. Believe it or not, but I lost
>
> <snip>
>
>
>
>
> --
> # Zak Smith    mobile 970-232-4468
>
>




More information about the OfflineIMAP-project mailing list