<DKIM> Re: <DKIM> [PATCH,review]: add lmdb folder backend

Nicolas Sebrecht nicolas.s-dev at laposte.net
Mon Dec 19 17:53:16 UTC 2016


On Mon, Dec 19, 2016 at 02:06:04PM +0000, Luke Kenneth Casson Leighton wrote:

>  ... and make life for debian, ubuntu, fedora and all other distros
> that have strict policies for explicitly and meticulously maintaining
> an accurate and up-to-date list of all copyright holders for all
> files...

It's damn easy to get the contributors from the git logs in a per-file
basis. BTW, offlineimap is already packaged in all the distributions you
mentioned and many others.

I'd not worry about that.

This decision was taken collectively to become a policy on
contributions.

> > We need to have this dependency made optional.
> 
>  import sqlite should probably also likewise.

No, no. The sqlite backend was made the default and the legacy plain
text has been announced deprecated. There are plans to remove the old
plain text backend.

For this to happen, we moved the SQLite dependency from optional to
required.

We take care of the dependencies to allow users to clone/download the
sources and run offlineimap with minimal other requirements.

> >> +        with self._env.begin() as txn:
> >
> > I wonder it's missing of locks. This class will be *instanciated* and
> > used more than once in different threads.
> 
>  ok lmdb is a multi-reader with a single global mutex on writing.
> it's extremely clever.  so all those "with self._env.begin()" blocks
> are transactions (that don't block readers) - the only one(s) that
> will block (across all threads) is the "with
> self._env.begin(write=True)" ones.

Then, I wonder how atomicity of the changes is handled to recover from
unexpected kills.

>  damn gmail variable-width fonts, can't tell what you're pointing at,
> i assume the space in betweeen () and ) which i put there deliberately
> so as to emphasise the empty tuple... yeh ok it goes :)

Yes, we should have: ())

>  amazingly this code actually works, i found a couple of errors, also
> in the current version i removed the dict and replaced it with a
> 4-tuple, no point storing the keys of the dict repeated all the time,
> they just take up space, esp. when there's versioning.
> 
> the one thing i'm really not happy about is the *massive* in-memory
> cacheing.  i have 200,000 messages, it's a 9GB gmail folder, and as a
> libre project maintainer i need to be keeping an eye on messages
> several times an hour.  100% CPU usage for up to 90 seconds, to take
> an in-memory copy of the database, isn't okay.

I think the all-in-memory pattern is there because it was easier to
implement at the beginning. I'm not sure the contributors worried
about the memory consumption.

There are reported issues in the bug tracker about this limitation.

> however.... it looks like in-memory cacheing is an integral part of
> the design.  i thought about creating a derivative class of UserDict
> (one that understands and mirrors the status_db) and then returning
> that from cachemessagelist, the general idea being that the
> replacement items, __get__ etc. functions actually *directly*
> reference the lmdb database on-demand instead of keeping the data
> in-memory.  lmdb is a particularly good candidate for this because it
> does *not* take copies of keys or values: because it's a mmap'd shmem
> file (with COW enabled) it passes the *real* memory address around.
> kinda cool.
> 
> ok off to taiwan in the morning, will take this up again in a day or so.

I'll be away by the end of the week up to next year. No worry, we aren't
in a hurry. ,-)

-- 
Nicolas Sebrecht



More information about the OfflineIMAP-project mailing list