<DKIM> Re: <DKIM> [PATCH,review]: add lmdb folder backend

Luke Kenneth Casson Leighton lkcl at lkcl.net
Mon Dec 19 21:37:36 GMT 2016


---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68


On Mon, Dec 19, 2016 at 5:53 PM, Nicolas Sebrecht
<nicolas.s-dev at laposte.net> wrote:
> On Mon, Dec 19, 2016 at 02:06:04PM +0000, Luke Kenneth Casson Leighton wrote:
>
>>  ... and make life for debian, ubuntu, fedora and all other distros
>> that have strict policies for explicitly and meticulously maintaining
>> an accurate and up-to-date list of all copyright holders for all
>> files...
>
> It's damn easy to get the contributors from the git logs on a per-file
> basis.

 if the entire commit history right back to the init has signed-off
notices (or each file has been modified since to include one
signed-off per copyright holder) then yeah that would work.
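
A rough sketch of that check, assuming Python 3.7+ and a git checkout. The
helper names (parse_log, missing_signoff, log_for) and the NUL / record
separator format string are made up for illustration; they are not part of
offlineimap or copyright_check.py. It lists the commits touching one file that
carry no Signed-off-by trailer:

```python
import re
import subprocess

# One record per commit: hash, author, full message, then a record separator.
LOG_FORMAT = "%H%x00%an <%ae>%x00%B%x1e"
SIGNOFF_RE = re.compile(r"^\s*Signed-off-by:\s*(.+?)\s*$",
                        re.MULTILINE | re.IGNORECASE)


def parse_log(raw):
    """Turn raw `git log` output (in LOG_FORMAT) into (sha, author, signoffs)."""
    records = []
    for chunk in raw.split("\x1e"):
        chunk = chunk.strip("\n")
        if not chunk:
            continue
        sha, author, message = chunk.split("\x00", 2)
        records.append((sha, author, SIGNOFF_RE.findall(message)))
    return records


def missing_signoff(records):
    """Return the hashes of commits that have no Signed-off-by line at all."""
    return [sha for sha, _author, signoffs in records if not signoffs]


def log_for(path):
    """Fetch the per-file history (following renames) from the current repo."""
    result = subprocess.run(
        ["git", "log", "--follow", "--format=" + LOG_FORMAT, "--", path],
        check=True, capture_output=True, text=True,
    )
    return parse_log(result.stdout)
```

An empty list from missing_signoff(log_for("some/file.py")) would mean every
commit touching that file is signed off; it says nothing about whether the
sign-offs are the complete set of copyright holders.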

> BTW, offlineimap is already packaged in all the distributions you
> mentioned and many others.

 if copyright_check.py understood git logs i'd be able to verify the
accuracy of their copyright notice claims :)

> I'd not worry about that.

 ok

> This decision was taken collectively to become a policy on
> contributions.
>
>> > We need to have this dependency made optional.
>>
>>  import sqlite should probably also likewise.
>
> No, no. The sqlite backend was made the default and the legacy plain
> text has been announced deprecated. There are plans to remove the old
> plain text backend.

 mm... it helps for debugging, testing and understanding.  i looked at
the plain text one as well as the sqlite one in order to decide what
to do.

> For this to happen, we moved the SQLite dependency from optional to
> required.
>
> We take care of the dependencies to allow users to clone/download the
> sources and run offlineimap with minimal other requirements.

 yep.  sorted

>> >> +        with self._env.begin() as txn:
>> >
>> > I wonder whether it's missing locks. This class will be *instantiated*
>> > and used more than once in different threads.
>>
>>  ok lmdb is a multi-reader with a single global mutex on writing.
>> it's extremely clever.  so all those "with self._env.begin()" blocks
>> are transactions (that don't block readers) - the only one(s) that
>> will block (across all threads) is the "with
>> self._env.begin(write=True)" ones.
>
> Then, I wonder how atomicity of the changes is handled to recover from
> unexpected kills.

 ahh, that's the really cool part: that top-level b-tree, it's a
single page (and all pages are aligned to match exactly with the OS's
VM page size and boundary).

 if that one page (as an example) is not written out to disk properly
on an fsync by the OS (or later, as part of the ongoing flushing),
even on a powerloss or segfault, then you have much much bigger
problems with the hardware and the OS than just having lmdb corrupt
its data.

if it *doesn't* get updated, then all that happens is that on restart,
the application will discover that the entire database integrity is
perfectly fine... but that the history is back to whatever it was as
if that top-level page wasn't written out.... i.e. the previous atomic
write transaction.
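
To make the idea concrete, here is a toy in-memory model of copy-on-write with
a single "root" pointer. It is NOT how lmdb is implemented (no mmap, no
b-tree, no disk); the class name ToyCowStore and its methods are invented for
this sketch. It only illustrates two properties described above: one writer
at a time, and a kill before the root update leaves the previous committed
state intact.

```python
import threading


class ToyCowStore:
    """Toy copy-on-write store: snapshots plus one root pointer."""

    def __init__(self):
        self._pages = {0: {}}   # page number -> snapshot of the data
        self._root = 0          # stands in for the top-level page
        self._write_lock = threading.Lock()  # single global writer mutex

    def read(self):
        # Readers take no lock: they follow whatever root is current
        # and get a copy of that committed snapshot.
        return dict(self._pages[self._root])

    def write(self, changes, crash_before_commit=False):
        with self._write_lock:
            new_page = max(self._pages) + 1
            snapshot = dict(self._pages[self._root])
            snapshot.update(changes)
            # Copy-on-write: the new data goes to a fresh page,
            # the previously committed page is never modified.
            self._pages[new_page] = snapshot
            if crash_before_commit:
                raise RuntimeError("simulated kill before the root update")
            # The commit itself is one small update of the root pointer.
            self._root = new_page
```

After a successful write({"1": "seen"}) followed by a write that is "killed"
before the commit, read() still returns {"1": "seen"}: the half-done
transaction is simply never reachable from the root.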

it's extremely powerful and i had no idea that shmem with
copy-on-write could be used in such an amazing way.   it took one
reviewer a *YEAR* to fully understand lmdb.

>>  amazingly this code actually works, i found a couple of errors, also
>> in the current version i removed the dict and replaced it with a
>> 4-tuple, no point storing the keys of the dict repeated all the time,
>> they just take up space, esp. when there's versioning.
>>
>> the one thing i'm really not happy about is the *massive* in-memory
>> caching.  i have 200,000 messages, it's a 9GB gmail folder, and as a
>> libre project maintainer i need to be keeping an eye on messages
>> several times an hour.  100% CPU usage for up to 90 seconds, to take
>> an in-memory copy of the database, isn't okay.
>
> I think the all-in-memory pattern is there because it was easier to
> implement at the beginning.

 yeahyeah understandable, don't make life complicated (early
optimisation) until you know you need it

> I'm not sure the contributors worried
> about the memory consumption.
>
> There are reported issues in the bug tracker about this limitation.

 ack.
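
On the dict-versus-tuple point quoted earlier: a small sketch of why dropping
the repeated keys saves space. The field names and their order (uid, flags,
mtime, labels) and the use of JSON as the encoding are assumptions for
illustration only; the actual patch may store different fields in a different
format.

```python
import json

# Hypothetical field order; the real 4-tuple in the patch may differ.
FIELDS = ("uid", "flags", "mtime", "labels")


def pack(record):
    """Encode a record dict as a positional JSON array (keys not stored)."""
    values = [record[name] for name in FIELDS]
    return json.dumps(values, separators=(",", ":")).encode("utf-8")


def unpack(blob):
    """Rebuild the record dict from the positional encoding."""
    return dict(zip(FIELDS, json.loads(blob.decode("utf-8"))))


def packed_as_dict(record):
    """The same record with its keys repeated, for size comparison."""
    return json.dumps(record, separators=(",", ":")).encode("utf-8")
```

For a record like {"uid": 42, "flags": "ST", "mtime": 1482183456, "labels":
"inbox"}, len(pack(rec)) is smaller than len(packed_as_dict(rec)), and the
saving repeats for every stored message and every retained version.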

>> ok off to taiwan in the morning, will take this up again in a day or so.
>
> I'll be away by the end of the week up to next year. No worry, we aren't
> in a hurry. ,-)

 whereas whoops, i'm on a practical time-limit as well as "on the
clock" for returning to zhuhai, mwaaa.

 i don't normally do unit-tests but i need this to work reliably and
fairly soon, i'll try writing something up which does some
simultaneous mods (fake messages), takes all 3 backends, shoves data
in them (compares them), changes data (compares them), removes data
(compares them).
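
A minimal sketch of that kind of test, under loud assumptions: the two
classes below are in-memory stand-ins (a dict and an sqlite3 ":memory:"
database), not the real offlineimap backends, and the save/delete/dump
interface is invented for the example. It also runs single-threaded, so the
"simultaneous" part is not covered. The shape is: apply each step to every
backend, then compare all of them.

```python
import sqlite3


class DictBackend:
    """Stand-in backend keeping uid -> flags in a plain dict."""

    def __init__(self):
        self._data = {}

    def save(self, uid, flags):
        self._data[uid] = flags

    def delete(self, uid):
        self._data.pop(uid, None)

    def dump(self):
        return dict(self._data)


class SqliteBackend:
    """Stand-in backend keeping uid -> flags in an in-memory sqlite table."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE status (uid INTEGER PRIMARY KEY, flags TEXT)")

    def save(self, uid, flags):
        self._db.execute(
            "INSERT OR REPLACE INTO status VALUES (?, ?)", (uid, flags))

    def delete(self, uid):
        self._db.execute("DELETE FROM status WHERE uid = ?", (uid,))

    def dump(self):
        return dict(self._db.execute("SELECT uid, flags FROM status"))


def run_and_compare(backends, steps):
    """Apply each (method, args) step to all backends, comparing after each."""
    state = backends[0].dump()
    for method, args in steps:
        for backend in backends:
            getattr(backend, method)(*args)
        dumps = [backend.dump() for backend in backends]
        assert all(d == dumps[0] for d in dumps), (method, args, dumps)
        state = dumps[0]
    return state
```

Feeding it steps that add fake messages, change their flags and then remove
some of them gives the add / change / remove comparison described above; a
third backend only needs the same three methods.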

 it'll be fuuun :)

l.



