<html><body><div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt"><br><div style="font-family: times new roman, new york, times, serif; font-size: 12pt;"><div style="font-family: times new roman, new york, times, serif; font-size: 12pt;">
On Fri, 6 May 2011 11:14:17 -0700 (PDT), chris coleman wrote:<br>> I want to put this out there and get the opinion of you and the list.<br>> <br>> For performance (while multi-threading.. dealing with huge inboxes.. multiple accounts on multiple servers) and data-integrity reasons (crashes or other interruptions in the code that might damage the data stored previously in flat text files, typically 100KB in size, that were getting written to disk possibly 100 times in a single invocation of offlineimap, every 3 minutes for one week... Now I know why the disk light stays on solid for 1-2 minutes when the script is running... yikes!).<br><br>Yes, OfflineImap has always played it super safe and writes out the cache after<br>every single change. It does so by first writing it to a new tmp file<br>and then moving it into place over the old one to avoid partially written<br>content (often fsyncing in between). We basically have to expect to be<br>killed or crash
at any time, so playing safe is good, in general. It is<br>safe but it kills performance, especially for those guys that have<br>multi-million email boxes (yes, they do exist).<br><br>That's why the sqlite patches have been floating around since 2008 or so. But<br>due to development stagnation, they were never incorporated.<br><br>A database such as sqlite is very well suited to our purposes, I<br>think, although people have pointed out the benefits of plain text<br>files when it comes to e.g. recovering from corruption.<br><br>This change is a major step in my opinion, in terms of performance and<br>generally being nicer to our hard disks. However, there is plenty of<br>stuff left to do in offlineimap on which I would rather focus than<br>implementing or even incorporating an abstract database backend. sqlite<br>is good, but *I* don't really see any benefit in being able to stuff<br>your cache into postgres. After all, firefox doesn't offer you
the<br>possibility to put your bookmarks and its cache into postgres either.<br><br>> There are some already existing frameworks, pre-packaged, tested and working, and they're available with a simple "apt-get python-sqlobject" or "apt-get python-sqlalchemy" for example.<br><br>Now that we have a 2nd LocalStatus backend implemented, it would be<br>rather trivial to implement more backends, including one that wraps a<br>db abstraction layer. There are only a few functions to<br>implement. Patches are welcome, but *I* am not gonna introduce another<br>level of abstraction for a smallish cache.<br><br>> I think it would be really cool to let the user pick the database that they have available, with a setting in the .offlineimaprc, and the offlineimap python code using one of these persistence frameworks, would be unchanged.<br><br>Once we are convinced that sqlite works great and everyone who remembers<br>that it even can do plaintext, *I* would
rather remove the current<br>option from offlineimap.conf again and just use the sensible<br>default. offlineimap.conf is a monster as it is, and each additional<br>code path means more paths to test (and conversely, fewer code paths<br>actively being used), which is bound to introduce more regressions and<br>failure opportunities. I'd rather try to keep offlineimap as simple as<br>possible.<br><br>> 1) the added performance and reliability would really be awesome! <br><br>I am 100% sure that using a 3rd party db abstraction would not gain us<br>performance over just using sqlite. But I am willing to be convinced by<br>benchmarks :-).<br><br>> 2) no need to close the connection every time you go thru the loop because another thread will corrupt it. <br><br>Fixed in the latest revision. sqlite3 has been multithreading capable since 3.3. after<br>all (published in April 2008 or so). We don't close it anymore.<br><br>[SNIP lots of valid
stuff]<br>> What's your opinion?<br><br>All nice and good. In the end, it comes down to someone getting their<br>hands dirty and implementing it. When it comes to developer time, *I*<br>would rather spend my time debugging IMAP hangs and improving our error<br>message handling than adding one more level of abstraction that<br>requires additional packages to install. This is just a smallish cache;<br>it's not like we are doing a db-based web app. :-)<br><br>But this being open source, the door is always open to contributions for<br>everyone to scratch their itches. ;)<br><br>Sorry if that is not what you wanted to hear from me, but you asked for<br>my opinion :)<br><br>Sebastian<br><br>==========================================<br><br>Sebastian, <br><br>I appreciate your opinion.<br><br>I bring this up because I see the project has been spending valuable energy reinventing some basic database technology that already exists and is tested by millions of
users already:<br>a) flat files
that are written to disk so often that they play the role of journaling and transaction logs, to prevent crashes from losing/corrupting data. <br>b) a cached local copy of a remote database index (this is what the LocalStatus file is).
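<div><br></div>
<div>To make the contrast concrete, here is a minimal sketch of the two approaches discussed in this thread: rewriting the whole flat file through a temp file and a rename, versus recording a single change inside a sqlite transaction. This is not offlineimap's actual code. The function names, the "uid:flags" line format and the "status" table are all assumptions made up for illustration.</div>

```python
import os
import sqlite3
import tempfile


def save_flat(path, status):
    """Rewrite the WHOLE cache: write a temp file, fsync it, rename it into place.

    `status` is a hypothetical dict mapping message UID -> flags string.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        for uid, flags in sorted(status.items()):
            f.write("%d:%s\n" % (uid, flags))
        f.flush()
        os.fsync(f.fileno())
    # The rename replaces the old file in one step, so a crash leaves
    # either the old cache or the new one, never a half-written file.
    os.replace(tmp, path)


def open_sqlite(path):
    """Open (and if needed create) a hypothetical sqlite status cache."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS status "
            "(uid INTEGER PRIMARY KEY, flags TEXT)"
        )
    return conn


def save_sqlite(conn, uid, flags):
    """Record ONE change; the connection context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO status (uid, flags) VALUES (?, ?)",
            (uid, flags),
        )
```

<div>With the flat file, every change costs a full rewrite of the cache, so the cost grows with the size of the inbox. With sqlite, each change touches one row and the database's own journal takes care of crash safety.</div>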
<div><br>
</div>
<div>With the trend toward larger
and larger IMAP inboxes (unlimited email storage available nearly everywhere), LocalStatus cache files keep getting larger too.</div><div><br></div><div>The chance of crashing
and losing/corrupting data in the middle of a multi-megabyte write that takes place several times per second is going to get bigger, not smaller. <br></div><div><br></div><div>So, seriously, why
not just let a proven, reliable database handle the data integrity
concerns? Even a 64MB laptop can run free MySQL with room to spare, so it can't be because of system requirements...<br>
</div>
<div><br>
</div>
<div>I would say that if you're willing to try, the next step should be to point out which source files contain database calls. <br></div><div><br></div><div>Do you have, or could you write up, a document that lists the source files that call the db?<br></div><div><br></div><div>It should also cover the unwritten rules that must be followed when making calls to the db.</div><div><br></div><div>That is the hard part.<br></div><div><br></div><div>The step after that is easy: just alter the calls to talk through one of the highly rated persistence frameworks... and test.<br></div>
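<div><br></div>
<div>As a rough sketch of what "only a few functions to implement" could look like, here is a hypothetical backend interface with a sqlite implementation behind it. The class and method names are assumptions for illustration and are not taken from the offlineimap source. A framework-based backend would implement the same three methods.</div>

```python
import sqlite3
from abc import ABC, abstractmethod


class StatusBackend(ABC):
    """Hypothetical interface that every status-cache backend would implement."""

    @abstractmethod
    def load(self):
        """Return the whole cache as a dict of uid -> flags."""

    @abstractmethod
    def save_message(self, uid, flags):
        """Insert or update a single message entry."""

    @abstractmethod
    def delete_message(self, uid):
        """Remove a single message entry."""


class SqliteStatus(StatusBackend):
    """One possible backend, using only the standard-library sqlite3 module."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS status "
                "(uid INTEGER PRIMARY KEY, flags TEXT)"
            )

    def load(self):
        # Each row is a (uid, flags) pair, so dict() builds the mapping directly.
        return dict(self.conn.execute("SELECT uid, flags FROM status"))

    def save_message(self, uid, flags):
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO status (uid, flags) VALUES (?, ?)",
                (uid, flags),
            )

    def delete_message(self, uid):
        with self.conn:
            self.conn.execute("DELETE FROM status WHERE uid = ?", (uid,))
```

<div>The calling code would only ever see StatusBackend, so swapping in another store would mean writing one more subclass and leaving the callers untouched.</div>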
<div></div>
Chris<br><br></div></div></div></body></html>