On Fri, 6 May 2011 00:34:39 +0200, Nicolas Sebrecht wrote:
> > def upgrade_db(self, from_ver):
> >     """Upgrade the sqlite format from version 'from_ver' to current"""
>
> This is a good improvement but...

Thanks

> ...we tend to have a lot of constructs like this
>
>     conn, cursor = self.get_cursor()
>     with conn:
>         while True:
>             try:
>                 <execute some SQL request and commit>
>             except blablah:
>                 <retry on database lock>
>             else:
>                 bail out of the while loop

Yes, 4 times or so, as sqlite will simply bail out if another thread
currently locks the db, rather than waiting for the lock.
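(Side note: Python's sqlite3 module does expose a `timeout` argument on
connect() that makes a connection wait for a locked database instead of
failing right away; the default is 5 seconds, and OperationalError is
only raised once that time is up. Whether leaning on that rather than
explicit retries actually helps in our threaded setup is a separate
question; the snippet below is only meant to illustrate the knob, and
the file name is made up.)

    import sqlite3

    # Wait up to 10 seconds for another thread/process to release the
    # lock before giving up with "database is locked".
    conn = sqlite3.connect('status.sqlite', timeout=10.0)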
> it would be nice to factorize out. Not that this is wrong by design but
> it will cause harder work to maintain with possible errors and bugs.

I agree that factoring that out would be nice, but... ( :-) )
... I really don't want to put that into this topic series. I have been
spending plenty of time refining each of the patches (always doing
'git rebase -i next'), and the series is already as big as I am
comfortable with.

How about we think about merging this as-is now, and working on a
follow-up series that builds on what we have and factors out whatever is
possible? For example, we still don't know whether this approach [opening
and closing the db for each transaction] scales at all, or whether we
need a "db owning thread" that we pass the data to. I'd rather gather
some data from others first.
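(Just to sketch what I would imagine such a factored-out helper to look
like; this is untested, all the names are invented, but it is roughly
the shape of it:)

    import sqlite3
    import time

    MAX_RETRIES = 4  # mirrors the "4 times or so" we do today

    def execute_and_commit(connection, sql, args=()):
        """Run one statement, retrying briefly while another thread
        holds the database lock."""
        for attempt in range(MAX_RETRIES):
            try:
                connection.execute(sql, args)
                connection.commit()
                return
            except sqlite3.OperationalError as e:
                if 'locked' not in str(e) or attempt + 1 == MAX_RETRIES:
                    raise            # not a lock problem, or out of retries
                time.sleep(0.1)      # back off a little before trying again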
Alternatively, it got pointed out that *newer versions* of sqlite can
access the db from multiple threads. I would love to find out *how new*
that version must be, and improve our design so that we don't always have
to close and reopen our database, before embarking on a bigger
architecture redesign.

> I think we need a wrapper (or more) processing the SQL requests with
> generic data methods. Let me try to categorize the different request
> types:
>
> - read only: mostly SELECTs with no need to commit()

Right, no need to check for locks here, as we won't conflict. (Well,
after reading the docs, they basically state that one must always expect
them.)

> - write (we need to commit()):
>   * one statement: easy
>   * multiple statements: easy using a list

Yes.
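(Something along these lines is what I would picture for the read/write
split and the list case; just a sketch, the read/write names are invented
and this is not existing code:)

    def read(connection, sql, args=()):
        """Read-only request: no commit needed, just hand back the rows."""
        return connection.execute(sql, args).fetchall()

    def write(connection, statements):
        """Write request: accept one (sql, args) pair or a list of them,
        and commit once at the end."""
        if isinstance(statements, tuple):
            statements = [statements]
        for sql, args in statements:
            connection.execute(sql, args)
        connection.commit()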
> * statements including additional code to execute: pass both the
>   request and a function to the wrapper

Mmmh, I am not really sure I get this yet, can you give an example? What
additional code could you be thinking of? Are you thinking of something
like

    def savemessageflags(UID, flags):
        res = data.write(SQL='UPDATE status SET flags=:flags WHERE UID=:UID',
                         sqlargs={'UID': 1, 'flags': 'T'},
                         func=<a function that does something (with what?)>)
        if res: ....

> - statistics
> - advanced tracking system for logging IMAP/cache sessions
> - manage miscellaneous exceptions
> - multiple data formats as backend (useful for testing purposes ,-))

The above 4 we can already do with quite a bit less effort, I think; see
how *'relatively easy'* it was to plug in a new data format as backend.
Improved exception handling will come anyhow now that a foundation for it
is in place.
> - thread safe concurrent access by managing queues

Right, that would be an advantage, but we need to take care that we don't
continue before the queue has been worked off. Offlineimap takes care
that it can "crash" at any time without losing data, so we shouldn't just
stuff queries into a queue and carry on. Not unsolvable, but it requires
some design thinking.
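(One possible direction for that design thinking: a single "db owning"
thread that works off a queue, with callers blocking until their request
has really been committed, so a crash cannot lose anything a caller
already believes is saved. All names here are invented; on python2 the
module is called Queue rather than queue.)

    import queue
    import sqlite3
    import threading

    class DBWriter(threading.Thread):
        """Owns the sqlite connection and serializes all writes."""
        def __init__(self, dbfile):
            threading.Thread.__init__(self)
            self.dbfile = dbfile
            self.requests = queue.Queue()

        def run(self):
            conn = sqlite3.connect(self.dbfile)
            while True:
                item = self.requests.get()
                if item is None:         # shutdown sentinel
                    break
                sql, args, done = item
                conn.execute(sql, args)
                conn.commit()
                done.set()               # tell the caller it is on disk
            conn.close()

        def write(self, sql, args=()):
            """Called from other threads; blocks until the write is
            committed."""
            done = threading.Event()
            self.requests.put((sql, args, done))
            done.wait()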
That is why I would love to get what we have as experimental and optional
into the code now, and go on improving it from there.

> I think the SQL topic is a very good opportunity for improving our data
> driver design. In short, I would expect something like data.write(),
> data.read(), (...).

Mmmh, I would think that the current system already provides quite a bit
of abstraction by providing overridable classes for each backend and each
piece of functionality, but I would be happy to get a discussion started
on what and how to improve things. I still feel that this is somewhat out
of the scope of this topic series, though.

Sebastian

==============

Sebastian, Nicolas, List,

I want to put this out there and get the opinion of you and the list.

My motivation is performance (multi-threading, dealing with huge inboxes,
multiple accounts on multiple servers) and data integrity: crashes or
other interruptions in the code might damage the data previously stored
in flat text files. Those files are typically 100KB in size and were
getting written to disk possibly 100 times in a single invocation of
offlineimap, every 3 minutes, for one week... Now I know why the disk
light stays on solid for 1-2 minutes when the script is running... yikes!

http://pythonsource.com/open-source/persistence

There are some already existing persistence frameworks, pre-packaged,
tested and working, and they're available with a simple
"apt-get python-sqlobject" or "apt-get python-sqlalchemy", for example.
"apt-get python-sqlalchemy" for example.<br><br>These handle thread-safeness and provide generic interface to most database backends (mysql, postgresql...) <br><br>I think it would be really cool to let the user pick the database that they have available, with a setting in the .offlineimaprc, and the offlineimap python code using one of these persistence frameworks , would be unchanged.<br><br>The user could select from: flat text file, or a back end database, with connection hostname username, pw and table prefix.<br><br>In my opinion, <br>1) the added performance and reliability would really be awesome! <br>2) no need to close the connection every time you go thru the loop because another thread will corrupt it. <br>3) And no need to reinvent, develop and troubleshoot a framework that someone has already done.. <br>4) There is a reason why these frameworks exist and it's because the problem of handling persistence in an efficient and
4) There is a reason why these frameworks exist: the problem of handling
   persistence in an efficient and thread-safe way is not trivial, nor is
   it built into python.
5) I would bet that 99% of the people running offlineimap are on a
   machine with a database that one of these persistence frameworks could
   talk to, even if it's just sqlite.
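To make the .offlineimaprc idea a bit more concrete, here is roughly what
I am picturing with SQLAlchemy (the option name "status_backend" is just
something I made up for this example, and "config" stands for the parsed
.offlineimaprc):

    from sqlalchemy import create_engine

    # The connection URL alone selects the backend, e.g.:
    #   status_backend = sqlite:////home/user/.offlineimap/status.db
    #   status_backend = postgresql://user:password@dbhost/offlineimap
    #   status_backend = mysql://user:password@dbhost/offlineimap
    engine = create_engine(config.get('general', 'status_backend'))

    # The rest of the persistence code would only ever talk to `engine`
    # (or an ORM layer on top of it) and would stay the same no matter
    # which database the user picked.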
What's your opinion?

Chris