IDLE again

Fri Dec 4 19:45:43 UTC 2009

On Fri, Dec 4, 2009 at 9:26 AM,  <exarkun at twistedmatrix.com> wrote:
> On 08:26 am, dom.lobue at gmail.com wrote:
>>
>> On Thu, Dec 3, 2009 at 10:32 PM,  <exarkun at twistedmatrix.com> wrote:
>>>
>>> On 05:45 am, dom.lobue at gmail.com wrote:
>>>>
>>>> Jean-Paul,
>>>>
>>>> I read over the IMAP IDLE RFC and went through the IMAP4 twisted
>>>> library and I've sketched out a rough outline of how to implement
>>>> IDLE. I've run into some things in Twisted however that I don't really
>>>> understand well, and I'm hoping you can point me in the correct
>>>> direction.
>>>
>>> Cool.  That was quick. :)  Before I get into things, since this thread is
>>> likely to go into a lot of Twisted-specific details which may not be
>>> generally interesting, if there's anyone who'd like off the cc list,
>>> please speak up. :)
>>>>
>>>> First, just to verify my understanding: the IMAP4Client class is a
>>>> Protocol class.
>>>
>>> Yep.  And to expand on that, instances of Protocol classes typically have
>>> a one-to-one relationship with a connection.
>>>>
>>>> All IMAP commands are represented by at least two
>>>> methods in the IMAP4Client class - one for what to do when the command
>>>> is received from the server, and one for when the command is sent to
>>>> the server.
>>>
>>> Generally, though there are some exceptions.  For example, AUTHENTICATE is
>>> implemented with one method that starts by possibly sending a CAPABILITY
>>> command (IMAP4Client.authenticate), then another method which will actually
>>> send AUTHENTICATE (IMAP4Client.__cbAuthenticate), then two more methods for
>>> dealing with the response to the AUTHENTICATE command
>>> (IMAP4Client.__cbContinueAuth and IMAP4Client.__cbAuthTLS).
>>>
>>> Another way to look at it is like this.  For each supported protocol action,
>>> there is at least one public method on IMAP4Client to initiate this action
>>> by sending some bytes to the server.  All bytes received by IMAP4Client from
>>> the server are parsed according to the state the client is in, what commands
>>> are outstanding, etc.  Depending on the state and the bytes, callbacks might
>>> be invoked as a result of this, possibly delivering the results of a
>>> protocol action initiated earlier to the calling application code.
>>>
>>> This doesn't disagree with what you said too much; it just restates it in
>>> slightly more general terms.
>>>>
>>>> Assuming this assertion to be true, the broad strokes of
>>>> the IDLE implementation are as follows:
>>>>
>>>> -IDLE is engaged: command is sent to server turning on IDLE, schedule
>>>> an IDLE reset in 29 minutes, and an attribute ( _IDLE_Enabled for
>>>> example ) is set to True.
>>>
>>> Basically, yes.  One subtle point, though - since the server might reject
>>> the IDLE command, the client shouldn't assume it has entered the IDLE state
>>> until it receives a positive acknowledgement of the command from the server
>>> (e.g. the "+ idling" line from the RFC).
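To make the acknowledgement point concrete, here is a minimal sketch, independent of Twisted, of tracking IDLE state from server lines. The IdleState class and its method names are hypothetical; the sketch only assumes the RFC 2177 flow of a tagged IDLE, a "+" continuation, untagged updates, then DONE and a tagged completion.

```python
# Hypothetical sketch (not part of twisted.mail.imap4): track whether the
# server has actually accepted IDLE, following the RFC 2177 exchange.


class IdleState:
    def __init__(self, tag):
        self.tag = tag          # tag sent with the IDLE command, e.g. "A001"
        self.requested = False  # IDLE sent, waiting for the server's answer
        self.idling = False     # server answered with a "+" continuation

    def start(self):
        """Return the command line to send; IDLE is not assumed yet."""
        self.requested = True
        return f"{self.tag} IDLE\r\n"

    def line_received(self, line):
        """Update state from one server line (CRLF already stripped)."""
        if self.requested and not self.idling and line.startswith("+"):
            self.idling = True  # e.g. "+ idling": now really in IDLE
        elif line.startswith(self.tag + " "):
            # A tagged OK, NO or BAD means the IDLE command is over
            # (or was rejected outright).
            self.requested = False
            self.idling = False
        # Untagged lines such as "* 4 EXISTS" don't change the state.

    def done(self):
        """Return the line that ends IDLE; only valid while idling."""
        if not self.idling:
            raise RuntimeError("server never acknowledged IDLE")
        return "DONE\r\n"
```

If the server answers "A002 BAD ..." instead of "+ idling", idling never becomes True and done() refuses to send DONE, which is the mismatch that bites a client that assumes success.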
>>>>
>>>> -In the command dispatcher code: checks if IDLE is enabled or not. If
>>>> enabled, it appends IDLE to all incoming commands and sends a DONE to
>>>> the server before any new commands are sent. (On the incoming commands
>>>> part: what I mean is if the method originally to be called was
>>>> "incoming_exists", instead it would go to "incoming_existsIDLE".)
>>>
>>> This part could probably bear some elaboration.  I think the question here
>>> is how the unsolicited information should best be made available to the
>>> application code which caused the IDLE to be issued (if I've misunderstood
>>> what you were getting at here, let me know).  One possibility is the
>>> existing IMailboxListener interface - if you look at the very end of the
>>> implementation of IMAP4Client, you'll find three no-op methods,
>>> modeChanged, flagsChanged, and newMessages.  These are intended for
>>> subclasses to override and are already called by IMAP4Client when
>>> unsolicited information is given by the server in response to a command.
>>> It may make sense to direct data provided during an IDLE to these
>>> callbacks, or others similar to them.
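One way to picture routing IDLE updates into hooks like these is the sketch below. It borrows the newMessages(exists, recent) and flagsChanged(newFlags) names from IMAP4Client's no-op methods, but the regex-based dispatch_untagged helper and the Listener class are hypothetical illustrations, not how IMAP4Client actually parses responses, and only three simple response shapes are recognised.

```python
import re

# A few untagged responses that can arrive while IDLE is active.
EXISTS = re.compile(r"^\* (\d+) EXISTS$")
RECENT = re.compile(r"^\* (\d+) RECENT$")
FLAGS_FETCH = re.compile(r"^\* (\d+) FETCH \(FLAGS \(([^)]*)\)\)$")


class Listener:
    """Stand-in for application code; hook names mirror IMAP4Client's."""

    def __init__(self):
        self.events = []

    def newMessages(self, exists, recent):
        self.events.append(("new", exists, recent))

    def flagsChanged(self, newFlags):
        self.events.append(("flags", newFlags))


def dispatch_untagged(line, listener):
    """Route one untagged line to a listener hook (hypothetical helper).

    Returns True if the line was recognised, False otherwise.
    """
    m = EXISTS.match(line)
    if m:
        listener.newMessages(int(m.group(1)), None)
        return True
    m = RECENT.match(line)
    if m:
        listener.newMessages(None, int(m.group(1)))
        return True
    m = FLAGS_FETCH.match(line)
    if m:
        # Map the message sequence number to its new list of flags.
        listener.flagsChanged({int(m.group(1)): m.group(2).split()})
        return True
    return False
```

For example, feeding "* 4 EXISTS" calls newMessages(4, None), and "* 2 FETCH (FLAGS (\Seen))" calls flagsChanged({2: ["\\Seen"]}).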
>>>>
>>>> -When IDLE is confirmed off by server: delete scheduled IDLE reset.
>>>
>>> Yep.
>>>>
>>>> And that's basically it, I think. Fancy stuff like downloading the
>>>> messages that IDLE notifies you about is handled in the
>>>> ClientFactory, right?
>>>
>>> Perhaps by a factory, or perhaps by something else.  When I've said
>>> "application code" above, this is what I'm talking about - the code that
>>> someone else has written which uses IMAP4Client somehow in order to do
>>> something IMAP4-related.  In our case, offlineimap would be the application
>>> code. :)  It doesn't make much difference to the IMAP4Client implementation
>>> who or what is using it, so it could be a ClientFactory or another protocol
>>> or a GUI or any number of other things.
>>>>
>>>> Some things that I'm not all that clear on and could use your help to
>>>> understand:
>>>> How do you cancel a previously queued/scheduled callback?
>>>
>>> It would probably make sense to be more specific here.  "Callback" might
>>> mean a lot of things.
>>>
>>> Deferreds, the central callback-management API used in Twisted, don't
>>> directly support cancellation (though we consider adding such support from
>>> time to time).  Generally APIs which want to offer cancellation do it by
>>> some other means separate from the Deferred they return.  For example, a
>>> number of APIs accept a "timeout" parameter which is a form of
>>> cancellation.  These APIs internally use the timed call features of the
>>> Twisted reactor to make the operation fail if it does not complete within
>>> the given time frame, resulting in an "errback" on the Deferred (just a
>>> callback for errors).
>>>
>>> Actually canceling an operation depends on what the operation is and how
>>> it's implemented.  For example, the IMAP4 protocol itself offers no
>>> mechanism for canceling a command which the server has already received
>>> and begun processing, aside from prematurely closing the connection.  So
>>> if you issue a FETCH, you don't have much of a way to avoid receiving the
>>> results.
>>>
>>> I'm not sure with what aim you bring up cancellation, so I don't think I
>>> can be any more specific than this now.  Let me know if I didn't actually
>>> answer your question.
>>>>
>>>> How are multiple connections to the same server handled? And more
>>>> importantly: how do you have a command in one session use a callback
>>>> on another already-open connection?
>>>
>>> This is simpler than it seems.  The answer is mostly what you'd expect if
>>> you asked it about a non-Twisted-based app.  If you want protocol instance
>>> A to do something to protocol instance B, you make sure A has a reference
>>> to B and then you have A call a method on B.  There are lots of approaches
>>> to making sure that reference is available, but you don't really need
>>> anything fancier than an attribute somewhere - using the factory is a
>>> common approach, since ClientFactory sets itself as the "factory"
>>> attribute on each protocol instance it creates.
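A rough sketch of that "attribute somewhere" approach, using plain-Python stand-ins so it runs without Twisted. BaseFactory mimics the way Twisted's factories set a "factory" attribute in buildProtocol(); IdleFactory, FetchFactory and their protocols are hypothetical, and newMessages() assumes exactly one new message per notification.

```python
# Plain-Python stand-ins that mirror the shape of Twisted's factories.
# All class names here are hypothetical.


class BaseFactory:
    protocol = None  # subclasses name the protocol class, as in Twisted

    def buildProtocol(self, addr):
        p = self.protocol()
        p.factory = self  # the protocol can now reach its factory
        return p


class FetchProtocol:
    def __init__(self):
        self.fetched = []

    def fetchMessage(self, seq):
        # Real code would send a FETCH over this second connection.
        self.fetched.append(seq)


class IdleProtocol:
    def newMessages(self, exists, recent):
        # Reach the other connection through a plain attribute on the
        # factory; assumes one new message, numbered `exists`.
        if exists is not None:
            self.factory.fetcher.fetchMessage(exists)


class FetchFactory(BaseFactory):
    protocol = FetchProtocol


class IdleFactory(BaseFactory):
    protocol = IdleProtocol

    def __init__(self):
        self.fetcher = None  # set once the FETCH connection is up
```

Usage: build the fetching protocol, store it on the IDLE factory as `fetcher`, then build the IDLE protocol; calling `idler.newMessages(5, None)` ends up in `fetcher.fetchMessage(5)`.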
>>>
>>> Hope that was helpful,
>>> Jean-Paul
>>
>> Jean-Paul,
>>
>> That was most helpful, thanks!
>>
>> I was rushing out the door, so I forgot some of the questions I wanted
>> to ask, which is also why some of my explanations were so thin.
>>
>> To answer your questions:
>>>>
>>>> Some things that I'm not all that clear on and could use your help to
>>>> understand:
>>>> How do you cancel a previously queued/scheduled callback?
>>>
>>> It would probably make sense to be more specific here.  "Callback" might
>>> mean a lot of things.
>>
>> Specifically I'm talking about cancelling the scheduled reset of IDLE
>> here.
>
> Ah.  This is straightforward.  Timed calls are set up with
> reactor.callLater, which returns an instance of DelayedCall. DelayedCall has
> a cancel method.
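Twisted's own reactor isn't needed to see the shape of this. The sketch below uses the standard library's sched module, whose enter() also returns an event handle that can later be passed to cancel(), as a stand-in for reactor.callLater() and DelayedCall.cancel(). FakeTime and IdleTimer are hypothetical. In Twisted code you'd call reactor.callLater(29 * 60, ...) and keep the returned DelayedCall, and twisted.internet.task.Clock can play the fake clock's role in tests.

```python
import sched

# Re-issue IDLE before the server's inactivity timeout (29 minutes, as
# proposed earlier in the thread).
IDLE_RESET = 29 * 60


class FakeTime:
    """Deterministic clock so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


class IdleTimer:
    """Hypothetical helper that schedules and cancels the IDLE reset."""

    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.pending = None  # handle returned by enter(), like a DelayedCall
        self.resets = 0

    def idle_started(self):
        self.pending = self.scheduler.enter(IDLE_RESET, 0, self._reset)

    def idle_stopped(self):
        if self.pending is not None:
            self.scheduler.cancel(self.pending)  # like DelayedCall.cancel()
            self.pending = None

    def _reset(self):
        self.pending = None
        self.resets += 1  # real code would send DONE, then IDLE again
```

Usage: create `sched.scheduler(clock.time, clock.sleep)`, call `idle_started()`, and if the server confirms IDLE is off first, `idle_stopped()` cancels the pending reset so `run()` fires nothing.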
>>>>
>>>> -In the command dispatcher code: checks if IDLE is enabled or not. If
>>>> enabled, it appends IDLE to all incoming commands and sends a DONE to
>>>> the server before any new commands are sent. (On the incoming commands
>>>> part: what I mean is if the method originally to be called was
>>>> "incoming_exists", instead it would go to "incoming_existsIDLE".)
>>>
>>> This part could probably bear some elaboration.  I think the question here
>>> is how the unsolicited information should best be made available to the
>>> application code which caused the IDLE to be issued (if I've misunderstood
>>> what you were getting at here, let me know).  One possibility is the
>>> existing IMailboxListener interface - if you look at the very end of the
>>> implementation of IMAP4Client, you'll find three no-op methods,
>>> modeChanged, flagsChanged, and newMessages.  These are intended for
>>> subclasses to override and are already called by IMAP4Client when
>>> unsolicited information is given by the server in response to a command.
>>> It may make sense to direct data provided during an IDLE to these
>>> callbacks, or others similar to them.
>>
>> Abstraction is not my strong point :(
>>
>> From what I read of imaplib2, it looks like that library would start
>> IDLE on all its connections, and as soon as it was notified of a new
>> incoming message, it would disengage IDLE and immediately start
>> downloading that message. I believe this method caused the stalling bug
>> that finally got imaplib2 canned from OfflineIMAP (imaplib2 thought it
>> was no longer in IDLE when in fact it was, so all commands were ignored
>> by the server).
>>
>> My planned use of IDLE is to have one session open and dedicated to
>> running IDLE all the time. Any updates that come in over IDLE are sent
>> to a threadpool which either downloads the new message or deletes the
>> local copy in order to stay in sync with the remote.
>
> This seems basically sound, though you won't need a threadpool if you're
> using Twisted - you can just run the multiple IMAP4Client connections in the
> main thread and they'll cooperate with each other.
>>
>> Personally I think this is the only way to go as it keeps a clear
>> separation of concerns, and keeps everything just that much simpler.
>> But then again, I'm biased. :)
>
> When I was actively following the imap4 protocol list, I remember there
> being a bit of discussion about the best way to use IDLE.  It's been a few
> years though, so I forget all the content of those discussions. ;) What you
> outlined above sounds reasonable to me.
>>
>> Abstraction aside, were I building this just for myself I'd refactor
>> IMAP4Client into three classes: IMAP4ClientBase(basic.LineReceiver,
>> policies.TimeoutMixin), IMAP4ClientIDLE(IMAP4ClientBase), and
>> IMAP4Client(IMAP4ClientBase). Whatever IMAP4Client and IMAP4ClientIDLE can
>> both use is put in IMAP4ClientBase, and the rest stays in IMAP4Client.
>>
>> That, unfortunately, just forces everyone else to build their
>> applications my way, though. Like I said, abstraction is not my strong
>> suit. :(
>>
>> As I was typing this all out, though, a possible solution occurred to
>> me: what if IMAP4Client and IMAP4ClientIDLE were mixins instead? E.g.:
>> class myIDLEClient(IMAP4ClientIDLE, IMAP4ClientBase)
>> class myFullIMAPClient(IMAP4ClientIDLE, IMAP4Client, IMAP4ClientBase)
>
> What's the goal of splitting up the implementation this way?  It doesn't
> seem necessary to me, but you've probably thought it through more than I
> have.
>>
>> Thoughts?
>>
>>
>> Some questions I forgot to ask earlier: What exactly is the purpose of
>> a "Factory"? In some examples I've found, I saw methods that could
>> have gone in the Protocol subclass go instead in a ClientFactory. In
>> other examples there was no ClientFactory at all!
>
> The only job a factory *must* perform is the creation of protocol instances
> to use to handle connections.  The reactor sets up the connection, then asks
> the factory for a protocol (via the buildProtocol method) to use with that
> connection.
>
> This applies to both servers and clients, but it's more obviously useful on
> the server-side of things, where you have less control over how many
> connections get created.
>
> It's also common to use factories as a place to store state which needs to
> outlive any particular connection, since it's easy to get to a factory from
> one of the protocols it created (via the protocol's "factory" attribute
> which most factories set).  They can also be used to implement reconnection
> logic, since they're told when connections fail or end
> (clientConnectionFailed, clientConnectionLost).
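A minimal sketch of that last point, again as a plain-Python stand-in: the factory keeps state across connections and decides whether to reconnect when told a connection failed or ended. The retry policy, seen_uids and FakeConnector are hypothetical; connector.connect() is the usual way a Twisted client factory asks its connector to try again, and twisted.internet.protocol.ReconnectingClientFactory already packages this idea with backoff.

```python
# Plain-Python stand-in for a client factory's reconnection hooks.
# The retry policy and seen_uids are hypothetical.


class ReconnectingIdleFactory:
    max_retries = 3

    def __init__(self):
        self.retries = 0
        self.seen_uids = set()  # outlives any single connection

    def clientConnectionFailed(self, connector, reason):
        self._retry(connector)

    def clientConnectionLost(self, connector, reason):
        self._retry(connector)

    def _retry(self, connector):
        if self.retries < self.max_retries:
            self.retries += 1
            connector.connect()  # ask the connector to try again


class FakeConnector:
    """Hypothetical test double that just counts reconnection attempts."""

    def __init__(self):
        self.attempts = 0

    def connect(self):
        self.attempts += 1
```

Whatever the protocol stored in `seen_uids` before a connection dropped is still there for the next protocol instance the factory builds.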
>>
>> What is the exact purpose of an "Interface"?
>
> Documentation, largely.  If you see a class that "implements(...)" an
> interface, then you know something about what methods that class has and
> what they're generally supposed to do.  In Twisted's IMAP4 code, I think
> this is exclusively how interfaces are used.  In particular, there are a
> number of interfaces which "application code" is expected to implement. The
> interface classes are just a way to communicate to application developers
> what is expected of the objects they supply to certain of the
> Twisted-supplied APIs.
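Twisted declares these interfaces with zope.interface. As a stdlib-only illustration of the "interface as documentation" idea, the sketch below uses typing.Protocol instead, which is a different mechanism; MailboxListenerLike is a hypothetical name modeled on the three IMailboxListener methods mentioned earlier in this thread.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class MailboxListenerLike(Protocol):
    """What application code is expected to supply (documentation first).

    Hypothetical stand-in modeled on IMailboxListener's three methods;
    Twisted itself uses zope.interface, not typing.Protocol.
    """

    def modeChanged(self, writeable): ...

    def flagsChanged(self, newFlags): ...

    def newMessages(self, exists, recent): ...


class MyListener:
    """Application code: no inheritance needed, just the right methods."""

    def modeChanged(self, writeable):
        print("read-write" if writeable else "read-only")

    def flagsChanged(self, newFlags):
        print("flags changed:", newFlags)

    def newMessages(self, exists, recent):
        print("exists:", exists, "recent:", recent)
```

`isinstance(MyListener(), MailboxListenerLike)` is True, but the runtime check only confirms the method names exist, which fits the "documentation, largely" description above.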
>
> There are other uses - mainly adaptation - but unless you're particularly
> curious about that I won't go into it.
>>
>> Somewhat relatedly: As far as I can tell, the main "types" or "parts"
>> of a Twisted application are: reactor, Protocol, Interface (auth?),
>> and ClientFactory. Are there any I'm missing? And how exactly do all
>> these parts fit together?
>
> Transports are another big concept.  A protocol instance gets a transport
> when it is associated with a socket.  There are a few kinds of transports:
> TCP, UDP, SSL, etc.  You'll probably mostly encounter the
> connection-oriented transports when working with IMAP4 - TCP and SSL.
> Transports basically just take the place a socket would if you were using
> the socket module directly.
>
> The way things fit together is generally like this...
>
> The reactor is the event muxer/demuxer.  Everything asks the reactor to
> watch certain resources (mainly sockets) and report back when there's
> something interesting to do with them (such as read or write).  The reactor
> is basically a uniform wrapper around select, poll, epoll,
> WaitForMultipleEvents, etc.
>
> Factories build protocols, as I described above.  And protocols handle
> events on transports.
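To make "protocols handle events on transports" concrete: a transport delivers whatever bytes arrived via the protocol's dataReceived(), and a line-oriented protocol such as LineReceiver (which IMAP4Client builds on) buffers them into complete lines. The class below is a hypothetical, simplified stand-in for that behaviour; the real LineReceiver also has a raw mode, which IMAP needs for literals.

```python
class LineBufferProtocol:
    """Hypothetical simplified stand-in for LineReceiver's line splitting."""

    delimiter = b"\r\n"  # IMAP lines end with CRLF

    def __init__(self):
        self._buffer = b""
        self.lines = []

    def dataReceived(self, data):
        # Called with whatever chunk the transport read; may be a partial
        # line, several lines, or anything in between.
        self._buffer += data
        while self.delimiter in self._buffer:
            line, self._buffer = self._buffer.split(self.delimiter, 1)
            self.lineReceived(line)

    def lineReceived(self, line):
        # IMAP4Client would parse the line here; we just record it.
        self.lines.append(line)
```

Feeding it b"* 3 EXI" and then b"STS\r\n" produces a single lineReceived(b"* 3 EXISTS") call, regardless of how the bytes were split on the wire.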
>>
>> Sorry for the basic questions, but I've looked all over the
>> documentation and through the Twisted Essentials book and I can't find
>> a clear-cut answer to these questions and it's driving me nuts!
>>
>> Last question for the night (I swear!) - what kind of parsing and
>> dispatch system/method/technique did you have in mind to replace what
>> is currently in the library?
>>
>> I looked over the commandDispatch method and I see what you were
>> talking about. But outside of using an actual parsing library like
>> pyparsing (which I think is disallowed in the Twisted coding
>> guidelines, right?), or providing a static list of all possible IMAP
>> commands, I'm not sure what the options are.
>
> I don't have a strong opinion about a particular parsing tool yet.  I've
> used pyparsing before and found it to be serviceable if not spectacular. If
> using such a library would make the IMAP4 parsing code better, then I think
> it's a good idea.
>
> It'd be nice if there were something which could be fed the BNF from the RFC
> (probably with a little help, since I doubt the BNF is quite
> descriptive/accurate enough to do the job all by itself).  I wouldn't be
> crushed if some other solution were found, though.
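For comparison with a grammar-driven tool, here is a small hand-written recursive-descent parser covering only a tiny subset of the RFC 3501 response syntax: atoms, quoted strings, and parenthesized lists. Literals ({n} followed by raw bytes) and most other productions are left out. It is not pyparsing or anything generated from the BNF, and the ("atom", ...) / ("string", ...) tuple representation is a hypothetical choice made so the result stays lossless.

```python
import re

# One token: "(", ")", a quoted string (with backslash escapes), or an atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        lparen, rparen, quoted, atom = m.groups()
        if lparen:
            tokens.append("(")
        elif rparen:
            tokens.append(")")
        elif quoted is not None:
            # Undo backslash escaping inside quoted strings.
            tokens.append(("string", re.sub(r"\\(.)", r"\1", quoted)))
        else:
            tokens.append(("atom", atom))
        pos = m.end()
    return tokens


def parse(tokens):
    """Turn tokens into nested lists, keeping atoms and strings distinct."""

    def parse_items(i, depth):
        items = []
        while i < len(tokens):
            tok = tokens[i]
            if tok == "(":
                sub, i = parse_items(i + 1, depth + 1)
                items.append(sub)
            elif tok == ")":
                if depth == 0:
                    raise ValueError("unbalanced ')'")
                return items, i + 1
            else:
                items.append(tok)
                i += 1
        if depth != 0:
            raise ValueError("missing ')'")
        return items, i

    result, _ = parse_items(0, 0)
    return result
```

Because the atom NIL and the quoted string "NIL" come out as different tuples, nothing is lost at this stage; any flattening can happen later, on purpose.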
>>
>> Also, I don't follow your concern for backwards compatibility. Correct
>> me if I'm wrong, but method names are based upon the command that
>> triggers them. So even if you use a different method to parse the IMAP
>> command, the method names wouldn't change, right?
>
> Woops, I wrote a big long thing about Deferreds and named callbacks before I
> realized what you were actually asking here.
>
> The backwards compatibility concern with replacing the parser is mainly that
> the current parser produces a somewhat unwieldy result which loses
> information in some cases.  However, since it isn't *always* lossy, existing
> applications may exist which work properly based on these results.
>
> The new parser would hopefully not lose information, but in order to do this
> it probably needs to return results in a different structure.  It's this
> change in structure that I'm concerned about with respect to backwards
> compatibility.
>
> Old code should keep working, but new code should be able to get the
> lossless representation.
>
> One idea for this might be to use the new parser everywhere and have a
> function for converting the good structure into the bad structure. Devil's
> in the details, though.
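A sketch of that conversion idea, assuming (hypothetically) a lossless tree of ("atom", text) / ("string", text) tuples and nested lists, and an old-style result that is just nested lists of strings. The to_legacy name and both structures are illustrative; they are not the actual shapes used by Twisted's IMAP4 parser.

```python
def to_legacy(node):
    """Hypothetical converter from a lossless parse tree to an old-style,
    strings-only structure.

    Leaves are ("atom", text) or ("string", text) tuples; lists are
    parenthesized groups. The atom/string distinction is dropped here,
    which is exactly the information the old structure cannot carry.
    """
    if isinstance(node, list):
        return [to_legacy(child) for child in node]
    kind, text = node
    if kind not in ("atom", "string"):
        raise ValueError(f"unknown node kind: {kind!r}")
    return text
```

Since an atom NIL and a quoted "NIL" both come out as "NIL", old callers keep receiving the shape they expect, while new code can use the tree directly.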
>
> I'm off on a semi-vacation for the next week.  I'll be checking email a
> little but probably won't be replying as promptly.
>
> #twisted on freenode is also often a good resource, in case you find
> yourself with more questions or wanting to have a real-time conversation
> with someone.
>
> Jean-Paul
>

>>
>> Abstraction aside, were I building this just for myself I'd refactor
>> IMAP4Client into three classes: IMAP4ClientBase(basic.LineReceiver,
>> policies.TimeoutMixin), IMAP4ClientIDLE(IMAP4ClientBase), and
>> IMAP4Client(IMAP4ClientBase). Whatever IMAP4Client and IMAP4ClientIDLE can
>> both use is put in IMAP4ClientBase, and the rest stays in IMAP4Client.
>>
>> That, unfortunately, just forces everyone else to build their
>> applications my way, though. Like I said, abstraction is not my strong
>> suit. :(
>>
>> As I was typing this all out, though, a possible solution occurred to
>> me: what if IMAP4Client and IMAP4ClientIDLE were mixins instead? E.g.:
>> class myIDLEClient(IMAP4ClientIDLE, IMAP4ClientBase)
>> class myFullIMAPClient(IMAP4ClientIDLE, IMAP4Client, IMAP4ClientBase)
>
> What's the goal of splitting up the implementation this way?  It doesn't
> seem necessary to me, but you've probably thought it through more than I
> have.

I have a tendency to over-think things, and I'm still trying to figure
Twisted out, so splitting up the client class may indeed be entirely
superfluous.

I had two goals in splitting up the classes. The first was to follow
the DRY principle and keep code duplication down. The second was
flexibility, with an eye towards not spending resources where they are
not required. In short, my reasoning is that there's no point in
having 120-something methods in a class where only a handful of them
are usable. By splitting the classes up, developers can pick and
choose what they need for their specific implementation and leave the
rest.
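For what it's worth, here is a minimal, hypothetical sketch of the mixin layout proposed above. None of these classes exist in Twisted; IMAP4ClientCommands stands in for the proposed IMAP4Client mixin so the sketch doesn't shadow the real class, and sendTagged is an illustrative helper rather than IMAP4Client's actual command API.

```python
# Hypothetical class layout from the proposal; not Twisted code.


class IMAP4ClientBase:
    """Shared plumbing: command tagging and (in real code) line handling."""

    def __init__(self):
        self.sent = []
        self._counter = 0

    def sendTagged(self, command):
        self._counter += 1
        line = f"A{self._counter:03d} {command}"
        self.sent.append(line)  # real code would write to self.transport
        return line


class IMAP4ClientIDLE:
    """Mixin with only the IDLE-related methods."""

    def idle(self):
        return self.sendTagged("IDLE")


class IMAP4ClientCommands:
    """Mixin with the rest of the command set (two shown)."""

    def select(self, mailbox):
        return self.sendTagged(f"SELECT {mailbox}")

    def fetchFlags(self, seq):
        return self.sendTagged(f"FETCH {seq} (FLAGS)")


class MyIDLEClient(IMAP4ClientIDLE, IMAP4ClientBase):
    pass


class MyFullIMAPClient(IMAP4ClientIDLE, IMAP4ClientCommands, IMAP4ClientBase):
    pass
```

One thing worth knowing when weighing the resource argument: Python stores methods on the class, not on each instance, so unused methods don't add per-connection memory. The split mostly buys organization rather than savings.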

I freely admit that I've only been programming seriously in Python for
less than a year, so if my efforts give new meaning to the word
"redundant", it's because I'd rather play it safe and go overboard
than make every rookie mistake in the book.

>> Last question for the night (I swear!) - what kind of parsing and
>> dispatch system/method/technique did you have in mind to replace what
>> is currently in the library?
>>
>> I looked over the commandDispatch method and I see what you were
>> talking about. But outside of using an actual parsing library like
>> pyparsing (which I think is disallowed in the Twisted coding
>> guidelines, right?), or providing a static list of all possible IMAP
>> commands, I'm not sure what the options are.
>
> I don't have a strong opinion about a particular parsing tool yet.  I've
> used pyparsing before and found it to be serviceable if not spectacular. If
> using such a library would make the IMAP4 parsing code better, then I think
> it's a good idea.
>
> It'd be nice if there were something which could be fed the BNF from the RFC
> (probably with a little help, since I doubt the BNF is quite
> descriptive/accurate enough to do the job all by itself).  I wouldn't be
> crushed if some other solution were found, though.

I just did a quick Google search on the subject and found quite a bit.
To begin with, according to the following site, BNF is a grammar
specification language:
http://xahlee.org/cmaci/notation/pattern_matching_vs_pattern_spec.html

Found a bunch of other sites on parsers too:
http://www.garshol.priv.no/download/text/bnf.html
http://nedbatchelder.com/text/python-parsers.html
http://dalkescientific.com/writings/NBN/parsing_with_ply.html
http://eikke.com/text-parsing-formal-grammars-and-bnf-introduction/
http://oreilly.com/pub/a/python/2006/01/26/pyparsing.html
http://navarra.ca/?p=538
http://fdik.org/pyPEG/
http://aspn.activestate.com/ASPN/Mail/Message/python-list/831475

Gotta love google. :)

Wow, this one looks like it might be perfect: http://www.acooke.org/lepl/

I'll see if I can find a library that can use BNF as a rule template
for parsing.

Thank you for explaining those basic concepts to me; I appreciate it!

-- 
Dominic LoBue