IDLE again

Fri Dec 4 17:26:13 GMT 2009

On 08:26 am, dom.lobue at gmail.com wrote:
>On Thu, Dec 3, 2009 at 10:32 PM,  <exarkun at twistedmatrix.com> wrote:
>>On 05:45 am, dom.lobue at gmail.com wrote:
>>>
>>>Jean-Paul,
>>>
>>>I read over the IMAP IDLE RFC and went through the IMAP4 Twisted
>>>library, and I've sketched out a rough outline of how to implement
>>>IDLE. I've run into some things in Twisted, however, that I don't really
>>>understand well, and I'm hoping you can point me in the correct
>>>direction.
>>
>>Cool.  That was quick. :)  Before I get into things, since this thread is
>>likely to go into a lot of Twisted-specific details which may not be
>>generally interesting, if there's anyone who'd like off the cc list, please
>>speak up. :)
>>>
>>>First, just to verify my understanding: the IMAP4Client class is a
>>>Protocol class.
>>
>>Yep.  And to expand on that, instances of Protocol classes typically have a
>>one-to-one relationship with a connection.
>>>
>>>All IMAP commands are represented by at least two
>>>methods in the IMAP4Client class - one for what to do when the command
>>>is received from the server, and one for when the command is sent to
>>>the server.
>>
>>Generally, though there are some exceptions.  For example, AUTHENTICATE is
>>implemented with one method (IMAP4Client.authenticate) that starts by
>>possibly sending a CAPABILITY command, then another method which will
>>actually send AUTHENTICATE (IMAP4Client.__cbAuthenticate), then two more
>>methods for dealing with the response to the AUTHENTICATE
>>(IMAP4Client.__cbContinueAuth and IMAP4Client.__cbAuthTLS).
>>
>>Another way to look at it is like this.  For each supported protocol action,
>>there is at least one public method on IMAP4Client to initiate this action
>>by sending some bytes to the server.  All bytes received by IMAP4Client from
>>the server are parsed according to the state the client is in, what commands
>>are outstanding, etc.  Depending on the state and the bytes, callbacks might
>>be invoked as a result of this, possibly delivering the results of a
>>protocol action initiated earlier to the calling application code.
>>
>>This doesn't disagree with what you said too much, it just re-states it in
>>slightly more general terms.
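The request/response model described above can be sketched, very roughly and without Twisted, as tag-based dispatch. Everything here (`TaggedDispatcher`, `issue`, `line_received`) is a hypothetical illustration, not IMAP4Client's real internals, and it makes one loud simplification: untagged lines are handed to every outstanding command.

```python
class TaggedDispatcher:
    """Hypothetical sketch: route server lines to per-command callbacks by tag."""

    def __init__(self, send):
        self.send = send      # callable that writes one line to the server
        self.pending = {}     # tag -> (callback, untagged lines seen so far)
        self.counter = 0

    def issue(self, command, callback):
        # Like a public IMAP4Client method: send bytes, remember who wants the result.
        self.counter += 1
        tag = "A%04d" % self.counter
        self.pending[tag] = (callback, [])
        self.send("%s %s" % (tag, command))
        return tag

    def line_received(self, line):
        tag, _, rest = line.partition(" ")
        if tag == "*":
            # Untagged data; simplification: give it to every outstanding command.
            for _callback, lines in self.pending.values():
                lines.append(rest)
        elif tag in self.pending:
            # Tagged completion: deliver the status (OK/NO/BAD) plus collected data.
            callback, lines = self.pending.pop(tag)
            callback(rest.split(" ", 1)[0], lines)


# Usage: a SELECT whose untagged EXISTS line is delivered with its completion.
sent, results = [], []
dispatcher = TaggedDispatcher(sent.append)
dispatcher.issue("SELECT INBOX", lambda status, lines: results.append((status, lines)))
dispatcher.line_received("* 3 EXISTS")
dispatcher.line_received("A0001 OK [READ-WRITE] SELECT completed")
```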
>>>
>>>Assuming this assertion to be true, the broad strokes of
>>>the IDLE implementation are as follows:
>>>
>>>-IDLE is engaged: command is sent to server turning on IDLE, schedule
>>>an IDLE reset in 29 minutes, and an attribute (_IDLE_Enabled, for
>>>example) is set to True.
>>
>>Basically, yes.  One subtle point, though - since the server might reject
>>the IDLE command, the client shouldn't assume it has entered the IDLE state
>>until it receives a positive acknowledgement of the command from the server
>>(e.g. the "+ idling" line from the RFC).
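A minimal sketch of that subtle point, assuming a hypothetical `IdleState` helper that is not part of Twisted: the client only marks itself as idling once the server's `+` continuation arrives, and any tagged completion for the IDLE tag (a `NO`/`BAD` rejection, or the `OK` that follows `DONE`) ends the state.

```python
class IdleState:
    """Hypothetical client-side bookkeeping for RFC 2177 IDLE."""

    def __init__(self, send):
        self.send = send       # callable that writes one line to the server
        self.tag = None        # tag of the outstanding IDLE command, if any
        self.idling = False    # True only after the server's "+" continuation

    def start(self, tag):
        self.tag = tag
        self.send("%s IDLE" % tag)   # do NOT assume idling yet

    def stop(self):
        if self.idling:
            self.send("DONE")        # the server answers with "<tag> OK ..."

    def line_received(self, line):
        if self.tag is None:
            return
        if line.startswith("+"):
            self.idling = True       # server accepted IDLE, e.g. "+ idling"
        elif line.startswith(self.tag + " "):
            # Tagged completion: IDLE was rejected (NO/BAD) or DONE was
            # acknowledged (OK). Either way the client is no longer idling.
            self.idling = False
            self.tag = None
        # Untagged lines such as "* 4 EXISTS" leave the IDLE state unchanged.
```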
>>>
>>>-In the command dispatcher code: checks if IDLE is enabled or not. If
>>>enabled, it appends IDLE to all incoming commands and sends a DONE to
>>>the server before any new commands are sent. (On the incoming commands
>>>part: what I mean is if the method originally to be called was
>>>"incoming_exists", instead it would go to "incoming_existsIDLE".)
>>
>>This part could probably bear some elaboration.  I think the question here
>>is how the unsolicited information should best be made available to the
>>application code which caused the IDLE to be issued (if I've misunderstood
>>what you were getting at here, let me know).  One possibility is the
>>existing IMailboxListener interface - if you look at the very end of the
>>implementation of IMAP4Client, you'll find three no-op methods,
>>modeChanged, flagsChanged, and newMessages.  These are intended for
>>subclasses to override and are already called by IMAP4Client when
>>unsolicited information is given by the server in response to a command.
>>It may make sense to direct data provided during an IDLE to these
>>callbacks, or others similar to them.
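To illustrate routing IDLE notifications onto hooks named like IMAP4Client's `newMessages(exists, recent)` and `flagsChanged(newFlags)`, here is a deliberately simplified, stdlib-only sketch. The `route_untagged` function, its regular expression and the `Listener` stand-in are hypothetical; IMAP4Client does its own parsing, and a real implementation would also need to handle EXPUNGE and literals.

```python
import re


class Listener:
    """Stand-in for an IMAP4Client subclass overriding the no-op hooks."""

    def __init__(self):
        self.events = []

    def newMessages(self, exists, recent):
        self.events.append(("newMessages", exists, recent))

    def flagsChanged(self, newFlags):
        self.events.append(("flagsChanged", newFlags))


# Matches e.g. "* 7 FETCH (FLAGS (\Seen \Answered))" (no other FETCH items).
FLAGS_RE = re.compile(r"^\* (\d+) FETCH \(FLAGS \(([^)]*)\)\)$")


def route_untagged(line, listener):
    """Hypothetical router: map untagged IDLE notifications onto the hooks."""
    match = FLAGS_RE.match(line)
    if match:
        listener.flagsChanged({int(match.group(1)): match.group(2).split()})
        return
    parts = line.split()
    if len(parts) == 3 and parts[0] == "*" and parts[1].isdigit():
        count, kind = int(parts[1]), parts[2].upper()
        if kind == "EXISTS":
            listener.newMessages(count, None)
        elif kind == "RECENT":
            listener.newMessages(None, count)
```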
>>>
>>>-When IDLE is confirmed off by server: delete scheduled IDLE reset.
>>
>>Yep.
>>>
>>>And that's basically it I think. Fancy stuff like downloading the
>>>messages that IDLE notifies you about are handled in the
>>>ClientFactory, right?
>>
>>Perhaps by a factory, or perhaps by something else.  When I've said
>>"application code" above, this is what I'm talking about - the code that
>>someone else has written which uses IMAP4Client somehow in order to do
>>something IMAP4 related.  In our case, offlineimap would be the application
>>code. :)  It doesn't make much difference to the IMAP4Client implementation
>>who or what is using it, so it could be a ClientFactory or another protocol
>>or a GUI or any number of other things.
>>>
>>>Some things that I'm not all that clear on and could use your help to
>>>understand:
>>>How do you cancel a previously queued/scheduled callback?
>>
>>It would probably make sense to be more specific here.  "Callback" might
>>mean a lot of things.
>>
>>Deferreds, the central callback-management API used in Twisted, don't
>>directly support cancellation (though we consider adding such support from
>>time to time).  Generally APIs which want to offer cancellation do it by
>>some other means separate from the Deferred they return.  For example, a
>>number of APIs accept a "timeout" parameter which is a form of cancellation.
>>These APIs internally use the timed call features of the Twisted reactor to
>>make the operation fail if it does not complete within the given time frame,
>>resulting in an "errback" on the Deferred (just a callback for errors).
>>
>>Actually canceling an operation depends on what the operation is and how
>>it's implemented.  For example, the IMAP4 protocol itself offers no
>>mechanism for canceling a command which the server has already received and
>>begun processing, aside from prematurely closing the connection.  So if you
>>issue a FETCH, you don't have much of a way to avoid receiving the results.
>>
>>I'm not sure with what aim you bring up cancellation, so I don't think I can
>>be any more specific than this now.  Let me know if I didn't actually answer
>>your question.
>>>
>>>How are multiple connections to the same server handled? And more
>>>importantly: how do you have a command in one session use a callback
>>>on another already-open connection?
>>
>>This is simpler than it seems.  The answer is mostly what you'd expect if
>>you asked it about a non-Twisted-based app.  If you want protocol instance A
>>to do something to protocol instance B, you make sure A has a reference to B
>>and then you have A call a method on B.  There are lots of approaches to
>>making sure that reference is available, but you don't really need anything
>>fancier than an attribute somewhere - using the factory is a common
>>approach, since ClientFactory sets itself as the "factory" attribute on
>>each protocol instance it creates.
>>
>>Hope that was helpful,
>>Jean-Paul
>
>Jean-Paul,
>
>That was most helpful, thanks!
>
>I was rushing out the door, so I forgot some of the questions I wanted
>to ask, which is also why some of my explanations were so bereft.
>
>To answer your questions:
>>>
>>>Some things that I'm not all that clear on and could use your help to
>>>understand:
>>>How do you cancel a previously queued/scheduled callback?
>>
>>It would probably make sense to be more specific here.  "Callback" might
>>mean a lot of things.
>
>Specifically I'm talking about cancelling the scheduled reset of IDLE here.

Ah.  This is straightforward.  Timed calls are set up with 
reactor.callLater, which returns an instance of DelayedCall. 
DelayedCall has a cancel method.
>>>
>>>-In the command dispatcher code: checks if IDLE is enabled or not. If
>>>enabled, it appends IDLE to all incoming commands and sends a DONE to
>>>the server before any new commands are sent. (On the incoming commands
>>>part: what I mean is if the method originally to be called was
>>>"incoming_exists", instead it would go to "incoming_existsIDLE".)
>>
>>This part could probably bear some elaboration.  I think the question here
>>is how the unsolicited information should best be made available to the
>>application code which caused the IDLE to be issued (if I've misunderstood
>>what you were getting at here, let me know).  One possibility is the
>>existing IMailboxListener interface - if you look at the very end of the
>>implementation of IMAP4Client, you'll find three no-op methods,
>>modeChanged, flagsChanged, and newMessages.  These are intended for
>>subclasses to override and are already called by IMAP4Client when
>>unsolicited information is given by the server in response to a command.
>>It may make sense to direct data provided during an IDLE to these
>>callbacks, or others similar to them.
>
>Abstraction is not my strong point :(
>From what I read of imaplib2, it looks like that library would start
>IDLE on all its connections, and as soon as it was notified of a new
>incoming message, it would disengage IDLE and immediately start
>downloading that message. I believe this method caused the stalled bug
>that finally got imaplib2 canned from OfflineIMAP (imaplib2
>thought it was no longer in IDLE when in fact it still was, so all
>commands were ignored by the server).
>
>My planned use of IDLE is to have one session open and dedicated to
>running IDLE all the time. Any updates that come in over IDLE are sent
>to a threadpool which either downloads the new message or deletes the
>local copy in order to stay in sync with the remote.

This seems basically sound, though you won't need a threadpool if you're 
using Twisted - you can just run the multiple IMAP4Client connections in 
the main thread and they'll cooperate with each other.
>Personally I think this is the only way to go as it keeps a clear
>separation of concerns, and keeps everything just that much simpler.
>But then again, I'm biased. :)

When I was actively following the imap4 protocol list, I remember there 
being a bit of discussion about the best way to use IDLE.  It's been a 
few years though, so I forget all the content of those discussions. ;) 
What you outlined above sounds reasonable to me.
>Abstraction aside, were I building this just for myself I'd refactor
>IMAP4Client into three classes: IMAP4ClientBase(basic.LineReceiver,
>policies.TimeoutMixin), IMAP4ClientIDLE(IMAP4ClientBase), and
>IMAP4Client(IMAP4ClientBase). Whatever IMAP4Client and IMAP4ClientIDLE can
>both use is put in IMAP4ClientBase, and the rest stays in IMAP4Client.
>
>That, unfortunately, just forces everyone else to build their
>applications my way though. Like I said, abstraction is not my strong
>suit. :(
>
>As I was typing this all out though, a possible solution occurred to
>me: What if IMAP4Client and IMAP4ClientIDLE were mixins instead? E.g.:
>class myIDLEClient(IMAP4ClientIDLE, IMAP4ClientBase): pass
>class myFullIMAPClient(IMAP4ClientIDLE, IMAP4Client, IMAP4ClientBase): pass
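The mixin idea relies on Python's method resolution order. Here is a stdlib-only sketch using stand-in classes with the names from the proposal above (they are not Twisted's real IMAP4Client) and a hypothetical `supported()` method: each mixin cooperates through `super()`, and the shared base goes last in every class list.

```python
class IMAP4ClientBase:
    """Stand-in for the shared core (line handling, tags, dispatch)."""

    def supported(self):
        return ["CAPABILITY", "LOGIN", "LOGOUT"]


class IMAP4ClientIDLE:
    """Stand-in mixin adding only IDLE; defers to whatever comes next in the MRO."""

    def supported(self):
        return super().supported() + ["IDLE"]


class IMAP4Client:
    """Stand-in mixin holding the remaining commands (not Twisted's class)."""

    def supported(self):
        return super().supported() + ["FETCH", "SEARCH"]


class MyIDLEClient(IMAP4ClientIDLE, IMAP4ClientBase):
    pass


class MyFullIMAPClient(IMAP4ClientIDLE, IMAP4Client, IMAP4ClientBase):
    pass
```

Because every mixin calls `super()`, each class contributes exactly once, in MRO order: MyFullIMAPClient, IMAP4ClientIDLE, IMAP4Client, IMAP4ClientBase.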

What's the goal of splitting up the implementation this way?  It doesn't 
seem necessary to me, but you've probably thought it through more than I 
have.
>Thoughts?
>
>
>Some questions I forgot to ask earlier: What exactly is the purpose of
>a "Factory"? In some examples I've found, I saw methods that could
>have gone in the Protocol subclass go instead in a ClientFactory. In
>other examples there was no ClientFactory at all!

The only job a factory *must* perform is the creation of protocol 
instances to use to handle connections.  The reactor sets up the 
connection, then asks the factory for a protocol (via the buildProtocol 
method) to use with that connection.

This applies to both servers and clients, but it's more obviously useful 
on the server-side of things, where you have less control over how many 
connections get created.

It's also common to use factories as a place to store state which needs 
to outlive any particular connection, since it's easy to get to a 
factory from one of the protocols it created (via the protocol's 
"factory" attribute which most factories set).  They can also be used to 
implement reconnection logic, since they're told when connections fail 
or end (clientConnectionFailed, clientConnectionLost).
>What is the exact purpose of an "Interface"?

Documentation, largely.  If you see a class that "implements(...)" an 
interface, then you know something about what methods that class has and 
what they're generally supposed to do.  In Twisted's IMAP4 code, I think 
this is exclusively how interfaces are used.  In particular, there are a 
number of interfaces which "application code" is expected to implement. 
The interface classes are just a way to communicate to application 
developers what is expected of the objects they supply to certain of the 
Twisted-supplied APIs.

There are other uses - mainly adaptation - but unless you're particularly
curious about that I won't go into it.
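A small sketch of an interface used as documentation, assuming zope.interface (which Twisted depends on) is installed. `IMessageSink` and `LocalMaildir` are hypothetical. Code from this era wrote `implements(IMessageSink)` inside the class body; current zope.interface uses the `@implementer` class decorator instead. `verifyObject` checks that the object actually provides the promised methods.

```python
from zope.interface import Interface, implementer
from zope.interface.verify import verifyObject


class IMessageSink(Interface):
    """Hypothetical interface: something that wants new-message notices."""

    def messageArrived(uid):
        """Called with the UID of a message the server just announced."""


@implementer(IMessageSink)
class LocalMaildir:
    """Application-supplied object that promises to provide IMessageSink."""

    def __init__(self):
        self.uids = []

    def messageArrived(self, uid):
        self.uids.append(uid)


sink = LocalMaildir()
verifyObject(IMessageSink, sink)   # raises if the promised method were missing
```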
>Somewhat relatedly: As far as I can tell, the main "types" or "parts"
>of a Twisted application are: reactor, Protocol, Interface (auth?),
>and ClientFactory. Are there any I'm missing? And how exactly do all
>these parts fit together?

Transports are another big concept.  A protocol instance gets a
transport when it is associated with a socket.  There are a few kinds of
transports: TCP, UDP, SSL, etc.  You'll probably mostly encounter the
connection-oriented transports when working with IMAP4 - TCP and SSL.
Transports basically just take the place a socket would if you were 
using the socket module directly.

The way things fit together is generally like this...

The reactor is the event muxer/demuxer.  Everything asks the reactor to 
watch certain resources (mainly sockets) and report back when there's 
something interesting to do with them (such as read or write).  The 
reactor is basically a uniform wrapper around select, poll, epoll, 
WaitForMultipleObjects, etc.

Factories build protocols, as I described above.  And protocols handle 
events on transports.
>Sorry for the basic questions, but I've looked all over the
>documentation and through the Twisted Essentials book and I can't find
>a clear-cut answer to these questions and it's driving me nuts!
>
>Last question for the night (I swear!) - what kind of parsing and
>dispatch system/method/technique did you have in mind to replace what
>is currently in the library?
>
>I looked over the commandDispatch method and I see what you were
>talking about. But outside of using an actual parsing library like
>pyparsing (which I think is disallowed in the Twisted coding
>guidelines, right?), or providing a static list of all possible IMAP
>commands, I'm not sure what the options are.

I don't have a strong opinion about a particular parsing tool yet.  I've 
used pyparsing before and found it to be serviceable if not spectacular. 
If using such a library would make the IMAP4 parsing code better, then I 
think it's a good idea.

It'd be nice if there were something which could be fed the BNF from the 
RFC (probably with a little help, since I doubt the BNF is quite 
descriptive/accurate enough to do the job all by itself).  I wouldn't be 
crushed if some other solution were found, though.
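As a point of comparison for a parsing tool, here is what a small hand-written recursive-descent parser for parenthesized IMAP data might look like. It is a stdlib-only sketch, not pyparsing and not generated from the RFC 3501 grammar; `parse_list` is a hypothetical name, and literals (`{n}` followed by raw bytes) and section specifiers containing spaces or parentheses are not handled.

```python
def parse_list(text):
    """Parse IMAP-style parenthesized data into nested Python lists.

    Handles atoms, quoted strings (with backslash escapes), NIL (-> None)
    and nested parentheses. Raises ValueError on unbalanced parentheses.
    """
    pos = 0

    def parse_items(end_char):
        nonlocal pos
        items = []
        while pos < len(text):
            ch = text[pos]
            if ch == " ":
                pos += 1
            elif ch == "(":
                pos += 1
                items.append(parse_items(")"))
            elif ch == ")":
                if end_char != ")":
                    raise ValueError("unbalanced ')' at %d" % pos)
                pos += 1
                return items
            elif ch == '"':
                pos += 1
                chars = []
                while text[pos] != '"':
                    if text[pos] == "\\":   # escaped quote or backslash
                        pos += 1
                    chars.append(text[pos])
                    pos += 1
                pos += 1                    # skip the closing quote
                items.append("".join(chars))
            else:
                start = pos
                while pos < len(text) and text[pos] not in ' ()"':
                    pos += 1
                atom = text[start:pos]
                items.append(None if atom.upper() == "NIL" else atom)
        if end_char is not None:
            raise ValueError("missing ')'")
        return items

    return parse_items(None)
```

Note that this output already loses information: a quoted `"NIL"` and an unquoted atom with the same spelling are no longer distinguishable from their type alone.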
>Also, I don't follow your concern for backwards compatibility. Correct
>me if I'm wrong, but method names are based upon the command that
>triggers them. So even if you use a different method to parse the IMAP
>command, the method names wouldn't change, right?

Whoops, I wrote a big long thing about Deferreds and named callbacks
before I realized what you were actually asking here.

The backwards compatibility concern with replacing the parser is mainly
that the current parser produces a somewhat unwieldy result which loses
information in some cases.  However, since it isn't *always* lossy,
applications may exist which work properly based on these results.

The new parser would hopefully not lose information, but in order to do 
this it probably needs to return results in a different structure.  It's 
this change in structure that I'm concerned about with respect to 
backwards compatibility.

Old code should keep working, but new code should be able to get the 
lossless representation.

One idea for this might be to use the new parser everywhere and have a
function for converting the good structure into the bad structure.
The devil's in the details, though.
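A sketch of that adapter idea, with hypothetical types: `Atom` and `Quoted` wrap values in a lossless tree, `None` stands for NIL, and `to_legacy` flattens everything to plain strings for old callers. The legacy format here is invented for illustration, not the one IMAP4Client currently produces; the comment at the end shows exactly the kind of information loss being discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    value: str


@dataclass(frozen=True)
class Quoted:
    value: str


NIL = None   # lossless form: NIL is not the same thing as the string "NIL"


def to_legacy(node):
    """Hypothetical adapter: lossless tree -> old-style nested lists of str.

    Information is deliberately dropped: atoms, quoted strings and NIL all
    become plain strings, which is what existing callers would expect.
    """
    if isinstance(node, list):
        return [to_legacy(child) for child in node]
    if node is NIL:
        return "NIL"
    if isinstance(node, (Atom, Quoted)):
        return node.value
    raise TypeError("unexpected node %r" % (node,))


legacy = to_legacy([Atom("FLAGS"), [Atom("\\Seen")], Quoted("NIL"), NIL])
# legacy == ["FLAGS", ["\\Seen"], "NIL", "NIL"]: the quoted string and NIL now look identical
```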

I'm off on a semi-vacation for the next week.  I'll be checking email a 
little but probably won't be replying as promptly.

#twisted on freenode is also often a good resource, in case you find 
yourself with more questions or wanting to have a real-time conversation 
with someone.

Jean-Paul