My WIP on Unicode
Nicolas Sebrecht
nicolas.s-dev at laposte.net
Wed Feb 11 17:02:19 GMT 2015
On Tue, Feb 10, 2015 at 01:17:36PM +0100, Nicolas Sebrecht wrote:
> Hello,
>
> Supporting Unicode is much harder than it seems. The main reasons
> are:
> - Python 2.X sucks with Unicode;
> - our codebase has too many variables interlaced across objects and
>   modules;
> - each library (if not each module of each library) handles Unicode
>   differently.
Here is my POV:
Supporting Unicode is a good goal but I believe it isn't worth the
trouble with the current code base. Prior to that, we should clean up
the latter in more than one respect. Here are some that come to mind:
- better names for variables, objects, etc.
- improve comments: most of the current comments assume a very good
  knowledge of the internals. That sucks because I guess nobody is
  aware of ALL of them anymore. The time when this was a one-man
  project has long passed.
- better policy on objects:
  - turn ALL attributes private and use accessors. I know this is not
    "pythonic" but such pythonic habits turn the code into intricate
    code.
  - turn ALL methods not intended to be used outside private.
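
A minimal sketch of what this object policy could look like. The class
and its names (Folder, get_name, set_name, _normalize) are made up for
the illustration and are not taken from the code base.

```python
class Folder(object):
    """Hypothetical class, for illustration only."""

    def __init__(self, name):
        self._name = name  # private: never touched from outside

    # Accessors are the only supported way to read or change the state.
    def get_name(self):
        return self._name

    def set_name(self, name):
        self._name = self._normalize(name)

    # Not intended to be used outside, hence private.
    def _normalize(self, name):
        return name.strip()


folder = Folder("INBOX")
folder.set_name("  Archive ")
print(folder.get_name())
```

Callers only ever go through get_name()/set_name(), so a search for
those two names finds every place the attribute is read or written.
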
- revamp the factorization: it's not unusual to find code "factorized"
  for bad reasons: it made the code /look/ nicer, but the factorized
  function/method is actually called from ONE place. While it might
  help locally, such practice globally defeats the purpose because we
  lose sight of what is truly factorized code and what is not.
- namespace the factorized code: if a method requires a local function,
  DON'T USE yet another method. Use a local namespaced function.
  E.g.:

    class BLah(object):
        def _internal_method(self, arg):
            def local_factorized(local_arg):
                # local_factorized's code (placeholder body)
                return local_arg

            # _internal_method's code.
            return local_factorized(arg)

  Python allows local namespaced functions for good reasons.
- Better inheritance policy: take the example of the
  folder/LocalStatus(SQlite) and folder/Base stuff. It's nearly
  IMPOSSIBLE to know and understand which parent method is used by which
  child, for what purpose, etc. So, instead of (re)defining methods in
  the wild, keep the truly common NON-redefined stuff in the parent and
  define the required methods in the children. I think we really don't
  want anything like:

    def method(self):
        raise NotImplementedError

  I know this is common practice but think about it again: how should a
  parent object know all the expected methods/accessors of all the
  possible kinds of children?
  Inheritance is about factorizing, certainly NOT about defining the
  _interface_ of the children.
- Introduce as many intermediate inherited objects as required.
  Keeping inheritance linear is good because Python sucks at playing
  with multiple parents and it keeps things simple. But a parent should
  have ALL its methods used in ALL the children. If not, it's a good
  sign that a new intermediate object should be introduced in the
  inheritance line.
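
As a rough sketch of such an inheritance line: the flags cache below is
needed by the status folders only, so it lives in an intermediate
object rather than in the base. All class and method names here are
hypothetical and do not mirror the real folder/ modules.

```python
class BaseFolder(object):
    """Only what ALL the children use."""

    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name


class StatusFolder(BaseFolder):
    """Intermediate object: code shared by the status folders only."""

    def __init__(self, name):
        super(StatusFolder, self).__init__(name)
        self._flags = {}

    def save_flags(self, uid, flags):
        self._flags[uid] = set(flags)

    def get_flags(self, uid):
        return self._flags[uid]


class PlainStatusFolder(StatusFolder):
    # Defined in the children only; the parents don't declare it.
    def get_backend(self):
        return "plain"


class SqliteStatusFolder(StatusFolder):
    def get_backend(self):
        return "sqlite"


class RemoteFolder(BaseFolder):
    """Has no use for the flags cache, so it does not inherit it."""

    def get_backend(self):
        return "imap"


status = SqliteStatusFolder("INBOX")
status.save_flags(1, ["\\Seen"])
print(status.get_name(), status.get_backend(), status.get_flags(1))
```

Every method of BaseFolder is used by every child, and the same holds
for StatusFolder, while the inheritance stays linear.
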
- Don't blindly inherit from library objects. We do want well defined
  interfaces. For example, we do too many things like
  imapobj.methodcall() while the imapobj is distantly inherited from
  imaplib2. We have NO clue about what we currently use from the library.
  Having a dumb wrapper for each call should be made mandatory for
  objects inherited from a library. Using composed objects instead of
  inheritance should be seriously considered in this case.
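
A minimal sketch of the composed variant. LibraryConnection is a
stand-in written for this example only; it is NOT the imaplib2 API, and
the wrapper name ImapConnection is hypothetical as well.

```python
class LibraryConnection(object):
    """Stand-in for a third-party class (hypothetical, not imaplib2)."""

    def select(self, mailbox):
        return ("OK", mailbox)

    def noop(self):
        return ("OK", None)

    def logout(self):
        return ("BYE", None)


class ImapConnection(object):
    """Composed object: holds the library object instead of inheriting."""

    def __init__(self, libconn):
        self._libconn = libconn

    # One dumb wrapper per library call we actually use.
    def select(self, mailbox):
        return self._libconn.select(mailbox)

    def logout(self):
        return self._libconn.logout()


conn = ImapConnection(LibraryConnection())
print(conn.select("INBOX"))
print(hasattr(conn, "noop"))  # noop() is not wrapped, so not exposed
```

The wrapper class then documents exactly which library calls are in
use, and anything not wrapped simply cannot be called by accident.
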
- Use factories. Current objects do too much initialization that varies
  with the context they are used in. Move such things into factories and
  keep the object definitions clean.
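
A small sketch of the idea, with the context-dependent decisions moved
out of the constructor. Repository, make_repository, the dry_run
parameter and the setting names are assumptions made for the example,
not the actual code.

```python
class Repository(object):
    """Hypothetical object: it only stores what it is given."""

    def __init__(self, name, host, readonly):
        self._name = name
        self._host = host
        self._readonly = readonly

    def get_host(self):
        return self._host

    def is_readonly(self):
        return self._readonly


def make_repository(name, settings, dry_run=False):
    """Factory: all the context-dependent initialization lives here."""
    host = settings.get("remotehost", "localhost")
    # In a dry run nothing may be written, whatever the settings say.
    readonly = dry_run or settings.get("readonly", False)
    return Repository(name, host, readonly)


repo = make_repository("work", {"remotehost": "imap.example.org"},
                       dry_run=True)
print(repo.get_host(), repo.is_readonly())
```

The object definition stays trivial to read, and the rules about how
the context changes the initialization sit in one place.
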
- Make it clear when we expect a composite object and what we expect
  exactly.
  Even the most obvious composed objects are badly defined. For example,
  the "conf" instances are spread across a lot of objects. Did you know
  that such composed objects are sometimes restricted to the section the
  object works on, and most of the time not restricted at all?
  How much time does it take to find and understand what we are
  currently working on?
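
One possible way to make the restriction explicit is a dedicated type
for the restricted view, so that the name tells which kind of "conf" an
object holds. This is only a sketch: SectionConf and the plain dict
used as configuration are hypothetical.

```python
class SectionConf(object):
    """Hypothetical read-only view restricted to ONE section."""

    def __init__(self, conf, section):
        self._section = section
        self._values = dict(conf[section])

    def get_section(self):
        return self._section

    def get(self, option, default=None):
        return self._values.get(option, default)


# A plain dict stands in for the whole configuration here.
conf = {
    "general": {"accounts": "work"},
    "Account work": {"localrepository": "local"},
}
account_conf = SectionConf(conf, "Account work")
print(account_conf.get("localrepository"))
print(account_conf.get("accounts"))  # from another section: not visible
```

An object receiving a SectionConf knows it works on one section only;
an object receiving the whole configuration has to ask for it as such.
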
- seriously improve our debugging/hacking sessions (AGAIN): until now,
  we have limited the improvements to allowing better/full stack traces.
  While this was actually required, we now hit some limitations of the
  whole exception-based paradigm. For example, it's very HARD to follow
  an instance during its lifetime. I have a good overview of what we
  could do in this area, so don't worry much about that if you don't get
  the point or what could be done.
This is not an exhaustive list of what I think we should do. It is
here to open the debate about what we want. I am considering
introducing a TODO list along with well defined policies we should use
in OfflineIMAP.
--
Nicolas Sebrecht