My WIP on Unicode

Wed Feb 11 17:02:19 GMT 2015

On Tue, Feb 10, 2015 at 01:17:36PM +0100, Nicolas Sebrecht wrote:
>   Hello,
> 
> Supporting Unicode is much harder than it seems. The main reasons
> are:
> - Python 2.X sucks with Unicode;
> - our codebase has too many variables interlaced across objects and
>   modules;
> - each library (if not each module of libraries) handles Unicode
>   differently.

Here is my POV:

Supporting Unicode is a good goal, but I believe it isn't worth the
trouble with the current code base. Before that, we should clean up the
code base in more than one respect. Here are some points that come to mind:

- better names for variables, objects, etc.

- improve comments: most of the current comments assume a very good
  knowledge of the internals. That sucks because I guess nobody is
  aware of ALL of them anymore. The time when this was a one-man
  project has long passed.

- better policy on objects:
  - make ALL attributes private and use accessors. I know this is not
    "pythonic", but such pythonic practice turns the code into intricate
    code.
  - make private ALL methods not intended to be used from outside.

- revamp the factorization: it's not unusual to find code "factorized"
  for bad reasons, e.g. because it made the code /look/ nicer, while the
  factorized function/method is actually called from only ONE place.
  While it might help locally, such practice defeats the purpose globally
  because we lose sight of what is truly factorized code and what is not.

- namespace the factorized code: if a method requires a local function,
  DON'T USE yet another method. Use a local (nested) function instead.
  E.g.:

  class Blah(object):
      def _internal_method(self, arg):
          def local_factorized(local_arg):
              # local_factorized's code: only visible inside _internal_method.
              return local_arg
          # _internal_method's code.
          return local_factorized(arg)

  Python allows local namespaced functions for good reasons.

- Better inheritance policy: take the example of the
  folder/LocalStatus(SQLite) and folder/Base classes. It's nearly
  IMPOSSIBLE to know and understand which parent method is used by which
  child, for what purpose, etc. So, instead of (re)defining methods in the
  wild, keep the common, NON-redefined code in the parent and define the
  required methods in the children. I think we really don't want
  anything like:

    def method(self):
        raise NotImplementedError

  I know this is common practice, but think about it again: how should a
  parent object know all the expected methods/accessors of every
  possible kind of child?
  Inheritance is about factorizing, certainly NOT about defining the
  _interface_ of the children.

- Introduce as many intermediate inherited objects as required.
  Keeping inheritance linear is good because Python sucks at playing
  with multiple parents, and it keeps things simple. But a parent should
  have ALL its methods used by ALL of its children. If not, it's a good
  sign that a new intermediate object should be introduced in the
  inheritance line.

- Don't blindly inherit from library objects. We do want well-defined
  interfaces. For example, we do too many things like
  imapobj.methodcall() while imapobj inherits, far down the line, from
  imaplib2. We have NO clue about what we currently use from the library.
  A dumb wrapper for each call should be mandatory for objects inherited
  from a library. Using composed objects instead of inheritance should be
  seriously considered in this case.

- Use factories. Current objects do too much initialization work that
  varies with the context they are used in. Move things like that into
  factories and keep the object definitions clean.

- Make it clear when we expect a composite object and what exactly we
  expect.
  Even the most obvious composed objects are badly defined. For example,
  the "conf" instances are spread across a lot of objects. Did you know
  that such composed objects are sometimes restricted to the section the
  object works on, while most of the time they are not restricted at all?
  How much time does it take to find out and understand what we are
  currently working on?

- seriously improve our debugging/hacking sessions (AGAIN): until now,
  we have limited the improvements to getting better/full stack traces.
  While this was actually required, we now hit some limitations of the
  whole exception-based paradigm. For example, it's very HARD to follow an
  instance during its lifetime. I have a good overview of what we could
  do in this area, so don't worry much about it if you don't see the
  point or what could be done.
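To make the "private attributes plus accessors" policy from the list above concrete, here is a minimal sketch. The `Folder` class and its method names are hypothetical, not OfflineIMAP's actual API. Note that a leading underscore is only a naming convention in Python: nothing prevents outside access, it just signals what is NOT part of the interface.

```python
class Folder(object):
    """Hypothetical folder object: all state is private, access goes through methods."""

    def __init__(self, name, separator):
        # Leading underscore marks attributes as private (by convention only).
        self._name = name
        self._separator = separator

    # Public accessors: the only supported way to read the state.
    def getname(self):
        return self._name

    def getfullpath(self, root):
        return self._join(root, self._name)

    # Private helper: not meant to be called from outside the class.
    def _join(self, root, name):
        return root + self._separator + name


folder = Folder("INBOX", "/")
print(folder.getname())            # INBOX
print(folder.getfullpath("mail"))  # mail/INBOX
```

With this policy, grepping for `getname(` finds every consumer of the folder name, which is much harder with bare attribute access.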
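The "intermediate inherited objects" point in the list above can be sketched with a hypothetical folder hierarchy (all class and method names here are illustrative). `getpath()` only makes sense for local folders, so instead of putting it in the base class, or stubbing it there with `raise NotImplementedError`, it lives in an intermediate class that only local folder kinds inherit from:

```python
class BaseFolder(object):
    """Only what EVERY folder kind uses."""

    def __init__(self, name):
        self._name = name

    def getname(self):
        return self._name


class LocalFolder(BaseFolder):
    """Intermediate class: code shared by local folder kinds only."""

    def __init__(self, name, root):
        super(LocalFolder, self).__init__(name)
        self._root = root

    def getpath(self):
        return self._root + "/" + self._name


class MaildirFolder(LocalFolder):
    def getkind(self):
        return "maildir"


class IMAPFolder(BaseFolder):
    """Remote folder: inherits only the truly common code, so no getpath()."""

    def getkind(self):
        return "imap"


print(MaildirFolder("INBOX", "/mail").getpath())  # /mail/INBOX
print(hasattr(IMAPFolder("INBOX"), "getpath"))    # False
```

The inheritance stays linear, and every method of each parent is used by all of its children.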
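For the "don't blindly inherit from library objects" point, a composition-based wrapper might look like the sketch below. `FakeIMAPConnection` is a hypothetical stand-in for a library connection object (its `select()` mimics the `(typ, data)` return shape of the standard `imaplib.IMAP4.select`), so the example runs without a server; `IMAPServer` and its method names are also hypothetical.

```python
class FakeIMAPConnection(object):
    """Hypothetical stand-in for a library connection object (demo only)."""

    def select(self, mailbox="INBOX", readonly=False):
        return ("OK", [b"3"])  # message count, as bytes

    def logout(self):
        return ("BYE", [b"logging out"])


class IMAPServer(object):
    """Wraps a library connection by composition instead of inheriting from it.

    Every library call we rely on goes through one explicit method here, so
    this class's public methods ARE the interface we use from the library.
    """

    def __init__(self, connection):
        self._connection = connection  # composed, not inherited

    def select_folder(self, name, readonly=False):
        typ, data = self._connection.select(name, readonly=readonly)
        if typ != "OK":
            raise RuntimeError("cannot select %s: %r" % (name, data))
        return int(data[0])  # number of messages in the folder

    def logout(self):
        self._connection.logout()


server = IMAPServer(FakeIMAPConnection())
print(server.select_folder("INBOX"))  # 3
server.logout()
```

Because the wrapper does not inherit from the connection class, calling an unwrapped library method such as `server.select(...)` fails immediately instead of silently widening our dependency on the library.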
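For the "use factories" point, here is a minimal sketch, assuming a hypothetical `MaildirRepository` class and a plain dict of settings (the `localfolders` and `sep` keys are illustrative). The constructor only stores what it is given; defaults, path expansion and validation live in the factory function:

```python
import os.path


class MaildirRepository(object):
    """Hypothetical repository: its constructor only stores what it is given."""

    def __init__(self, name, root, sep):
        self._name = name
        self._root = root
        self._sep = sep

    def getroot(self):
        return self._root

    def getsep(self):
        return self._sep


def make_maildir_repository(name, settings):
    """Factory: all context-dependent setup lives here, not in __init__."""
    root = os.path.expanduser(settings.get("localfolders", "~/Mail"))
    sep = settings.get("sep", ".")
    if len(sep) != 1:
        raise ValueError("sep must be a single character, got %r" % sep)
    return MaildirRepository(name, root, sep)


repo = make_maildir_repository("local", {"localfolders": "/tmp/mail", "sep": "/"})
print(repo.getroot(), repo.getsep())  # /tmp/mail /
```

Tests and other callers can still build a `MaildirRepository` directly with known values, bypassing all the context-dependent logic.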
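For the "conf" composite objects point, one way to make the restriction explicit is to hand objects a read-only view of a single section rather than the whole parser. This is a sketch using Python 3's standard `configparser`; the `SectionConfig` class and the section/option names are hypothetical.

```python
import configparser


class SectionConfig(object):
    """Read-only view of one section of a configuration.

    Objects receiving this can only see their own section, which makes it
    explicit which part of the configuration they depend on.
    """

    def __init__(self, parser, section):
        if not parser.has_section(section):
            raise KeyError("no such section: %s" % section)
        self._parser = parser
        self._section = section

    def get(self, option, default=None):
        return self._parser.get(self._section, option, fallback=default)

    def getboolean(self, option, default=False):
        return self._parser.getboolean(self._section, option, fallback=default)


parser = configparser.ConfigParser()
parser.read_string("""
[Repository Local]
type = Maildir
sync_deletes = no
""")
local = SectionConfig(parser, "Repository Local")
print(local.get("type"))                 # Maildir
print(local.getboolean("sync_deletes"))  # False
```

An object that genuinely needs the whole configuration would take the parser itself, so the constructor signature alone tells which kind of "conf" it expects.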

The list above is not an exhaustive list of what I think we should do.
It is here to open a debate about what we want. I'm considering
introducing a TODO list along with well-defined policies we should
follow in OfflineIMAP.

-- 
Nicolas Sebrecht