[pymvpa] _tent/parameters is now in yoh/master

Yaroslav Halchenko debian at onerussian.com
Fri May 16 17:42:01 UTC 2008


Dear All,

Having gotten an ACK from Michael, and being bored with merging and
cherry-picking, I simply merged _tent/parameters into yoh/master.

It is quite an evil change, i.e. it changes a lot, and hopefully it
brings useful pieces. Here is a quick summary of what I remember was
gained in that branch:

* Class Parameter (mvpa.misc.param) is intended to store parameters of
  a classifier and to be automagically picked up into the .params (or
  .kernel_params) collection of the given class. This is in line with
  how StateVariables are collected together into .states.

  TODO: work out the same for Dataset, so that samples, labels and
  chunks become part of a collection (what would be a nice name? We
  now have dsattr internally, but that is not a good one imho, and
  samples_attributes is too long imho). Then we could easily
  add/remove sample attributes at run-time.
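For illustration only, here is a minimal sketch of how class-level parameter declarations could be picked up into per-instance .params/.kernel_params collections by a metaclass. This is not the actual mvpa.misc.param code: Parameter, Collection, ParamCollector and ToyClassifier are hypothetical names, and routing a parameter into .kernel_params via a kernel=True flag is an assumption.

```python
# Hypothetical sketch: Parameter, Collection, ParamCollector and
# ToyClassifier are illustrative names, not PyMVPA's real API.


class Parameter:
    """Declares a parameter with a default value and a docstring."""

    def __init__(self, default, doc="", kernel=False):
        self.default = default
        self.doc = doc
        self.kernel = kernel  # assumption: True routes it to .kernel_params


class Collection(dict):
    """Holds current parameter values, with attribute-style access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None


class ParamCollector(type):
    """Metaclass that gathers Parameter declarations, including inherited ones."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        declared = {}
        # Walk base classes first so subclasses can override declarations.
        for base in reversed(cls.__mro__):
            for key, value in vars(base).items():
                if isinstance(value, Parameter):
                    declared[key] = value
        cls._declared_params = declared
        return cls


class ToyClassifier(metaclass=ParamCollector):
    C = Parameter(1.0, doc="Trade-off parameter")
    gamma = Parameter(0.5, doc="RBF kernel width", kernel=True)

    def __init__(self, **kwargs):
        self.params = Collection()
        self.kernel_params = Collection()
        for name, param in self._declared_params.items():
            target = self.kernel_params if param.kernel else self.params
            target[name] = kwargs.pop(name, param.default)
        if kwargs:
            raise TypeError("unknown parameters: %s" % sorted(kwargs))


clf = ToyClassifier(C=10.0)
print(clf.params.C, clf.kernel_params.gamma)
```

The same collector idea would carry over to the Dataset TODO above: attributes declared (or added at run-time) land in one collection instead of being hard-wired as samples/labels/chunks.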

* We got a new clfs.warehouse. It provides instances of classifiers
  based on their _clf_internals, so you can easily get all linear SVMs
  not from shogun with
  clfs['linear', 'svm', '!sg']

  Unit tests now use that warehouse instead of the duplicate under
  tests/tests_warehouse_clfs.

  TODO: I think the warehouse itself should become iterable, and its
  __getitem__ should return another warehouse with the selected subset
  of classifiers. That would make it more proper imho.
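A small sketch of the tag-based selection described above, written the way the TODO suggests (indexing returns a narrower, iterable warehouse). The Warehouse class and the tag names are hypothetical stand-ins for clfs.warehouse and _clf_internals; strings stand in for classifier instances, and a leading '!' is assumed to exclude a tag, as in the clfs['linear', 'svm', '!sg'] example.

```python
# Hypothetical sketch of a tag-filtered, iterable warehouse; not the real
# clfs.warehouse implementation.


class Warehouse:
    """Holds (tags, item) pairs; indexing by tags yields a narrower Warehouse."""

    def __init__(self, entries=()):
        # Each entry is stored as (frozenset of tags, item).
        self._entries = [(frozenset(tags), item) for tags, item in entries]

    def __getitem__(self, query):
        if isinstance(query, str):
            query = (query,)
        wanted = {t for t in query if not t.startswith("!")}
        banned = {t[1:] for t in query if t.startswith("!")}
        # Keep items carrying all wanted tags and none of the banned ones.
        return Warehouse(
            (tags, item) for tags, item in self._entries
            if wanted <= tags and not (banned & tags))

    def __iter__(self):
        return (item for _, item in self._entries)

    def __len__(self):
        return len(self._entries)


clfs = Warehouse([
    ({"linear", "svm", "libsvm"}, "LinearCSVMC (libsvm)"),
    ({"linear", "svm", "sg"}, "LinearCSVMC (shogun)"),
    ({"non-linear", "svm", "libsvm"}, "RbfCSVMC (libsvm)"),
    ({"linear", "smlr"}, "SMLR"),
])

print(list(clfs["linear", "svm", "!sg"]))
print(list(clfs["svm"]["!sg"]))  # selections can be chained
```

Returning a Warehouse rather than a plain list is what makes chained selections like clfs['svm']['!sg'] possible.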

* Shogun and libsvm SVMs got a base class _SVM (mvpa.clfs._svmbase),
  which does lots of house-keeping in its __init__. That also unified
  the parameters of the different SVMs, and unified and minimized their
  __repr__ (now only parameters with non-default values are printed).

  TODO:
  imho one of the next steps is to either remove helper classes like
  RbfCSVMC or create them automagically. Why? Because they are
  somewhat non-orthogonal and polluting imho. We have nu and C SVMs, we
  have at least 4 types of kernels, and in shogun there are around 5
  implementations of SVM. So if we encode all those params in the class
  name, we end up with
  CSVMLibSVM, CSVMSVMLight, or something like that...

  Now I need to work out nice __doc__ generation for those SVMs, and to
  unify the naming and a few left-out arguments a bit.


* Initial retraining/retesting support is done for shogun's SVMs. If a
  class is announced to be retrainable, then on train, if it was
  previously trained on the same data, it won't recreate the kernel.
  That should give a great speed-up if you only change labels (which is
  the case with all those permutation tests).

  There are still glitches to fix, and the code needs refactoring so it
  becomes more readable.
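The retraining idea can be sketched as caching the training kernel matrix and rebuilding it only when the samples change; a label-only change, as in permutation tests, then reuses the cache. RetrainableKernelMachine and linear_kernel are hypothetical names, the "training" step is a placeholder rather than an SVM solver, and comparing a snapshot of the samples is just one assumed way to detect "same data".

```python
# Hypothetical sketch of kernel caching for retraining; not shogun's or
# PyMVPA's actual code.


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


class RetrainableKernelMachine:
    """Caches the training kernel matrix and reuses it for unchanged samples."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.kernel_computations = 0  # how often the full matrix was built
        self._cached_key = None
        self._cached_matrix = None
        self.trained_on = None

    @staticmethod
    def _samples_key(samples):
        # Hashable snapshot of the samples; comparing it costs O(n*d),
        # which is cheaper than recomputing the O(n^2*d) kernel matrix.
        return tuple(tuple(row) for row in samples)

    def _kernel_matrix(self, samples):
        key = self._samples_key(samples)
        if key != self._cached_key:
            self._cached_matrix = [[self.kernel(a, b) for b in samples]
                                   for a in samples]
            self._cached_key = key
            self.kernel_computations += 1
        return self._cached_matrix

    def train(self, samples, labels):
        matrix = self._kernel_matrix(samples)
        # Placeholder: a real SVM would solve its optimization on `matrix`.
        self.trained_on = (matrix, list(labels))


machine = RetrainableKernelMachine(linear_kernel)
samples = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
machine.train(samples, [0, 1, 1])
machine.train(samples, [1, 0, 1])  # permuted labels: kernel is reused
print(machine.kernel_computations)
```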

Obviously there were some other changes which I can't recall/spot now.

Major cons:

  Access to attributes of Stateful classes got slower, since it goes
  through the __getattribute__ and __setattr__ of Stateful. But it can
  be optimized, and will be at some point.
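To show where such a slowdown comes from, this sketch contrasts overriding __getattribute__, which runs Python code on every attribute read, with __getattr__, which Python calls only after normal lookup has failed. Switching to the latter is one common optimization, not necessarily what will be done for Stateful. InterceptAll, FallbackOnly and the _states dict are hypothetical, and only reads are covered (the __setattr__ path is left out).

```python
import timeit


class InterceptAll:
    """Routes *every* attribute read through Python-level code."""

    def __init__(self):
        self.value = 1
        self._states = {"trained": True}

    def __getattribute__(self, name):
        states = object.__getattribute__(self, "_states")
        if name in states:
            return states[name]
        return object.__getattribute__(self, name)


class FallbackOnly:
    """Python calls __getattr__ only after normal lookup has failed."""

    def __init__(self):
        self.value = 1
        self._states = {"trained": True}

    def __getattr__(self, name):
        # Read _states via __dict__ to avoid recursing if it is missing.
        states = self.__dict__.get("_states", {})
        if name in states:
            return states[name]
        raise AttributeError(name)


if __name__ == "__main__":
    for obj in (InterceptAll(), FallbackOnly()):
        seconds = timeit.timeit("obj.value", globals={"obj": obj},
                                number=200000)
        print(type(obj).__name__, seconds)
```

On CPython the plain obj.value read is typically faster for FallbackOnly, because ordinary attributes never enter Python code there. Note the semantic difference: in InterceptAll a state shadows an instance attribute with the same name, while in FallbackOnly the instance attribute wins.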


So if you have some analysis which you ran using master or a
someone/master branch, could you first give my yoh/master a try before
merging/committing, so we know what got broken ;-)

-- 
Yaroslav Halchenko
Research Assistant, Psychology Department, Rutgers-Newark
Student  Ph.D. @ CS Dept. NJIT
Office: (973) 353-5440x263 | FWD: 82823 | Fax: (973) 353-1171
        101 Warren Str, Smith Hall, Rm 4-105, Newark NJ 07102
WWW:     http://www.linkedin.com/in/yarik        



More information about the Pkg-ExpPsy-PyMVPA mailing list