[pymvpa] tiny suggestion for SOM (10 second change)

Ike Hall ike.hall at gmail.com
Mon Mar 11 19:29:03 UTC 2013


> On 10 March 2013 23:47, Ike Hall <ike.hall at gmail.com> wrote:
>> It would appear that the SimpleSOMMapper class performs the
>> initialization of its Kohonen layer in the SimpleSOMMapper._train
>> method.  As the .train() method calls ._pretrain, ._train, and
>> ._posttrain in order, I submit that it would be preferable to move the
>> initialization phase to ._pretrain.  With this change, if one wants to
>> subclass the SimpleSOMMapper to use some other initialization, they
>> only need to override ._pretrain rather than _train.
>
> As far as I can see currently there are three lines of code in the
> initialization phase:
>
>     self._K = np.random.standard_normal(
>         tuple(self.kshape) + (samples.shape[1],))
>     unit_deltas = np.zeros(self._K.shape, dtype='float')
>     dqd = np.fromfunction(lambda x, y: (x**2 + y**2)**0.5,
>                           self.kshape, dtype='float')
>
> and I am not convinced that it would be sensible to move the last two
> to a separate _pretrain (the variables would have to be stored in self
> and it all gets a bit messy). That leaves a single line that could
> move to _pretrain - is that really worth it?
>
> Or do you have other suggestions on what to move to _pretrain?
>

...forgot to send to the list:
Currently I have just moved the first line to _pretrain, as I would like to
subclass SimpleSOMMapper to use an alternative initialization of just
self._K and don't want to override the entirety of _train.  As I see it,
the second line should remain in _train, and the third could go either way.
If I were designing the class, I would store dqd in self and define it in
__init__, adding an extra parameter for the distance function that defaults
to the Euclidean distance.  (I have seen some implementations that use the
Manhattan distance rather than the Euclidean distance.)  In my opinion,
it's a judgment call whether the distance formula on self._K should be a
property of the class or a property of the training method; a rough sketch
of what that could look like follows.

In summary: moving the first line to _pretrain is something I would
strongly advocate.  Moving dqd into self at initialization is something I
would advocate for, but far less strongly, since someone wanting to use a
different distance function would (likely?) also want to override the
_compute_influence_kernel method, and at that point they are not far from
overriding all of _train.  A sketch of the kind of subclass the first
change enables follows.
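Purely as a hypothetical sketch (and assuming the self._K line has indeed
been moved into _pretrain), the subclass I have in mind would look roughly
like this; the data-driven initialization is just an example, not a
proposal for PyMVPA itself:

    import numpy as np
    from mvpa2.mappers.som import SimpleSOMMapper

    class DataInitSOMMapper(SimpleSOMMapper):
        # Hypothetical subclass: seed the Kohonen layer with randomly
        # drawn training samples instead of a standard-normal draw.
        # Only meaningful if self._K is initialized in _pretrain.

        def _pretrain(self, samples):
            samples = np.asanyarray(samples)
            # draw one training sample (with replacement) per Kohonen unit
            idx = np.random.randint(0, samples.shape[0],
                                    int(np.prod(self.kshape)))
            self._K = samples[idx].reshape(
                tuple(self.kshape) + (samples.shape[1],))

With the initialization in _pretrain, this override is all that is needed;
_train itself stays untouched.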

Cheers

