[pymvpa] clf generalization over a different set of classes

Matthias Ekman Matthias.Ekman at nf.mpg.de
Sun Aug 16 22:58:47 UTC 2009


Hi,

sorry for the long delay, my days are a little bit crazy at the moment :)


> first the short answer:
>> This shows an accuracy of about 40 % ... however changing the
>> assignment of the faked labels to:
>>> ...<
>> leads to an accuracy of about 60 %. I am not sure if this makes
>> sense, or if I missed something here. For making a statement like
> to say more, those should add up to a 100% ;) because whatever was
> classified 'correctly' in the first mapping became misclassification in
> the 2nd, and vise versa. So, mean of those will be 50% all the time ;)

Yes, of course you are absolutely right here ... in general. But I think
this relies on the assumption that the classifier could see 'all'
examples of classes 1 and 2 during training and 4 and 5 during
prediction. However, in my case (a data-driven question with an unequal
number of samples between classes 1 & 2 on the one hand and classes
4 & 5 on the other) this is not the case. That's why I do the following
"spooky" selection to ensure an equal number of samples:

# no. of samples in each dataset
s_ds1 = len(ds1.samples)
s_ds2 = len(ds2.samples)

# equalize the number of samples: getRandomSamples() takes the number
# of samples *per label*, hence the division by 2 (two labels per ds)
if s_ds1 < s_ds2:
    ds2 = ds2.getRandomSamples(s_ds1 / 2)
elif s_ds1 > s_ds2:
    ds1 = ds1.getRandomSamples(s_ds2 / 2)
else:
    print ' > equal no. of samples'

This involves a random selection of some samples, and as some samples
might have more signal than others, the generalization depends on the
sample selection. That's why I came up with the idea to do this several
times and calculate the average ... or even better to run a Monte Carlo
simulation, which is so convenient with PyMVPA.
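A minimal sketch of that repeat-and-average idea in plain NumPy (this is
not the PyMVPA API; the random index selection here just stands in for
getRandomSamples(), and `err_fn` would be something like a TransferError
call):

```python
import numpy as np

def balanced_transfer_errors(err_fn, samples1, samples2, n_iter=100, seed=0):
    """Average a transfer error over repeated balanced random
    subsamples of two datasets (Monte Carlo style)."""
    rng = np.random.RandomState(seed)
    n = min(len(samples1), len(samples2))
    errors = []
    for _ in range(n_iter):
        # draw an equally sized random subset from each dataset
        idx1 = rng.permutation(len(samples1))[:n]
        idx2 = rng.permutation(len(samples2))[:n]
        errors.append(err_fn(samples1[idx1], samples2[idx2]))
    return np.mean(errors)
```

Averaging over many random subsamples should wash out the dependence on
which particular samples happened to be drawn.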


> Now a bit of Pythonization
>> for l in xrange(tmp_labels.shape[0]):
>>    if tmp_labels[l] == 4.0:
>>        new_labels[l]=2.0
>>    elif tmp_labels[l] == 5.0:
>>        new_labels[l]=1.0
> I guess smth like
> new_labels[tmp_labels == 4.0] = 2.0
> new_labels[tmp_labels == 5.0] = 1.0
> or even
> new_labels = old_labels - 3
> in case of mapping 4,5 -> 1,2
> should suffice

Yeah, since I am quite new to Python, those comments are warmly welcome.
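For the record, the vectorized remapping suggested above in a
self-contained form (plain NumPy; label values as in my example, mapping
4 -> 2 and 5 -> 1):

```python
import numpy as np

tmp_labels = np.array([4.0, 5.0, 4.0, 5.0])
new_labels = tmp_labels.copy()

# boolean-mask assignment replaces the explicit element-wise loop
new_labels[tmp_labels == 4.0] = 2.0
new_labels[tmp_labels == 5.0] = 1.0
```

Note that the `old_labels - 3` shortcut gives the other pairing
(4 -> 1, 5 -> 2), so the mask form is what matches my original loop.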

>> terr.states.enable('confusion')
>> terr(ds1, ds1)
> I guess you are trying to estimate training_confusion which is available
> as the state of the classifier
>
>> error = terr(ds1, ds2)   print terr.confusion
> btw -- terr call is taking first testing dataset and then optional
> training (since it is not always necessary):
>
> *In [4]:TransferError.__call__?
> Type:		instancemethod
> Base Class:	<type 'instancemethod'>
> String Form:	<unbound method TransferError.__call__>
> Namespace:	Interactive
> File:		/usr/lib/pymodules/python2.5/mvpa/clfs/transerror.py
> Definition:	TransferError.__call__(self, testdataset,
> trainingdataset=None)
>
> So, I guess you might like to actually use terr(ds2, ds1)? (of cause I
> could be wrong since it depends on what you are trying to do actually ;))

Right guess here! What I want to do is "terr(ds2, ds1)". I was not
aware that TransferError takes the testing dataset first.
Thanks a lot for this comment ... and also for encouraging me to use
ipython in one of your previous posts ... it is really very helpful,
especially in combination with matplotlib and scipy.
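Just to keep the argument order straight for myself, a toy stand-in
mirroring the signature shown in the docstring above (not the real
PyMVPA class, purely an illustration of which positional slot is which):

```python
class ToyTransferError(object):
    """Illustrative stand-in with the same argument order as
    TransferError.__call__(self, testdataset, trainingdataset=None)."""

    def __call__(self, testdataset, trainingdataset=None):
        # report which role each positional argument plays
        return {'test': testdataset, 'train': trainingdataset}

terr = ToyTransferError()
roles = terr('ds2', 'ds1')  # testing dataset first, training second
```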


> btw, for imho clean  analysis and not that clean (if anywhere
> significant/valuable) results of transfer from one data to another, see
> recent
>
> http://www.sciencemag.org/cgi/content/abstract/1171599
>
> Published Online May 7, 2009
> Science DOI: 10.1126/science.1171599

Hehe .. cool idea, but they have "small" (chance-like) accuracies ...
and this in Science :-)

Another example of generalization between different classes is Seymour
et al. (2009), "The Coding of Color, Motion, and Their Conjunction in
the Human Visual Cortex".

They trained the SVM on colored, moving dots and tested on moving dots
with different colors and motion directions to investigate whether some
visual areas show a bias towards one of the two visual properties.

doi:10.1016/j.cub.2008.12.050



Thanks again!
Matthias




More information about the Pkg-ExpPsy-PyMVPA mailing list