[pymvpa] multiclass SVM weights (sensitivity measures)
Scott
gorlins at MIT.EDU
Thu Oct 1 22:38:12 UTC 2009
Looks like no one's responded? Odd...
Vadim Axel wrote:
> Hi Michael, Yaroslav et al.,
>
> 1. I ran the site example
> (http://www.pymvpa.org/measures.html#linear-svm-weights). When I have
> three classes to classify, there are two vectors of weights, and each
> additional class adds one more vector of weights (I mean
> before averaging in sens.py). If the classification is done one vs.
> rest, or alternatively pair-wise, shouldn't I get three vectors of
> weights in my three-class classification? Actually, there was the same
> number of weight vectors in the Matlab LIBSVM implementation, so I
> guess I am missing something conceptual.
>
Before averaging, there should be a different number of weight vectors
for 1vs1 and 1vsAll. For 1vsAll there should be C weight vectors for C
classes; for 1vs1 there is one weight vector per pairwise comparison,
i.e. C choose 2 = C*(C-1)/2 in total. Since 3 choose 2 = 3, the counts
happen to coincide for 3 classes in 1v1 and 1vsAll, but with 4 classes
you should have 6 vectors for 1v1 (versus 4 for 1vsAll).
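A quick sanity check of those counts (plain Python, nothing
PyMVPA-specific; n_weight_vectors is just an illustrative helper):

def n_weight_vectors(n_classes, scheme):
    if scheme == '1vsAll':
        return n_classes                  # one classifier per class
    else:                                 # '1vs1': one per unordered pair
        return n_classes * (n_classes - 1) // 2

for c in (3, 4, 5):
    print c, n_weight_vectors(c, '1vsAll'), n_weight_vectors(c, '1vs1')
# 3 3 3   <- identical only at C=3
# 4 4 6
# 5 5 10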
For extracting per-class weights, the problem seems to be in the coeffs
- check the help for libsvmc._svm.getSVCoef(), which looks like it
returns the SV coefficients in an extremely odd layout. I presume the
combining math that produces total weights from a per-class matrix
would yield the same output as what is currently implemented, but you
can check that.
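For a linear kernel, the combining math is just the textbook expansion
of the primal weights from the dual coefficients; a minimal sketch (sv
and sv_coef here are illustrative names, not the actual getSVCoef()
output layout):

import numpy as np
# sv:      (nSV x nfeatures) support vectors for one binary problem
# sv_coef: (nSV,) dual coefficients; libsvm stores alpha_i * y_i,
#          so the label signs are already folded in
def primal_weights(sv_coef, sv):
    # w = sum_i alpha_i * y_i * x_i  (valid for a linear kernel only)
    return np.dot(sv_coef, sv)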
I've been playing with individual class weights using the Shogun
backend, both with 1v1 (the libsvm implementation in Shogun) and 1vsAll
(MCSVM, new in Shogun 0.8), and both yield 'proper' output. With the
Shogun backend at least, another problem is that LinearSVMWeights
averages over the wrong axis out of the box, yielding a size-C vector
instead of one matching nfeatures. I have fixed this in my own
repository by adding a new transformer, FirstAxisSumOfAbs, and a
sensitivity analyzer, LinearMCSVMWeights, which is identical to the
binary one but uses that transformer instead of the default
SecondAxisSumOfAbs.
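The fix is essentially one line: sum absolute values over the class
axis instead of the feature axis. Roughly (illustrative, not the actual
transformer code from my repository):

import numpy as np
def first_axis_sum_of_abs(weights):
    # weights: (nclasses x nfeatures) raw multiclass sensitivity matrix.
    # Collapsing axis 0 keeps one value per feature; the default
    # SecondAxisSumOfAbs collapses axis 1 and returns a size-C vector.
    return np.abs(weights).sum(axis=0)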
> 2. In the site example I used my own dataset (a 7-voxel ROI, 3
> classes). Four of my voxels are noise and the other three contain
> easily separable data. Indeed, the multiclass classification error
> rate was 0. When I make the three possible two-class classifications,
> the weights totally make sense (close to zero for noise voxels and
> high values for non-noise). But the weights from multiclass are not
> very different for noise and non-noise voxels, which looks strange.
> Probably the answer to the first question will clarify this issue too.
Works out of the box for me...
from mvpa.clfs.libsvmc import SVM
from mvpa.misc.data_generators import normalFeatureDataset
# 7 features, of which only the first three carry signal
d = normalFeatureDataset(nlabels=3, nfeatures=7,
                         nonbogus_features=[0, 1, 2], snr=10)
print SVM().getSensitivityAnalyzer()(d)
[ 5.40074713  6.15330183  6.56281274  0.69333497  0.34515556  0.46399045
  1.43135108]
> 3. And now something more theoretical: suppose I am doing
> classification of 5 gradually changing colors. The absolute BOLD
> activation level doesn't change significantly between classes, but I
> successfully classify my colors beyond chance level. Can the weights
> of the SVM be used as a measure of which voxel was more informative
> for the red vs. orange discrimination, with some other voxel for
> different pairs? Something like taking the highest weight among all
> the weights of this voxel (similar to Kamitani & Tong 2005). Does it
> make sense?
Conceptually, 1vsAll weights would be better for making a 'tuning
curve' a la K+T. I don't know if this matters in practice. The trickier
part is deciding why you want that, and what type of statistics you
need to show that 'this voxel is red-selective'; in fact they skirted
that issue because they cared more about overall decoding than about
'this voxel is selective for this orientation', and really only did
that to show there was no discernible underlying architecture.
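Mechanically, the K+T-style labeling is just an argmax over the
per-class weight matrix; a rough sketch, assuming you already have the
(nclasses x nfeatures) 1vsAll weights in hand (W and labels below are
stand-ins, not real data):

import numpy as np
labels = ['red', 'orange', 'yellow', 'green', 'blue']
W = np.random.randn(5, 7)          # stand-in for real sensitivities
# preferred label per voxel: the class whose classifier weights it
# most strongly (here by absolute weight)
preferred = [labels[i] for i in np.abs(W).argmax(axis=0)]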