[pymvpa] libsvm dense arrays

Mon Sep 29 16:56:42 UTC 2008

Apologies if this doesn't end up in the right thread; there was a minor problem with my subscription settings.

I played with the Shogun backend circa version 0.2.0.  If I remember correctly, this was when it was spitting out those nasty debug messages, so I stopped using it pretty quickly.

My understanding of the pymvpa source is that it uses libsvm by default if it is installed, so I stuck with that under the assumption that either:

1) There was an explicit reason you guys chose libsvm as the default backend over Shogun, which I believed likely since libsvm is so popular but I had not heard of Shogun before, or,
2) Shogun actually uses libsvm internally so there was no reason not to use it directly.  On this point, I am unclear whether Shogun simply uses the system installation of libsvm, or whether it has its own implementation or even its own wrapper.

Perhaps now is the time to clarify my assumptions - is libsvm not the better default choice?  If Shogun indeed uses a dense array implementation of libsvm, is there any reason not to set it as the default backend?

I suppose I could also figure this out from the source, but while I have your attention, what's the best way (other than uninstalling libsvm) to transparently switch to the Shogun backend?  I.e., so that I can still use the classes LinearCSVMC etc. without explicitly calling sg.SVM etc.

Thanks again,
Scott

Hi Scott,

> Been using pymvpa for a few months now and really enjoying it - thanks
> for all the good work!
That is really great to hear, since so far we have only a limited idea of
the number and diversity of pymvpa users. So thanks for posting to the
list.

> I was curious if it would be possible/useful to switch out the libsvm
> backend to the dense version mentioned here:
> http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#1.  They claim
> approximately 40% increase in speed (though for much larger data sets
> than normal for mri), and since all fmri data is dense, it seems a
100% correct... especially taking into account that a considerable
amount of runtime is now spent just populating those 'sparse'
data structures for libsvm, even though the data are not sparse at all.
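
To make the overhead concrete, here is a rough pure-Python sketch of what
the sparse layout costs for dense data. It mimics libsvm's node layout
(1-based index/value pairs, terminated by a node with index -1); the helper
names are made up for illustration and are not part of libsvm or pymvpa.

```python
# Illustrative sketch only: mimics libsvm's sparse node layout in pure Python.
# All function names here are hypothetical, not libsvm or PyMVPA API.

def dense_to_sparse_nodes(sample):
    """Convert one dense sample into libsvm-style (index, value) pairs.

    Indices are 1-based, zero values are skipped, and the sample ends
    with a terminator node whose index is -1.
    """
    nodes = [(i + 1, float(v)) for i, v in enumerate(sample) if v != 0]
    nodes.append((-1, 0.0))  # terminator node
    return nodes


def sparse_dot(a, b):
    """Dot product of two samples stored as index-sorted node lists."""
    total, i, j = 0.0, 0, 0
    while a[i][0] != -1 and b[j][0] != -1:
        if a[i][0] == b[j][0]:
            # Indices match: multiply the values and advance both cursors.
            total += a[i][1] * b[j][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return total


def dense_dot(x, y):
    """Dot product on plain dense vectors: no index bookkeeping needed."""
    return sum(xi * yi for xi, yi in zip(x, y))


# A fully dense sample (as with fMRI voxels) gains nothing from the sparse
# form: every feature still gets a node, plus one terminator.
x = [0.5, 1.5, 2.0, 4.0]
y = [2.0, 1.0, 0.5, 0.25]
nodes_x = dense_to_sparse_nodes(x)
nodes_y = dense_to_sparse_nodes(y)
```

With fully dense input, the conversion stores an extra index per feature
and the dot product does an index comparison per step, which is the kind of
work a dense implementation would avoid.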

> natural move to me.  Has anyone experimented with this?  Or is a similar
> improvement already included in the pymvpa libsvm wrapper?  Or do you
> think the performance increase would be negligible?
I think it would be considerable.

Have you tried the libsvm implementation that comes with Shogun, though?
It should be more efficient (I don't know off the top of my head which
implementation is used there, sparse or dense).

But I am not sure it is worth relying on another variant of libsvm if
Shogun already provides a significant speed-up over plain libsvm... have
you tried SG's SVMs?