[pymvpa] top-n match lists

Scott Gorlin gorlins at MIT.EDU
Thu Jan 7 18:28:29 UTC 2010


Let me ask a clarifying question - Tara, did you mean the top 10 
PREDICTED CLASSES (how many do you have??) or the top 10 CLOSEST SAMPLES?

For CLASSES, as Yaroslav said, this will depend on how the 
classification is done (unified a la SMLR/KNN, 1 vs 1 or 1 vs Rest for 
SVMs, etc.), so you need to decide exactly how you want to handle it.  
LibSVM-based SVMs do weird things with multiclass values, but you can 
use the meta classifier MulticlassClassifier to easily create a suite 
of 1 vs 1 (or 1 vs Rest in more recent development branches) binary 
machines and extract each of their values.  If you use the Shogun 
backend, there are some SVMs which can spit out correct multiclass 
values in either 1 vs 1 or 1 vs Rest, depending on your installed 
version.  Though perhaps you're not using SVMs...?
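
Here's a minimal sketch of the MulticlassClassifier route, assuming 
PyMVPA 0.4-style imports and an already-loaded training Dataset 'ds' 
(the exact layout of the stored values differs across versions and 
combiners, so inspect it before ranking anything):

import numpy as np
from mvpa.suite import LinearCSVMC, MulticlassClassifier

# wrap a binary libsvm SVM into a suite of 1 vs 1 binary machines and
# keep the 'values' state so per-prediction values get stored
clf = MulticlassClassifier(clf=LinearCSVMC(),
                           enable_states=['values'])
clf.train(ds)
predictions = clf.predict(ds.samples)

# what got stored for the last predict() call; the structure (per
# binary machine vs. per class) depends on your PyMVPA version
print(np.asarray(clf.values).shape)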

If you meant SAMPLES, then you can simply look at the prediction kernel 
matrix for the rows with the highest values - or use a distance function 
of your own.
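
For instance, a plain NumPy sketch of the "distance function of your 
own" option (nothing PyMVPA-specific; 'train_samples' and 
'test_samples' are just sample arrays):

import numpy as np

def top_n_matches(test_samples, train_samples, n=10):
    # pairwise Euclidean distances, shape (n_test, n_train)
    diff = test_samples[:, np.newaxis, :] - train_samples[np.newaxis, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    # argsort each row (ascending) and keep the first n columns
    return np.argsort(dists, axis=1)[:, :n]

# e.g. nearest = top_n_matches(ds_test.samples, ds_train.samples, n=10)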

-Scott

Yaroslav Halchenko wrote:
> Hi Tara,
>
> Unfortunately ATM we simply do not have such a feature unified and
> exposed at all, but it sounds like a nice feature to have!  Probably
> something like 'predictions_ranked', where for each prediction it
> would not only store the winner but all labels ordered accordingly.
>
> I have filed the request under
> http://github.com/hanke/PyMVPA/issues#issue/8
>
> Meanwhile you can make use of the 'values' state variable.  The
> difficulty might be that the content of values is not unified across
> the classifiers, but it is somewhat obvious for most (e.g. for SMLR it
> would be the probabilities from the logistic regressions for each
> label, in the order of your ds.uniquelabels, so you would just need to
> argsort each one of them with [::-1] to get them in reverse order ;) ).
> What classifier do you have in mind though?
>
>   
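
To make Yaroslav's 'values' + argsort suggestion concrete, here is a 
minimal sketch assuming PyMVPA 0.4-style imports and a multi-class 
Dataset 'ds' (double-check that clf.values really holds per-label 
probabilities for your classifier and version):

import numpy as np
from mvpa.suite import SMLR

# train SMLR with the 'values' state enabled, so the per-label
# probabilities from the logistic regressions are stored per prediction
clf = SMLR(enable_states=['values'])
clf.train(ds)
clf.predict(ds.samples)

values = np.asarray(clf.values)        # shape (n_samples, n_labels)

# argsort each row and reverse with [::-1] so the best-supported label
# comes first; columns follow the order of ds.uniquelabels
ranked_idx = np.argsort(values, axis=1)[:, ::-1]
ranked_labels = np.asarray(ds.uniquelabels)[ranked_idx]
top10 = ranked_labels[:, :10]          # top-10 classes per sample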


