[pymvpa] top-n match lists
Tara Gilliam
tg at cs.york.ac.uk
Thu Jan 7 18:35:28 UTC 2010
Hi Scott,
Sorry, just realised I forgot to include the list in my earlier reply! I did
mean the closest classes (I'm working with 25-200 classes, depending on the
dataset) rather than samples.
Thanks for pointing out the prediction matrix though - I did spot this in the
docs but it didn't seem to quite fit what I was looking for at the time. For
kNNs though it might be worth another look.
Thanks,
Tara
Scott Gorlin wrote:
> Let me ask a clarifying question - Tara, did you mean the top 10
> PREDICTED CLASSES (how many do you have??) or the top 10 CLOSEST SAMPLES?
>
> For CLASSES, as Yaroslav said, this will depend on how the
> classification is done (unified, a la SMLR/KNN; 1 vs 1 or 1 vs Rest for
> SVMs; etc.), so you need to decide exactly how you want to handle it.
> LibSVM-based SVMs do weird things with multiclass values, but you can
> use the meta-classifier MulticlassClassifier to easily create a suite of
> 1 vs 1 (or, in more recent development branches, 1 vs Rest) binary machines
> and extract each of their values. If you use the Shogun backend, there
> are some SVMs which can spit out correct multiclass values in either
> 1 vs 1 or 1 vs Rest mode, depending on your installed version. Though
> perhaps you're not using SVMs...?
>
> If you meant SAMPLES, then you can simply look at the prediction kernel
> matrix for the rows with the highest values - or use a distance function
> of your own.
>
> -Scott
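[Editor's note: the SAMPLES approach Scott describes can be sketched without PyMVPA at all. The snippet below assumes you already have one row of similarity scores for a test sample against the training samples (a stand-in for a row of the prediction kernel matrix); the labels and numbers are invented for illustration.]

```python
# Hypothetical similarity scores of one test sample against five
# training samples (a stand-in for one row of the prediction kernel
# matrix); labels and values are made up for illustration.
train_labels = ['A', 'B', 'A', 'C', 'B']
sims = [0.2, 0.9, 0.4, 0.1, 0.7]

# Indices of training samples sorted by similarity, highest first.
order = sorted(range(len(sims)), key=sims.__getitem__, reverse=True)

# Top-3 closest training samples with their labels and scores.
top3 = [(j, train_labels[j], sims[j]) for j in order[:3]]
print(top3)  # [(1, 'B', 0.9), (4, 'B', 0.7), (2, 'A', 0.4)]
```

Swapping in a distance function of your own just means replacing `sims` with your own per-sample scores (and sorting ascending for distances rather than descending for similarities).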
>
> Yaroslav Halchenko wrote:
>> Hi Tara,
>>
>> Unfortunately, at the moment we simply do not have such a feature
>> unified and exposed at all, but it sounds like a nice one to have!
>> Probably something like 'predictions_ranked', which for each prediction
>> would store not only the winning label but all labels ranked accordingly.
>>
>> I have filed the request under
>> http://github.com/hanke/PyMVPA/issues#issue/8
>>
>> Meanwhile you can make use of the 'values' state variable. The difficulty
>> is that the content of 'values' is not unified across the
>> classifiers, but it is fairly obvious for most (e.g. for SMLR it would
>> be the probabilities from the logistic regressions for each label, in the
>> order of your ds.uniquelabels, so you would just need to argsort each one
>> of them and reverse with [::-1] to get descending order ;) ). What
>> classifier do you have in mind though?
>>
>>
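[Editor's note: the argsort-and-reverse trick Yaroslav mentions can be sketched without PyMVPA. The snippet assumes one prediction's per-label values in the order of `ds.uniquelabels`; the labels and numbers below are invented, and plain `sorted` stands in for `numpy.argsort`.]

```python
# Invented per-label values for one prediction, in the order of
# ds.uniquelabels (for SMLR these would be the per-label logistic
# regression probabilities).
unique_labels = ['faces', 'houses', 'scissors', 'shoes']
values = [0.10, 0.55, 0.05, 0.30]

# Argsort ascending (mirrors numpy.argsort), then reverse with [::-1]
# so the highest-valued label comes first.
order = sorted(range(len(values)), key=values.__getitem__)[::-1]
ranked = [unique_labels[i] for i in order]

print(ranked)      # ['houses', 'shoes', 'faces', 'scissors']
print(ranked[:2])  # top-2 predicted classes: ['houses', 'shoes']
```

With numpy installed, the middle step collapses to `numpy.argsort(values)[::-1]`, which is exactly the `[::-1]` idiom from the reply above.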