[pymvpa] top-n match lists

Scott Gorlin gorlins at MIT.EDU
Thu Jan 7 19:21:38 UTC 2010


With 200 classes, if you go with SVMs, you probably want 1-vs-Rest (200 vs 
~20,000 decision functions!) and some form of kernel caching - definitely 
check out the development branches.  With 1vR you can then just collect 
all the values and rank them directly, with no need to worry about ties 
as in 1v1 voting.
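As a rough sketch of that ranking step (the array names and shapes below are illustrative, not actual PyMVPA API, and random numbers stand in for real 1-vs-Rest decision values):

```python
import numpy as np

# Hypothetical example: with one 1-vs-Rest decision value per class,
# ranking is just a per-sample argsort of the value matrix.
rng = np.random.RandomState(0)
values = rng.randn(5, 200)          # 5 samples x 200 classes (made up)
labels = np.arange(200)             # stand-in for ds.uniquelabels

# argsort ascending, then reverse each row to get best-first class indices
ranked = np.argsort(values, axis=1)[:, ::-1]
top10 = labels[ranked[:, :10]]      # top-10 class labels per sample
print(top10.shape)                  # -> (5, 10)
```

The same argsort-and-reverse trick applies to any classifier whose 'values' come out as one score per class (e.g. SMLR's per-label probabilities).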

If you're interested in 1v1 you can do some interesting ranking based 
on the values of each binary decision (rather than tallying votes), but 
this is not straightforward.
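One way to sketch such a value-based 1v1 ranking (purely illustrative - the pair list and decision values below are made up, and this is just one possible scheme, not PyMVPA code): sum each binary machine's signed value into its two classes and rank by the totals.

```python
import numpy as np

# Hypothetical 1v1 setup: one binary machine per class pair.
n_classes = 4
pairs = [(i, j) for i in range(n_classes) for j in range(i + 1, n_classes)]
rng = np.random.RandomState(1)
pair_values = rng.randn(len(pairs))   # one decision value per 1v1 machine

# Accumulate signed values instead of discrete votes:
# a positive value favours class i, a negative one favours class j.
scores = np.zeros(n_classes)
for (i, j), v in zip(pairs, pair_values):
    scores[i] += v
    scores[j] -= v

ranking = np.argsort(scores)[::-1]    # classes ordered best-first
```

Unlike vote tallying, the accumulated scores are continuous, so ties are essentially impossible - but the scores mix values from machines trained on different class pairs, which is why this kind of ranking needs care.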

Tara Gilliam wrote:
> Hi Scott,
>
> Sorry, just realised I forgot to include the list in my earlier reply! 
> I did mean the closest classes (I'm working with 25-200 classes, 
> depending on the dataset) rather than samples.
>
> Thanks for pointing out the prediction matrix though - I did spot this 
> in the docs but it didn't seem to quite fit what I was looking for at 
> the time. For kNNs though it might be worth another look.
>
> Thanks,
> Tara
>
> Scott Gorlin wrote:
>> Let me ask a clarifying question - Tara, did you mean the top 10 
>> PREDICTED CLASSES (how many do you have??) or the top 10 CLOSEST 
>> SAMPLES?
>>
>> For CLASSES, as Yaroslav said, this will depend on how the 
>> classification is done (unified, as in SMLR/KNN; 1 vs 1 or 1 vs Rest 
>> for SVMs; etc.), so you need to decide exactly how you want to handle 
>> it.  LibSVM-based SVMs do weird things with multiclass values, but 
>> you can use the meta-classifier MulticlassClassifier to easily create 
>> a suite of 1v1 (or 1 vs Rest in more recent development branches) 
>> binary machines and extract each of their values.  If you use the 
>> Shogun backend, there are some SVMs which can spit out correct 
>> multiclass values in either 1v1 or 1vR, depending on your installed 
>> version.  Though perhaps you're not using SVMs...?
>>
>> If you meant SAMPLES, then you can simply look at the prediction 
>> kernel matrix for the rows with the highest values - or use a 
>> distance function of your own.
>>
>> -Scott
>>
>> Yaroslav Halchenko wrote:
>>> Hi Tara,
>>>
>>> Unfortunately ATM we simply do not have such a feature unified and
>>> exposed at all, but it sounds like a nice feature to have!  Probably
>>> something like 'predictions_ranked', where for each prediction it would
>>> store not only the winner but all labels ordered accordingly.
>>>
>>> I have filed the request under
>>> http://github.com/hanke/PyMVPA/issues#issue/8
>>>
>>> Meanwhile you can make use of the 'values' state variable.  The
>>> difficulty might be that the content of 'values' is not unified across
>>> classifiers, but it is fairly obvious for most (e.g. for SMLR it would
>>> be the probabilities from the logistic regressions for each label, in
>>> the order of your ds.uniquelabels, so you would just need to argsort
>>> each one with [::-1] to get them in reverse order ;) ). What classifier
>>> do you have in mind, though?
>>>



More information about the Pkg-ExpPsy-PyMVPA mailing list