[pymvpa] Performance distribution with random labels

Mon Dec 12 17:02:09 UTC 2016

Hi all,

I’m having trouble getting my head around something and I was wondering if
you can give me a hand.

I’m running a classification with 4 possible categories and 10 runs. My data
is balanced, and I’m using a linear C-SVM with leave-one-out cross-validation.

Just for fun, I wanted to create a distribution of the possible performance
if I randomized the labels of the runs. I was expecting a performance around
0.25 (chance level for 4 categories), but after 12,000 reps I got 0.200.
I don’t get it. Do you have any idea why?

This is part of the code I used:

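import numpy as np
from mvpa2.suite import *

# fds (the dataset) and rndReps (the number of permutations) are defined
# earlier in the script
clf = LinearCSVMC()

# Keep the top 1% of features, ranked by one-way ANOVA
fsel = SensitivityBasedFeatureSelection(
    OneWayAnova(),
    FractionTailSelector(0.01, mode='select', tail='upper'))

fclf = FeatureSelectionClassifier(clf, fsel)

# N-fold cross-validation over chunks; errorfx returns accuracy
cvte = CrossValidation(
    fclf, NFoldPartitioner(),
    errorfx=lambda p, t: np.mean(p == t),
    enable_ca=['stats'])

for k in range(0, rndReps):
    # Shuffle the targets in place, across all samples
    np.random.shuffle(fds.sa.targets)
    cv_results = cvte(fds)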
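
For reference, here is a minimal NumPy-only sketch of the same label-permutation idea, independent of PyMVPA. Everything in it is an assumption for illustration: the data are synthetic noise, there are 2 samples per category per run, a simple nearest-centroid classifier stands in for LinearCSVMC, and the helper names are hypothetical. It also shuffles labels separately within each run so that every run stays balanced, whereas the code above shuffles across the whole dataset. It is not meant to reproduce the 0.200 result.

```python
import numpy as np


def nearest_centroid_predict(train_x, train_y, test_x):
    """Predict the label whose training-set mean is closest to each test sample."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every class centroid
    dists = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def leave_one_run_out_accuracy(x, y, runs):
    """Mean accuracy over folds, holding out one run per fold."""
    fold_acc = []
    for r in np.unique(runs):
        test = runs == r
        pred = nearest_centroid_predict(x[~test], y[~test], x[test])
        fold_acc.append(np.mean(pred == y[test]))
    return float(np.mean(fold_acc))


def permutation_null(x, y, runs, n_reps, rng):
    """Accuracy distribution after shuffling labels separately within each run."""
    null = np.empty(n_reps)
    for k in range(n_reps):
        y_perm = y.copy()
        for r in np.unique(runs):
            idx = np.flatnonzero(runs == r)
            y_perm[idx] = rng.permutation(y_perm[idx])
        null[k] = leave_one_run_out_accuracy(x, y_perm, runs)
    return null


# Synthetic, balanced design: 10 runs, 4 categories, 2 samples per category per run
rng = np.random.default_rng(0)
n_runs, n_classes, n_per_class, n_features = 10, 4, 2, 20
runs = np.repeat(np.arange(n_runs), n_classes * n_per_class)
y = np.tile(np.repeat(np.arange(n_classes), n_per_class), n_runs)
x = rng.standard_normal((y.size, n_features))  # pure noise, no real signal

null = permutation_null(x, y, runs, n_reps=200, rng=rng)
print("mean accuracy under shuffled labels:", null.mean())
```

Because the shuffle here is done within each run, every training and test split keeps the same number of samples per category, which is one difference worth checking against the loop above.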

Thanks!

Raul Hernandez