[pymvpa] Biased estimates by leave-one-out cross-validations in PyMVPA 2

Ping-Hui Chiu chiupinghui at gmail.com
Fri Apr 20 21:21:48 UTC 2012


Thanks Yaroslav! The previous results make sense now.
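As a quick sanity check of that explanation: the anti-learning under leave-one-out can be reproduced with no real classifier at all. A degenerate rule that always predicts the training set's majority class scores exactly 0% under leave-one-out on balanced labels, because the held-out sample is always in the minority class of what remains. A minimal numpy-only sketch (not PyMVPA, just to isolate the mechanism):

```python
import numpy as np

# 200 alternating binary labels, as in the sanity check from my last mail
y = np.arange(200) % 2

# Leave-one-out with a majority-class predictor: once sample i is removed,
# its own class has 99 training samples and the other class has 100,
# so the "majority" prediction is always the wrong one.
preds = []
for i in range(len(y)):
    train = np.delete(y, i)
    preds.append(np.argmax(np.bincount(train)))

acc = np.mean(np.array(preds) == y)
print(acc)
```

A real SVM is not quite this degenerate, which is presumably why I saw ~40% rather than 0%, but the direction of the bias is the same.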

I have a related question: after feature selection on completely random
samples, my binary classification accuracy was significantly better than
chance (50%). For MVPA with feature selection on real fMRI data, how do we
know whether better-than-chance performance reflects true effects or is
just an artifact of the feature selection?

My code with feature selection is listed below:

from mvpa2.suite import *

fsel = SensitivityBasedFeatureSelection(
    OneWayAnova(),
    FixedNElementTailSelector(25, mode='select', tail='upper'))
clf = LinearCSVMC()
cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))
cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))
acc_chunks = []
acc_events = []
for i in range(100):
    print i
    ds = Dataset(np.random.rand(200, 100))
    ds.sa['targets'] = np.remainder(range(200), 2)
    ds.sa['events'] = range(200)
    ds.sa['chunks'] = np.concatenate(
        (np.ones(50), np.ones(50) * 2, np.ones(50) * 3, np.ones(50) * 4))
    # NB: feature selection is trained on the full dataset here,
    # before any partitioning into training/testing folds
    fsel.train(ds)
    ds = fsel(ds)
    ds_chunks = cv_chunks(ds)
    acc_chunks.append(1 - np.mean(ds_chunks))
    ds_events = cv_events(ds)
    acc_events.append(1 - np.mean(ds_events))

>>> print np.mean(acc_chunks), np.std(acc_chunks)
0.6366 0.0350633712013

>>> print np.mean(acc_events), np.std(acc_events)
0.6405 0.0350820466906
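One way to see that such above-chance accuracies can be pure artifact: the inflation is reproducible entirely without PyMVPA. The numpy-only sketch below (a simple nearest-class-mean classifier stands in for the SVM; all names and the selection rule are mine, not PyMVPA's) selects the 25 most class-separating of 100 pure-noise features on the *full* dataset and then cross-validates on those same samples, versus cross-validating with no selection at all:

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_feat, n_sel = 200, 100, 25
X = rng.rand(n_samples, n_feat)   # pure noise features
y = np.arange(n_samples) % 2      # alternating binary targets

def cv_accuracy(X, y, n_folds=4):
    """4-fold cross-validation with a nearest-class-mean classifier."""
    # contiguous blocks of 50, analogous to the 'chunks' attribute above
    folds = np.repeat(np.arange(n_folds), len(y) // n_folds)
    correct = 0
    for f in range(n_folds):
        tr, te = folds != f, folds == f
        m0 = X[tr & (y == 0)].mean(axis=0)
        m1 = X[tr & (y == 1)].mean(axis=0)
        d0 = ((X[te] - m0) ** 2).sum(axis=1)
        d1 = ((X[te] - m1) ** 2).sum(axis=1)
        correct += np.sum((d1 < d0).astype(int) == y[te])
    return correct / float(len(y))

# "Peeking": rank features by class separation computed on the FULL
# dataset, test folds included -- analogous to fsel.train(ds) before CV
sep = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
acc_peek = cv_accuracy(X[:, np.argsort(sep)[-n_sel:]], y)

# Honest baseline: no feature selection; the data is noise, so ~50%
acc_honest = cv_accuracy(X, y)
```

On a run like this, acc_peek comes out well above chance while acc_honest hovers around 0.5. If I read the PyMVPA API correctly, the cure is to move selection inside the cross-validation, e.g. by wrapping the classifier as FeatureSelectionClassifier(clf, fsel) and cross-validating that, instead of calling fsel.train(ds) on the full dataset beforehand; the selection is then re-trained on each training fold only.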

Thanks!
Dale

On Fri, Apr 20, 2012 at 12:34 PM, Yaroslav Halchenko
<debian at onerussian.com>wrote:

> if we were to talk about bias we would talk about classification of true
> effects ;)
>
> you are trying to learn/classify noise on imbalanced sets -- since you
> have 'events' == range(200) and each sample/event is taken out
> separately, you have 100 of one target (say 1) and 99 of the other (say
> 0).  Since it is pure noise, the classifier might just declare everything
> to be the target with the majority of samples (regardless of the actual
> data, which is noise), since that would minimize its objective function
> during training.  As a result you get this "anti-learning" effect.  That
> is why we usually suggest ensuring an equal number of samples per
> category in the training set (or chaining with a Balancer generator if
> the data is imbalanced).
>
> On Fri, 20 Apr 2012, Ping-Hui Chiu wrote:
>
> >    Dear PyMVPA experts,
> >    Isn't leave-one-out cross-validation supposed to produce a smaller
> >    bias yet a larger variance compared to N-fold cross-validation when
> >    N < # of samples?
>
> >    I ran a sanity check on binary classification of 200 random samples.
> >    4-fold cross-validation produced unbiased estimates (~50% correct),
> >    whereas leave-one-out cross-validation consistently produced
> >    below-chance classification performance (~40% correct). Why?
>
> >    Any insight on this will be highly appreciated!
>
> >    My code is listed below:
>
> >    from mvpa2.suite import *
> >    clf = LinearCSVMC();
> >    cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))
> >    cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))
> >    acc_chunks=[]
> >    acc_events=[]
> >    for i in range(200):
> >    print i
> >    ds = Dataset(np.random.rand(200))
> >    ds.sa['targets'] = np.remainder(range(200), 2)
> >    ds.sa['events'] = range(200)
> >    ds.sa['chunks'] = np.concatenate((np.ones(50), np.ones(50)*2,
> >        np.ones(50)*3, np.ones(50)*4))
> >    ds_chunks = cv_chunks(ds)
> >    acc_chunks.append(1 - np.mean(ds_chunks))
> >    ds_events = cv_events(ds)
> >    acc_events.append(1 - np.mean(ds_events))
>
> >    >>>print np.mean(acc_chunks), np.std(acc_chunks)
> >    0.50025 0.0442542370853
> >    >>>print np.mean(acc_events), np.std(acc_events)
> >    0.40674 0.189247516232
>
> >    Thanks!
> >    Dale
>
>
> > _______________________________________________
> > Pkg-ExpPsy-PyMVPA mailing list
> > Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> >
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>
>
> --
> =------------------------------------------------------------------=
> Keep in touch                                     www.onerussian.com
> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>

