[pymvpa] Biased estimates by leave-one-out cross-validations in PyMVPA 2
Ping-Hui Chiu
chiupinghui at gmail.com
Fri Apr 20 21:21:48 UTC 2012
Thanks Yaroslav! The previous results make sense now.
I have a related question: after feature selection on completely random
samples, my binary classification accuracy was significantly better than
chance (50%). For MVPA with feature selection on real fMRI data, how do we
know whether better-than-chance performance reflects true effects or is
just an artifact of the feature selection?
My code with feature selection is listed below:
from mvpa2.suite import *

# ANOVA-based selection of the 25 highest-scoring features
fsel = SensitivityBasedFeatureSelection(
    OneWayAnova(),
    FixedNElementTailSelector(25, mode='select', tail='upper'))
clf = LinearCSVMC()
cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))
cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))
acc_chunks = []
acc_events = []
for i in range(100):
    print i
    ds = Dataset(np.random.rand(200, 100))
    ds.sa['targets'] = np.remainder(range(200), 2)
    ds.sa['events'] = range(200)
    ds.sa['chunks'] = np.concatenate(
        (np.ones(50), np.ones(50) * 2, np.ones(50) * 3, np.ones(50) * 4))
    # feature selection trained on the *whole* dataset, before cross-validation
    fsel.train(ds)
    ds = fsel(ds)
    ds_chunks = cv_chunks(ds)
    acc_chunks.append(1 - np.mean(ds_chunks))
    ds_events = cv_events(ds)
    acc_events.append(1 - np.mean(ds_events))
>>> print np.mean(acc_chunks), np.std(acc_chunks)
0.6366 0.0350633712013
>>> print np.mean(acc_events), np.std(acc_events)
0.6405 0.0350820466906
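One way to see what is going on: the code above selects features on the full dataset before partitioning, so the test folds have already leaked into the selection. Below is a numpy-only sketch of that comparison (not PyMVPA; `select_features` and `centroid_acc` are hypothetical helpers, and a nearest-centroid rule stands in for the SVM), contrasting whole-dataset selection against selection redone inside each training fold:

```python
import numpy as np

def select_features(X, y, k):
    # indices of the k features most correlated (in absolute value) with y
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = np.abs(Xc.T.dot(yc)) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(r)[-k:]

def centroid_acc(Xtr, ytr, Xte, yte):
    # nearest-centroid classifier: predict the class with the closer mean
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Xte - c1, axis=1) <
            np.linalg.norm(Xte - c0, axis=1)).astype(int)
    return np.mean(pred == yte)

def cv_acc(X, y, chunks, nested):
    accs = []
    for f in np.unique(chunks):
        tr, te = chunks != f, chunks == f
        # nested: select on the training folds only; otherwise peek at all data
        sel = select_features(X[tr] if nested else X,
                              y[tr] if nested else y, 25)
        accs.append(centroid_acc(X[tr][:, sel], y[tr], X[te][:, sel], y[te]))
    return np.mean(accs)

biased, unbiased = [], []
for seed in range(10):
    rng = np.random.RandomState(seed)
    X = rng.rand(200, 100)                # pure noise, as in the code above
    y = np.arange(200) % 2                # alternating binary targets
    chunks = np.repeat([1, 2, 3, 4], 50)  # four balanced chunks
    biased.append(cv_acc(X, y, chunks, nested=False))
    unbiased.append(cv_acc(X, y, chunks, nested=True))

print(np.mean(biased), np.mean(unbiased))
```

On pure noise the whole-dataset variant lands well above 50% while the nested variant stays near chance, so only selection refit inside each training fold (in PyMVPA, presumably something along the lines of wrapping the classifier in FeatureSelectionClassifier so the selection is part of the cross-validated pipeline) distinguishes true effects from selection artifacts.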
Thanks!
Dale
On Fri, Apr 20, 2012 at 12:34 PM, Yaroslav Halchenko
<debian at onerussian.com>wrote:
> if we were to talk about bias we would talk about classification of true
> effects ;)
>
> you are trying to learn/classify noise on imbalanced sets -- since you
> have 'events' == range(200), each sample/event is taken out separately,
> so the training set has 100 of one target (say 1) and 99 of the other
> (say 0). Since it is pure noise, the classifier may simply predict
> whichever target has the majority of samples (regardless of the actual
> data, which is noise), since that minimizes its objective function
> during training. As a result you get this "anti-learning" effect. That
> is why we usually suggest ensuring an equal number of samples per
> category in the training set (or chaining with the Balancer generator
> if the data is imbalanced).
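The majority-class argument above can be checked with a few lines of plain numpy: a degenerate classifier that always predicts the training set's majority class scores exactly 0% under leave-one-out on these balanced labels, because removing one sample always makes the *other* class the majority.

```python
import numpy as np

y = np.arange(200) % 2  # 100 samples per class, as in the posted code

correct = []
for i in range(len(y)):
    train = np.delete(y, i)             # leave sample i out
    majority = int(train.mean() > 0.5)  # majority class of the 199 remaining
    correct.append(majority == y[i])

print(np.mean(correct))  # 0.0: the held-out sample's class is always the minority
```

An SVM on noise only partially collapses to this majority rule, which is why the posted leave-one-out estimate came out around 40% rather than 0%.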
>
> On Fri, 20 Apr 2012, Ping-Hui Chiu wrote:
>
> > Dear PyMVPA experts,
> > Isn't leave-one-out cross-validation supposed to produce a smaller
> > bias yet a larger variance compared to N-fold cross-validation when
> > N < the number of samples?
>
> > I ran a sanity check on binary classification of 200 random samples.
> > 4-fold cross-validation produced unbiased estimates (~50% correct),
> > whereas leave-one-out cross-validation consistently produced
> > below-chance classification performance (~40% correct). Why?
>
> > Any insight on this will be highly appreciated!
>
> > My code is listed below:
>
> > from mvpa2.suite import *
> > clf = LinearCSVMC()
> > cv_chunks = CrossValidation(clf, NFoldPartitioner(attr='chunks'))
> > cv_events = CrossValidation(clf, NFoldPartitioner(attr='events'))
> > acc_chunks = []
> > acc_events = []
> > for i in range(200):
> >     print i
> >     ds = Dataset(np.random.rand(200))
> >     ds.sa['targets'] = np.remainder(range(200), 2)
> >     ds.sa['events'] = range(200)
> >     ds.sa['chunks'] = np.concatenate(
> >         (np.ones(50), np.ones(50) * 2, np.ones(50) * 3, np.ones(50) * 4))
> >     ds_chunks = cv_chunks(ds)
> >     acc_chunks.append(1 - np.mean(ds_chunks))
> >     ds_events = cv_events(ds)
> >     acc_events.append(1 - np.mean(ds_events))
>
> > >>>print np.mean(acc_chunks), np.std(acc_chunks)
> > 0.50025 0.0442542370853
> > >>>print np.mean(acc_events), np.std(acc_events)
> > 0.40674 0.189247516232
>
> > Thanks!
> > Dale
>
>
> > _______________________________________________
> > Pkg-ExpPsy-PyMVPA mailing list
> > Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> > http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>
>
> --
> =------------------------------------------------------------------=
> Keep in touch www.onerussian.com
> Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic
>