[pymvpa] What is the value of using errorfx when using Cross validation?

gal star gal.star3051 at gmail.com
Wed Feb 25 15:36:02 UTC 2015

On Wed, Feb 25, 2015 at 5:15 PM, Nick Oosterhof <
n.n.oosterhof at googlemail.com> wrote:

> On 25 Feb 2015, at 15:49, gal star <gal.star3051 at gmail.com> wrote:
> > I am doing a k-fold cross validation on a data according to the
> following:
> > 1. I'm partitioning the data myself - set a train ('0' chunk) and test
> chunks ('1' chunk).
> > 2. Using clf.train() and then clf.predict()
> > 3. print the accuracy result and confusion matrix.
> > And I'm repeating this k times (by running the attached script k times)
> […]
> > The standard deviation among the accuracy results produced when using
> CrossValidation class, and the standard deviation among
> > accuracy results in the way I described, are different.
> - your script really is quite messy (lots of code commented out, no
> documentation), which does not invite others to read and understand your
> code. Furthermore, it does not actually allow others to reproduce the
> issue. For future reference, it is helpful if you can provide a minimal
> running example so that others can reproduce what you reported.
Of course, sorry about that. Here is the minimal running example:

# z-score relative to the 'control' condition
zscore(fds, param_est=('targets', ['control']))

# keep only the two classes of interest
# (renamed from 'int', which shadows the builtin)
mask = numpy.array([l in ['class A', 'class B'] for l in fds.sa.targets])
fds = fds[mask]

clf = FeatureSelectionClassifier(LinearCSVMC(),
    FixedNElementTailSelector(1000, tail='upper', mode='select'))

# defined, but not used by the manual split below
nfold = NFoldPartitioner(attr='chunks')

# manual split: chunk '0' for training, chunk '1' for testing
train = fds[fds.sa.chunks == '0']
test = fds[fds.sa.chunks == '1']
clf.train(train)
print clf.predict(test.samples)

> - from what I understand from the script, you provide 'fold' as a
> parameter, but that parameter is actually not used in the script for your
> 'manual' cross-validation. In your manual cross-validation, you seem to
> always use chunk 0 for training and chunk 1 for testing.

> - the nfold for k folds partitioner trains on (k-1) folds, which is more
> than the 1 fold you are training on if k>2. This will generally lead to
> more stable results and thus a lower standard deviation of accuracies.

I am marking all k-1 training folds as '0' and only the held-out fold as '1'.
The reason I'm doing that instead of using CrossValidation is that
I'm balancing the data by duplicating some datapoints from 'class B'.
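For context, the balancing step is just duplication-based oversampling of the minority class before the split. A minimal numpy sketch of what I mean (the function name and fixed seed are my own, not PyMVPA API):

```python
import numpy as np

def balance_by_duplication(targets, minority_label, rng=None):
    # Return sample indices that oversample the minority class by
    # duplicating randomly chosen minority samples until both classes
    # have the same count. Sketch only; attribute names are assumptions.
    rng = rng or np.random.RandomState(0)
    targets = np.asarray(targets)
    minority = np.flatnonzero(targets == minority_label)
    majority = np.flatnonzero(targets != minority_label)
    n_extra = len(majority) - len(minority)
    if n_extra <= 0:
        return np.arange(len(targets))
    extra = rng.choice(minority, size=n_extra, replace=True)
    return np.concatenate([np.arange(len(targets)), extra])

# usage (hypothetical): fds = fds[balance_by_duplication(fds.sa.targets, 'class B')]
idx = balance_by_duplication(['A', 'A', 'A', 'B'], 'B')
```

Here idx keeps all four original samples and appends two duplicated 'B' indices, so both classes end up with three samples.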

I am still missing the point of errorfx, though.
Does it behave differently because I'm running the cross-validation manually?
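My current understanding, which I'd like to confirm: errorfx is just the function that CrossValidation applies to each fold's predictions and targets to reduce them to one score per fold. If I read the docs correctly, the default mean_mismatch_error should be equivalent to this pure-numpy sketch (the equivalence is my assumption):

```python
import numpy as np

def mean_mismatch_error(predicted, targets):
    # Fraction of misclassified samples: my understanding of what
    # PyMVPA applies per fold as the default errorfx.
    predicted = np.asarray(predicted)
    targets = np.asarray(targets)
    return np.mean(predicted != targets)

# A manual loop could then mimic CrossValidation by applying the same
# function to each fold's output, e.g. (hypothetical 'folds' iterable):
# errors = [mean_mismatch_error(clf.predict(test.samples), test.sa.targets)
#           for train, test in folds]

# one wrong prediction out of four
err = mean_mismatch_error(['A', 'B', 'A', 'A'], ['A', 'B', 'B', 'A'])
```

If that is right, accuracy would just be 1 minus this value, and the choice of errorfx should not depend on whether the folding is done manually or by CrossValidation.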

> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa