[pymvpa] Bad accuracy results for cross subject classification

gal star gal.star3051 at gmail.com
Mon Jan 12 13:24:35 UTC 2015


Thank you for your answer.

I forgot to mention that each subject is labeled as either class A or class B,
so it's a binary classification.

I've tried what you suggested and treated each subject as the test
group, with the remaining subjects as the training group.

The accuracy did change (it is no longer 100%), but it still looks fishy.
As you requested, I've attached a short part of my script
and also the results (including the confusion matrix).

Thanks,
Gal Star


On Mon, Jan 12, 2015 at 12:38 PM, Nick Oosterhof <
n.n.oosterhof at googlemail.com> wrote:

> On 12 Jan 2015, at 09:23, gal star <gal.star3051 at gmail.com> wrote:
>
> > I am wondering how to perform classification over
> > a cross subject nifti file i've created using FSL. […]
> >
> > I've tried to use NFoldPartitioner with LinearCSVMC, though
> > the results are always 100% accuracy (which looks fishy).
> >
> > I've assured there is no contamination in the data (no scan from trainset
> > exists also in testset).
> >
> > What is the correct way to perform cross subject classification
> > over functional datasets?
>
> At the very least you want to have different values for .sa.chunks for
> different subjects. In that way, you will not train and test on the same
> subject. This can simply be achieved by NFoldPartitioner.
>
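The chunks-per-subject idea can be sketched without PyMVPA at all: give every sample a chunk value equal to its subject index, and each fold then holds out exactly one subject. The arrays below are hypothetical (random data, 4 subjects); NFoldPartitioner generates equivalent splits for you from `.sa.chunks`.

```python
import numpy as np

# Hypothetical data: 4 subjects, 6 samples each, 10 features.
n_subj, n_per, n_feat = 4, 6, 10
rng = np.random.default_rng(0)
samples = rng.normal(size=(n_subj * n_per, n_feat))
# One chunk value per subject -- the analogue of .sa.chunks.
chunks = np.repeat(np.arange(n_subj), n_per)

# Leave-one-subject-out: each fold tests on one chunk (subject)
# and trains on all the others, so a subject never appears in
# both the training and the test set of the same fold.
folds = []
for test_subj in np.unique(chunks):
    test_mask = chunks == test_subj
    folds.append((samples[~test_mask], samples[test_mask]))
```

With one chunk per subject, NFoldPartitioner's default leave-one-chunk-out behavior is exactly this leave-one-subject-out scheme.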
> A bit more advanced is using a partitioning scheme where you test on data
> from one subject in one run, after training on the data from all other runs
> in all other subjects. This may reduce run-specific effects shared across
> subjects, if different subjects do exactly the same task during
> corresponding runs. For this you could use a Sifter [1]. Its use is a bit
> more complicated (hopefully less so in the future [2]).
>
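The scheme described above (test on one subject's data from one run, train on the other subjects' data from the other runs) can be sketched in plain Python with hypothetical subject/run labels; in PyMVPA itself, the Sifter from [1], chained after a partitioner, is what produces such partitions.

```python
import numpy as np
from itertools import product

# Hypothetical labels: 3 subjects x 2 runs, 4 samples per cell.
subjects = np.repeat([0, 1, 2], 8)
runs = np.tile(np.repeat([0, 1], 4), 3)

partitions = []
for subj, run in product(np.unique(subjects), np.unique(runs)):
    test_mask = (subjects == subj) & (runs == run)
    # Train on the *other* subjects' data from the *other* runs only,
    # so neither the subject nor the run is shared with the test set.
    train_mask = (subjects != subj) & (runs != run)
    partitions.append((train_mask, test_mask))
```

Excluding the test run from the training data of every subject is what guards against run-specific effects (e.g. identical task timing across subjects) inflating the accuracy.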
> If you need more help, it would be useful if you could post a minimally
> short snippet of your script that shows how you load the data, assign
> .sa.targets and .sa.chunks, define the partitioning scheme, and run the
> cross validation.
>
> [1] http://www.pymvpa.org/generated/mvpa2.generators.base.Sifter.html
> [2] https://github.com/PyMVPA/PyMVPA/issues/261
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>
-------------- next part --------------
from __future__ import division
import sys
import os
from mvpa2.suite import *
import numpy


data_type = sys.argv[1]  # renamed to avoid shadowing the builtin 'type'
sub = sys.argv[2]
fold = sys.argv[3]

img_name = '4D_scans_conc.nii.gz'
map_name = 'map.txt'
source = '/home/gals/converted_data/sub_brik_data/' + data_type + '/' + sub + '/selfmade_cv/' + fold

print "type: %s" % data_type
print "sub: %s" % sub
print "fold number: %s" % fold

#########################################################
# Read mvpa sample attributes definition from text file #
#########################################################
attr = SampleAttributes(os.path.join(source, map_name))
print "after sampleAttributes"

# A single NIfTI file yields a single dataset, so no vstack is needed.
fds = fmri_dataset(samples=os.path.join(source, img_name),
                   targets=attr.targets,
                   chunks=attr.chunks,
                   mask='/home/gals/masks/brain_mask.nii.gz')

print "passed fmri dataset"

# Keep the two conditions of interest plus the '3' baseline condition.
interesting = numpy.array([l in ['221', '211', '3'] for l in fds.sa.targets])
fds = fds[interesting]

# Z-score each chunk against the '3' baseline samples, then drop them.
zscore(fds, param_est=('targets', ['3']), chunks_attr='chunks')
print "after zscore"
interesting = numpy.array([l in ['221', '211'] for l in fds.sa.targets])
fds = fds[interesting]

# ANOVA-based selection of the 1000 most discriminative features,
# wrapped around a linear SVM.
sens = SensitivityBasedFeatureSelection(OneWayAnova(),
                                        FixedNElementTailSelector(1000, tail='upper', mode='select'),
                                        enable_ca=['sensitivity'])
clf = FeatureSelectionClassifier(LinearCSVMC(), sens)

confusion = ConfusionMatrix()
# Manual single split: chunk 0 is the training set, chunk 1 the test set.
int_train = numpy.array([l in [0] for l in fds.sa.chunks])
int_test = numpy.array([l in [1] for l in fds.sa.chunks])
train = fds[int_train]
test = fds[int_test]

clf.train(train)
predictions = clf.predict(test.samples)
confusion.add(test.targets, predictions)

print confusion.as_string(summary=True)
# percent_correct is a float; a %d format would truncate it.
print "Total accuracy results: %.2f%%" % confusion.percent_correct
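The percent-correct figure can be sanity-checked by recomputing it from the raw labels. This is a minimal NumPy sketch with made-up labels, independent of PyMVPA's ConfusionMatrix; note that percent correct is a float, so a `%d` format would truncate it when printing.

```python
import numpy as np

# Hypothetical true and predicted labels for a binary problem.
targets     = np.array(['221', '211', '221', '211', '221', '211'])
predictions = np.array(['221', '211', '211', '211', '221', '221'])

# 2x2 confusion matrix: rows = true class, columns = predicted class.
labels = ['221', '211']
conf = np.zeros((2, 2), dtype=int)
for t, p in zip(targets, predictions):
    conf[labels.index(t), labels.index(p)] += 1

# Percent correct is the diagonal mass over the total number of samples.
percent_correct = 100.0 * np.trace(conf) / conf.sum()
```

For a binary problem at chance level this value should hover around 50%, which is another quick check that the labels and predictions are being compared correctly.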

-------------- next part --------------
A non-text attachment was scrubbed...
Name: results for single subject as testset.png
Type: image/png
Size: 9614 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/attachments/20150112/be1b8a10/attachment.png>
