[pymvpa] EEG dataset and chunks
brice.rebsamen at gmail.com
Thu Apr 7 16:42:01 UTC 2011
I have an EEG dataset containing features extracted from continuous EEG
(as opposed to ERP) recorded from 15 subjects while they performed a task
with 9 experimental conditions. So it is many samples with 9 targets, and
I added a sample attribute called 'sid' to distinguish between subjects (I
mostly work on a per-subject basis, only aggregating the final results).
Basically, I want to classify between 2 conditions and measure the
classification accuracy using cross-validation.
So first I select the appropriate samples (for instance, for subject S02,
conditions 1 and 2). Note that each comparison has to be parenthesized,
because '&' and '|' bind more tightly than '==':

subds = ds[(ds.sa.sid == 'S02') & ((ds.targets == 1) | (ds.targets == 2))]
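The precedence pitfall is easy to check with plain NumPy arrays standing in for the dataset's sample attributes (the toy 'sid' and 'targets' values below are made up for illustration):

```python
import numpy as np

# Toy stand-ins for the dataset's sample attributes.
sid = np.array(['S01', 'S02', 'S02', 'S03'])
targets = np.array([1, 2, 3, 1])

# '&' and '|' bind tighter than '==', so each comparison must be
# parenthesized before the boolean masks are combined.
mask = (sid == 'S02') & ((targets == 1) | (targets == 2))
# Only the second sample is both subject S02 and target 1 or 2.
```

Without the parentheses, Python would try to evaluate `'S02' & (targets == 1)` first and raise an error (or, worse, silently compute the wrong mask).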
Because I want to do cross-validation, I am going to use a partitioner.
From what I understand, partitioners mostly rely on the 'chunks' sample
attribute, which my dataset does not have, so I added it this way:

subds.sa['chunks'] = np.arange(subds.nsamples)
cv = CrossValidation(SMLR(), NGroupPartitioner(5))
This should run 5 evaluations, each using 1/5 of the available data to
test the classifier. Correct?
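My mental model of the partitioning is sketched below in plain NumPy (an assumption about NGroupPartitioner's behaviour, not its actual implementation): the unique chunk values are split into n roughly equal, contiguous groups, and each group serves as the test set in one fold.

```python
import numpy as np

def ngroup_folds(chunks, n):
    """Split unique chunk values into n contiguous groups; each
    group in turn becomes the test set of one fold (sketch only)."""
    uniq = np.unique(chunks)
    groups = np.array_split(uniq, n)
    folds = []
    for test_chunks in groups:
        test_mask = np.isin(chunks, test_chunks)
        folds.append((~test_mask, test_mask))  # (train, test) masks
    return folds

# With chunks = 0..9 and n = 5, each fold tests on 2 samples.
folds = ngroup_folds(np.arange(10), 5)
```

This also makes the problem below obvious: with chunks assigned as a running index, the groups are contiguous blocks of samples.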
Now, for this to work properly, the targets need to be randomly
distributed across the dataset: for instance, if the last fifth of the
samples contains only target 2, it won't work. What do you suggest to
solve this problem? I tried chaining the NGroupPartitioner and a Balancer
with a ChainNode, but it didn't work, apparently due to a bug in Balancer
(see my other mail about that).
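One workaround I can imagine, instead of relying on a Balancer, is to assign the chunks attribute in a target-stratified way, so that every group of chunks contains every target (a hypothetical helper, not a PyMVPA API):

```python
import numpy as np

def stratified_chunks(targets, n_folds, seed=0):
    """Assign chunk values 0..n_folds-1 round-robin within each
    target class, after shuffling, so every fold sees all targets."""
    rng = np.random.RandomState(seed)
    chunks = np.empty(len(targets), dtype=int)
    for t in np.unique(targets):
        idx = np.flatnonzero(targets == t)
        rng.shuffle(idx)
        chunks[idx] = np.arange(len(idx)) % n_folds
    return chunks

# 5 samples of each of two targets, split into 5 stratified folds.
targets = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2])
chunks = stratified_chunks(targets, 5)
```

With this, something like `subds.sa['chunks'] = stratified_chunks(subds.targets, 5)` should make the NGroupPartitioner folds balanced by construction, regardless of sample order.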
My main question, though: it seems weird to add a chunks attribute like
this. Is it the correct way?
By the way, is there a way to pick 80% of the data at random (with an
equal number of samples for each target) for training and the remaining
20% for testing, and repeat this as many times as I want to obtain a
consistent result?
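What I have in mind is repeated stratified holdout; a NumPy sketch of what I am after (in PyMVPA itself a repeated partitioner plus a balancer would presumably play this role, but I don't know the exact API):

```python
import numpy as np

def stratified_holdout(targets, train_frac=0.8, n_repeats=10, seed=0):
    """For each repeat, draw train_frac of each target class at
    random for training; the rest is the test set (sketch only)."""
    rng = np.random.RandomState(seed)
    splits = []
    for _ in range(n_repeats):
        train = np.zeros(len(targets), dtype=bool)
        for t in np.unique(targets):
            idx = np.flatnonzero(targets == t)
            rng.shuffle(idx)
            n_train = int(round(train_frac * len(idx)))
            train[idx[:n_train]] = True
        splits.append((train, ~train))  # (train, test) boolean masks
    return splits

# 10 samples per target, 80/20 split, 5 independent repeats.
targets = np.array([1] * 10 + [2] * 10)
splits = stratified_holdout(targets, 0.8, 5)
```

Averaging the test accuracy over the repeats is what I mean by "a consistent result".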