Hi,

I have an EEG dataset that contains features extracted from continuous EEG (as opposed to ERP) from 15 subjects while they were doing a task with 9 experimental conditions. So it's many samples, 9 targets, and I added a sample attribute called 'sid' to distinguish between subjects (I am mostly working on a per-subject basis, only aggregating the final results).
Basically I want to classify between 2 of the conditions and measure the classification accuracy using cross-validation.

So first I select the appropriate samples (for instance, for subject S02, conditions 1 and 2):
subds = ds[(ds.sa.sid == 'S02') & ((ds.targets == 1) | (ds.targets == 2))]

Because I want to do cross-validation, I am going to use a partitioner. From what I understand, partitioners mostly rely on the 'chunks' sample attribute, which I don't have in my dataset. So I decided to add it this way:
<a href="http://subds.sa">subds.sa</a>['chunks'] = np.arange(subds.nsamples)<br><br>And then:<br>cv = CrossValidation(SMLR(), NGroupPartitioner(5))<br><br>This should run 5 evaluation, using 1/5 of the available data each time to test the classifier. Correct?<br>
Now, for this to work properly, it requires that the targets be randomly distributed across the dataset... for instance, if the last 1/5 of the samples only contains target 2, then it won't work. What do you suggest to solve this problem? I have tried to use a ChainNode, chaining the NGroupPartitioner and a Balancer (roughly as in the sketch below), but it didn't work, apparently due to a bug in Balancer (see my other mail about that one).
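For reference, this is roughly what I tried, following the chaining pattern from the documentation (continuing from the snippet above; take limit='partitions' and apply_selector=True as my reading of the docs, I may have the exact keywords wrong):

from mvpa2.suite import ChainNode, Balancer

# partition first, then balance the targets within each partition
partitioner = ChainNode([NGroupPartitioner(5),
                         Balancer(attr='targets',
                                  count=1,
                                  limit='partitions',
                                  apply_selector=True)],
                        space='partitions')
cv = CrossValidation(SMLR(), partitioner)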
My main question though is: it seems weird to add a chunks attribute like this. Is it the correct way?

Btw, is there a way to pick 80% of the data at random (with an equal number of samples for each target) for training and the remaining 20% for testing, and to repeat this as many times as I want to obtain a consistent result?
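Something like the following pure-numpy sketch is what I have in mind; the helper name stratified_split is just mine for illustration, not a PyMVPA function:

import numpy as np

def stratified_split(ds, train_frac=0.8, seed=None):
    # pick the same number of random training samples from every target
    rng = np.random.RandomState(seed)
    per_class = int(train_frac * min(np.sum(ds.targets == t)
                                     for t in ds.uniquetargets))
    train_idx = []
    for t in ds.uniquetargets:
        idx = np.where(ds.targets == t)[0]
        rng.shuffle(idx)
        train_idx.extend(idx[:per_class])
    mask = np.zeros(ds.nsamples, dtype=bool)
    mask[train_idx] = True
    return ds[mask], ds[~mask]

# repeat as often as needed and average the accuracies
accuracies = []
for i in range(100):
    training, testing = stratified_split(subds, train_frac=0.8, seed=i)
    clf = SMLR()
    clf.train(training)
    predictions = clf.predict(testing.samples)
    accuracies.append(np.mean(np.asarray(predictions) == testing.targets))
print(np.mean(accuracies))

The number of repetitions could then be cranked up until the mean stabilizes. But if there is a built-in way to do this with the existing partitioners/generators, I would rather use that.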
Regards,
Brice