[pymvpa] EEG dataset and chunks

brice rebsamen brice.rebsamen at gmail.com
Fri Apr 8 02:57:17 UTC 2011


Thanks,

CrossValidation(clf,
                Balancer(amount=0.8, limit=None, attr='targets', count=3),
                splitter=Splitter('balanced_set', [True, False]))

This seems like a good idea, but I get this error message:

TypeError: Unexpected keyword argument splitter=<Splitter> for
<CrossValidation>. Valid parameters are ['datasets', 'training_stats',
'raw_results', 'calling_time', 'training_time', 'null_t', 'null_prob',
'stats', 'repetition_results']

I think I understand the role of the 'chunks' attribute, and I see how I
should use it. I guess my samples are not all independent...

Regards
Brice


On Fri, Apr 8, 2011 at 3:29 AM, Yaroslav Halchenko <debian at onerussian.com> wrote:

> >    This should run 5 evaluations, using 1/5 of the available data each
> >    time to test the classifier. Correct?
>
> correct in that it should generate 5 partitions for you, where in the
> first one you would obtain the first nsamples/5 samples (and the
> corresponding "chunks", unique per sample in your case); see the sketch
> just below.
>
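
A minimal sketch of what such a partitioner yields, assuming PyMVPA2 and a
dataset ds in which every sample carries its own unique chunk value:

from mvpa2.suite import NGroupPartitioner

# group the unique per-sample chunks into 5 groups
partitioner = NGroupPartitioner(5, attr='chunks')
for part in partitioner.generate(ds):
    # sa.partitions marks each sample: 1 = training part, 2 = testing fifth
    print('%d samples in the testing fifth' % (part.sa.partitions == 2).sum())
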
> >    Now, for this to work properly, it requires that targets are
> >    randomly distributed in the dataset...
>
> well... theoretically speaking, if you have lots of samples, you might
> get away with doing classical leave-1-out cross-validation.  That would be
> implemented by using NFoldPartitioner on your dataset (i.e. without
> NGroupPartitioner); see the sketch below.  But it would take a while to do
> such a cross-validation -- it might not be desirable unless coded
> explicitly for it (e.g. for SVMs, either using CachedKernel to avoid
> recomputing kernels, or even more trickery...)
>
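
For completeness, a minimal sketch of that plain leave-one-out setup,
assuming PyMVPA2, a dataset ds with unique per-sample chunks, and a linear
SVM as a placeholder classifier:

from mvpa2.suite import CrossValidation, NFoldPartitioner, LinearCSVMC

# one fold per unique chunk -- i.e. per sample here; correct but slow
cv = CrossValidation(LinearCSVMC(), NFoldPartitioner())
errors = cv(ds)
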
> >    for instance if the last 1/5 of the samples contains only target 2,
> >    then it won't work...
>
> yeap -- that is the catch ;)
>
> you could use NFoldPartitioner(cvtype=2), which would generate all
> possible combinations of 2 chunks, chained with a subsequent Sifter
> (recently introduced) to keep only those partitions which carry labels
> from both classes; but, once again, it would be A LOT to cross-validate
> (roughly (nsamples/2)^2 folds), so I guess not a solution for you either
> (a sketch of that chain follows).
>
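
A sketch of that combination, assuming PyMVPA2; the target names 'c1' and
'c2' are placeholders for your two classes:

from mvpa2.suite import ChainNode, NFoldPartitioner, Sifter

partitioner = ChainNode([
    NFoldPartitioner(cvtype=2),
    # keep only partitions whose testing part (partitions == 2)
    # actually contains samples of both classes
    Sifter([('partitions', 2), ('targets', ['c1', 'c2'])]),
], space='partitions')
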
> > What do you suggest to solve this problem?
>
> If you have some certainty that samples are independent, then to get a
> reasonable generalization estimate, just assign np.arange(nsamples/2)
> (assuming initially balanced classes) as chunks to the samples of each
> condition.  Then each chunk is guaranteed to contain a pair of
> conditions ;)  And then you are welcome to use NGroupPartitioner to bring
> the number of partitions down to some more cost-effective number, e.g. 10
> (sketched below).
>
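
A sketch of that chunk assignment, assuming numpy, PyMVPA2, and a dataset
ds with two initially balanced conditions:

import numpy as np
from mvpa2.suite import NGroupPartitioner

# give each condition's samples chunks 0..n-1, so every chunk value
# ends up holding one sample per condition
ds.sa['chunks'] = np.zeros(len(ds), dtype=int)
for target in np.unique(ds.sa.targets):
    mask = ds.sa.targets == target
    ds.sa.chunks[mask] = np.arange(mask.sum())

# then group the chunks into a more cost-effective number of partitions
partitioner = NGroupPartitioner(10)
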
> >    I have tried to use a ChainNode, chaining the NGroupPartitioner and
> >    a Balancer, but it didn't work,
>
> if I see it right -- it should have worked, unless you had a really
> degenerate case, e.g. one of the partitions contained samples of only 1
> category.
>
> >    apparently due to a bug in Balancer (see another mail on that one).
>
> oops -- need to check emails then...
>
> >    My main question though is: it seems weird to add a chunks attribute
> >    like this. Is it the correct way?
>
> well... if you consider your samples independent of each other, then
> yes -- it is reasonable to assign each sample to a separate chunk.
>
>
> >    Btw, is there a way to pick at random 80% of the data (with an equal
> >    number of samples for each target) for training and the remaining 20%
> >    for testing, and repeat this as many times as I want to obtain a
> >    consistent result?
>
> although I don't think we have tried it, this should do:
>
> CrossValidation(clf,
>                 Balancer(amount=0.8, limit=None, attr='targets', count=3),
>                 splitter=Splitter('balanced_set', [True, False]))
>
> It should do cross-validation by taking 3 of those balanced splits (raise
> count=3 to whatever number you like).
>
> what we do here -- we tell it to balance targets, take 80% of the samples
> and mark them True, while the other 20% get False.  Then we proceed to the
> cross-validation.  That uses an actual splitter, which splits the dataset
> into training/testing parts.  Usually such a splitter is not specified and
> is constructed by CrossValidation assuming operation on partitions labeled
> 0, 1 (and possibly 2), usually provided by Partitioners.  But now we want
> to split based on balanced_set -- and we can do that, instructing it to
> take the 80% marked True for training, and the rest (False) for testing.
>
> limit=None is there to say not to limit subsampling to any attribute
> (commonly chunks), so in this case you don't even need to have chunks at
> all (see the sketch below).
>
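
To make the mechanics concrete, a sketch of what the Balancer alone emits,
assuming PyMVPA2 and a dataset ds; by default Balancer does not remove
samples but writes its True/False marks into the 'balanced_set' sample
attribute, which the Splitter above then consumes:

from mvpa2.suite import Balancer

bal = Balancer(amount=0.8, limit=None, attr='targets', count=3)
for balanced in bal.generate(ds):
    # ~80% of samples per target marked True (training), the rest False
    n_true = balanced.sa.balanced_set.sum()
    print('%d of %d samples marked True' % (n_true, len(balanced)))
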
> is that what you needed?
>
> --
> =------------------------------------------------------------------=
> Keep in touch                                     www.onerussian.com
> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>