[pymvpa] mean_group_sample Mapper with NFoldPartitioner

Thu Feb 23 22:56:19 UTC 2012

Hi Michael,

Sorry for the delayed reply -- I did not immediately recognize this as
the issue I ran into recently myself, so I postponed troubleshooting
until ... now.

The short answer is:

  A mapper which you provide to MappedClassifier is not supposed to
  change the number of samples.  Otherwise MappedClassifier receives
  X samples to predict labels for, but then spits out Y (!=X) labels --
  and the logic "outside" of it gets confused.
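
To make the mismatch concrete, here is a minimal pure-Python sketch (not
PyMVPA code; mean_by_group is a hypothetical helper written only for
this illustration, and the sample values are made up) that averages
samples per (target, chunk) pair using the toy dataset from the quoted
message.  It only shows that 8 samples go in and 4 come out:

```python
from collections import defaultdict


def mean_by_group(samples, targets, chunks):
    """Average all samples that share the same (target, chunk) pair.

    Hypothetical stand-in for group-wise averaging; not the PyMVPA API.
    """
    groups = defaultdict(list)
    for sample, target, chunk in zip(samples, targets, chunks):
        groups[(target, chunk)].append(sample)

    out_samples, out_targets, out_chunks = [], [], []
    # Order the groups by chunk first, then by target.
    for target, chunk in sorted(groups, key=lambda key: (key[1], key[0])):
        members = groups[(target, chunk)]
        n_features = len(members[0])
        mean = [sum(m[i] for m in members) / len(members)
                for i in range(n_features)]
        out_samples.append(mean)
        out_targets.append(target)
        out_chunks.append(chunk)
    return out_samples, out_targets, out_chunks


# Toy dataset from the quoted message: 2 chunks x 2 targets x 2 samples.
chunks = [1, 1, 1, 1, 2, 2, 2, 2]
targets = [1, 1, 2, 2, 1, 1, 2, 2]
samples = [[float(i), float(2 * i)] for i in range(8)]  # made-up features

avg_samples, avg_targets, avg_chunks = mean_by_group(samples, targets, chunks)

# 8 samples in, 4 samples out: anything that still expects 8 labels
# (one per original sample) no longer lines up with the output.
print(len(samples), "->", len(avg_samples))
```

Code that handed over 8 samples expects 8 predictions back, which is the
kind of length mismatch the ValueError in the quoted message reports.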

I even started to work around that limitation in
https://github.com/yarikoptic/PyMVPA/tree/_tent/allow_ch_nsamples not so
long ago, but my approach didn't feel "right", and since no one else had
asked for such a workflow back then, I stopped.  We might want to come
back to this.

So -- your goal is to estimate the cross-validation error using the mean
sample per target within each chunk.  Unless this is part of a bigger
picture (e.g. you are also trying to do statistical assessment by
enabling permutation testing within CrossValidation), you could simply
do the averaging once, prior to cross-validation, e.g.

ds = mean_group_sample(['targets','chunks'])(ds)

and proceed with

mapper = SVDMapper()

Or would that not work for you? (I.e., what is the bigger picture? ;) )
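
Why averaging up front avoids the problem can be sketched in plain
Python (again not PyMVPA code; leave_one_chunk_out is a hypothetical,
simplified stand-in for leave-one-chunk-out partitioning): once the
dataset is already averaged, every split is taken from those 4 samples,
and nothing inside the classifier changes the sample count any more.

```python
def leave_one_chunk_out(chunks):
    """Yield (held_out_chunk, train_indices, test_indices) per fold.

    Hypothetical, simplified stand-in for leave-one-chunk-out
    partitioning; not the PyMVPA implementation.
    """
    for held_out in sorted(set(chunks)):
        train = [i for i, c in enumerate(chunks) if c != held_out]
        test = [i for i, c in enumerate(chunks) if c == held_out]
        yield held_out, train, test


# Attributes of the already averaged toy dataset from the quoted message.
avg_chunks = [1, 1, 2, 2]
avg_targets = [1, 2, 1, 2]

folds = list(leave_one_chunk_out(avg_chunks))
for held_out, train, test in folds:
    # One averaged sample per target lands in each test split, so the
    # number of predictions matches the number of test samples.
    print("held-out chunk", held_out,
          "train targets", [avg_targets[i] for i in train],
          "test targets", [avg_targets[i] for i in test])
```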

P.S. You had better upgrade PyMVPA to at least our latest release, 2.0.0,
which should be (nearly) fully compatible with 0.6.0~rc2; you would just
need to import from mvpa2 (so you can even co-install them).

> Hi pymvpa list,

> I have a question about the mean_group_sample FxMapper:

> mapper = ChainMapper([mean_group_sample(['targets','chunks']), SVDMapper()])
> clf = MappedClassifier(LinearCSVMC(), mapper)
> cvte = CrossValidation(clf, NFoldPartitioner(),
> enable_ca=['repetition_results','stats'])

> Let's say I have 2 chunks in a dataset each with 2 targets

> ds.C = [1 1 1 1 2 2 2 2]
> ds.T = [1 1 2 2 1 1 2 2]

> The mean_group_sample(['targets','chunks']) mapper returns two chunks
> with the mean targets in each:

> ds.C = [1 1 2 2]
> ds.T = [1 2 1 2]

> That all works, until I try to use it in a ChainMapper with an
> NFoldPartitioner, as shown above.

> It seems that the partitioner doesn't produce the same number of targets
> in the training and testing split. In my case, there are 8 chunks, 25
> stimuli per chunk, divided into 5 targets (5 stimuli per target
> condition). Using mean_group_sample creates the following anomaly:

> ValueError: Collectable 'targets' with length [25] does not match the
> required length [5] of collection '<SampleAttributesCollection>'.
> >/lib/python2.6/site-packages/mvpa/base/collections.py(558)__setitem__()
>     557                                 ulength,
> --> 558                                 str(self)))
>     559         # tell the attribute to maintain the desired length

> Is there a way to use the mean_group_sample mapper with
> NFoldPartitioner() so that the testing and training splits contain the
> correct length collection objects?

> I run pymvpa version 0.6.0~rc2 on posix Linux 2.6.18-308.el5
> (redhat/5.8/Tikanga).

> I already hand-coded what I need, but I want to see if I can understand
> the pymvpa framework better.

> Thank you in advance for any insight into this mapper and partitioner
> interaction.

> Best regards,

> Michael


-- 
=------------------------------------------------------------------=
Keep in touch                                     www.onerussian.com
Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic