[pymvpa] Alternatives to Cross-validation
Jacob Itzhacki
jitzhacki at gmail.com
Mon Oct 29 08:39:25 UTC 2012
Thank you so much for your response, Dr. Halchenko.
However, I am embarrassed to once again show my inexperience, but I have a
small set of questions.
Let's start off with these lines of code:
NFoldPartitioner(len(ds.sa['superord'].unique),
attr='subord'),
## so it should select only those splits where we took 1 from
## each of the superord categories leaving things in balance
Sifter([('partitions', 2),
('superord',
{ 'uvalues': ds.sa['superord'].unique,
'balanced': True})
]),
], space='partitions')
My question in this regard is whether the attributes file must be different
in any way for this, or whether just pointing to the category in question
(which I'm guessing in this case you simply named "subord") will do the
trick. Is there any way to do this for more than 2 superordinates? Sorry
for this last question, as I acknowledge you wrote:
"
# And with that NFold + Sifter we achieve desired effect that we would get
only
# those splits where into testing we place 3 different subord categories
with 1
# of each superord
"
but I'm just a little lost.
Anyway, I hope I am not being too big a burden. As soon as I have some
feedback to share, I will be sure to do so.
Thanks again! J
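For reference, the balanced splits that the Sifter keeps generalize naturally to any number of superordinate categories: a test set simply takes one subordinate from each superordinate. A rough pure-Python sketch of that combinatorics (the hierarchy and all names below are invented for illustration; this is not the PyMVPA API):

```python
from itertools import product

# Hypothetical hierarchy: 3 superordinate categories, 2 subordinates each
# (names are invented for illustration only)
hierarchy = {
    'super0': ['sub0', 'sub3'],
    'super1': ['sub1', 'sub4'],
    'super2': ['sub2', 'sub5'],
}

# A "balanced" test set takes exactly one subordinate from each
# superordinate -- the property that Sifter's 'balanced': True enforces
balanced_test_sets = [set(combo) for combo in product(*hierarchy.values())]

all_subords = {s for subs in hierarchy.values() for s in subs}
for test in balanced_test_sets:
    train = all_subords - test
    print(sorted(test), '<- test | train ->', sorted(train))

# 3 superordinates with 2 subordinates each give 2**3 = 8 balanced splits
print(len(balanced_test_sets))
```

The same `product` over however many superordinate groups you have works for 2, 3, or more superordinates.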
On Thu, Oct 25, 2012 at 3:23 PM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
>
> On Thu, 25 Oct 2012, Jacob Itzhacki wrote:
>
> > "e.g. �some super-ordinate category (e.g. �animate-vs-inanimate) �you
> > would like to cross-validate not across functional runs BUT across
> > sub-ordinate stimuli categories (e.g. train on
> > humans/reptiles/shoes/scissors to discriminate animacy and
> > cross-validate into bugs/houses, then continue with another pair to
> take
> > out)."
> > BTW, this is exactly what I would like to do, but I still can't figure
> > out how to leave the test trials out of the training trials, so they
> > don't get classified into themselves.
>
> ok then -- the point is to craft such an interesting partitioner. And
> there
> are actually 2 approaches to this. Let's first look into
>
> https://github.com/PyMVPA/PyMVPA/blob/HEAD/mvpa2/tests/test_usecases.py#L50
>
> which I am citing here with some additional comments and omitting import
> statement(s) -- it is a bit more cumbersome since it has 6 subordinate
> categories and 3 superord (not 2, which would make the explanation easier):
>
> # Let's simulate the beast -- 6 categories total, grouped into 3
> # super-ordinate, and actually without any 'superordinate' effect
> # since the subordinate categories are independent
>
> # in your case I hope you would have a true superordinate effect like in
> # the example study I am referring to below
>
> ds = normal_feature_dataset(nlabels=6,
> snr=100, # pure signal! ;)
> perlabel=30,
> nfeatures=6,
> nonbogus_features=range(6),
> nchunks=5)
> ds.sa['subord'] = ds.sa.targets.copy()
>
> # Here I am creating a new 'superord' category as the remainder of
> # division by 3 of the original 6 categories (in 'subord')
>
> ds.sa['superord'] = ['super%d' % (int(i[1])%3,)
> for i in ds.targets] # 3 superord categories
> # let's override the original targets just to be sure that we aren't
> # relying on them
> ds.targets[:] = 0
>
> npart = ChainNode([
> ## so we split based on superord
>
> # So now this NFold partitioner would select 3 subord categories (possibly
> # where we even have multiple samples from the same superord category)
>
> NFoldPartitioner(len(ds.sa['superord'].unique),
> attr='subord'),
> ## so it should select only those splits where we took 1 from
> ## each of the superord categories leaving things in balance
> Sifter([('partitions', 2),
> ('superord',
> { 'uvalues': ds.sa['superord'].unique,
> 'balanced': True})
> ]),
> ], space='partitions')
>
> # And with that NFold + Sifter we achieve the desired effect: we get only
> # those splits where into testing we place 3 different subord categories,
> # with 1 of each superord
>
> # and then do your normal where clf is space='superord'
> clf = LinearCSVMC(space='superord')
>
> cvte_regular = CrossValidation(clf, NFoldPartitioner(),
> errorfx=lambda p,t: np.mean(p==t))
>
> # below we use our NFold + Sifter partitioner instead of a simple
> # NFold on chunks
>
> cvte_super = CrossValidation(clf, npart,
>                              errorfx=lambda p,t: np.mean(p==t))
>
> # apply as usual ;)
>
> accs_regular = cvte_regular(ds)
> accs_super = cvte_super(ds)
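The net effect of the NFold + Sifter chain above can be checked by hand: the NFoldPartitioner over 6 subord categories with cvtype 3 proposes every 3-of-6 test set, and the Sifter keeps only the balanced ones. A plain-Python sketch of that filtering (using the same i % 3 grouping as the example, with invented 'cN' category names; this is not the PyMVPA API):

```python
from itertools import combinations

# 6 subordinate categories grouped into 3 superordinates by i % 3,
# mirroring the 'super%d' % (int(i[1]) % 3) trick in the example above
# (the 'cN' names are invented for illustration)
subord = ['c0', 'c1', 'c2', 'c3', 'c4', 'c5']
superord_of = {c: 'super%d' % (int(c[1]) % 3) for c in subord}

# NFoldPartitioner(3, attr='subord') proposes every 3-of-6 test set...
candidates = list(combinations(subord, 3))

# ...and the Sifter keeps only the balanced ones: one subord per superord
balanced = [c for c in candidates
            if len({superord_of[s] for s in c}) == 3]

print(len(candidates), len(balanced))  # 20 candidates, 8 balanced splits
```

So out of 20 possible 3-category test sets, only 8 place exactly one subordinate from each superordinate into testing.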
>
>
> If you are interested in how that would affect the results -- I would
> invite you to look at my recent poster at SfN 2012:
> http://haxbylab.dartmouth.edu/publications/HGG+12_sfn12_famfaces.png
> 2nd column, scatter plot "Why across identities?"
>
> where on x-axis you have z-scores for CV stats across identities while
> cross-validating searchlights on classification of personal familiarity to
> faces across functional runs, while on y-axis -- across pairs of
> individuals.
> Both results are in high agreement EXCEPT in the "blue areas" -- early
> visual cortex -- where, if we cross-validate across functional runs, the
> classifier might just learn identity information. Since the identity of a
> face (subordinate category) here has a clear association with familiarity
> (superordinate), it would provide significant classification results in
> those areas where there is strong identity information in the stimuli (in
> our case in early visual cortex, since the faces were actually different
> ;) ) but possibly no (strong) superord effects (let's forget for now
> about possible attention/engagement etc. effects). By cross-validating
> across identities (subord), we can easily get rid of those
> subord-specific effects and capture the notion of the superord category
> effects more clearly.
>
> An alternative, even stricter cross-validation scheme would involve
> cross-validation across runs BUT also then bootstrapping additional
> folds for each such split by generating all those splits across
> identities. For that we have ExcludeTargetsCombinationsPartitioner,
> the docs for which are
>
> http://www.pymvpa.org/generated/mvpa2.generators.partition.ExcludeTargetsCombinationsPartitioner.html?highlight=excludetargetscombinationspartitioner
> and unittest
>
> https://github.com/PyMVPA/PyMVPA/blob/HEAD/mvpa2/tests/test_generators.py#L266
>
> This one was used in the original hyperalignment paper
> (http://haxbylab.dartmouth.edu/publications/HGC+11.pdf) to avoid falling
> into the trap of run order effects...
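The bookkeeping of that stricter scheme can be pictured as nesting the two loops: leave-one-run-out on the outside and balanced subord combinations on the inside. A rough plain-Python sketch of the resulting folds (run and category names invented; this is not the ExcludeTargetsCombinationsPartitioner API itself):

```python
from itertools import combinations

# Hypothetical setup: 5 functional runs and 6 subord categories grouped
# into 3 superordinates by i % 3 (all names invented for illustration)
runs = [0, 1, 2, 3, 4]
subord = ['c0', 'c1', 'c2', 'c3', 'c4', 'c5']
superord_of = {c: 'super%d' % (int(c[1]) % 3) for c in subord}

folds = []
for test_run in runs:                          # outer: leave-one-run-out
    for combo in combinations(subord, 3):      # inner: subord combinations
        # keep only balanced combos: one subord per superord, so the
        # test categories never appear in training -- even in other runs
        if len({superord_of[s] for s in combo}) != 3:
            continue
        folds.append({'test_run': test_run, 'test_subord': set(combo)})

print(len(folds))  # 5 runs x 8 balanced combos = 40 folds
```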
>
> I would be glad to see people reporting back comparing these 3 schemes
> (just across runs, across subord, across runs+subord) of cross-validation
> on their data with a hierarchical categories design. Thanks in advance
> for sharing -- it would be great if we got a dialog going instead of my
> one-way blurbing... doh -- sharing! ;)
>
> Cheers,
>
> > On Wed, Oct 24, 2012 at 5:15 PM, Jacob Itzhacki
> > <jitzhacki at gmail.com> wrote:
>
> > Please do!
>
> > and thank you for all the responses :D
> > Don't want to come across as lazy, but I'm not a master coder at all,
> > so sometimes figuring out what one line of code does can be quite the
> > ordeal, in my case.
> > J
> > On Wed, Oct 24, 2012 at 3:54 PM, Yaroslav Halchenko
> > <debian at onerussian.com> wrote:
>
> > On Wed, 24 Oct 2012, MS Al-Rawi wrote:
> > > Cross-validation is fine even in this case, you'll just need to
> > > rearrange your data in a way to leave-a-set-of-stimuli out, instead
> > > of leave-one-run-out. Perhaps PyMVPA has some functionality to do
> > > this.
>
> > now it is getting interesting -- I think you got close to what I
> > thought the question was about: to investigate the conceptual/true
> > effect of e.g. some super-ordinate category (e.g. animate-vs-inanimate)
> > you would like to cross-validate not across functional runs BUT across
> > sub-ordinate stimuli categories (e.g. train on
> > humans/reptiles/shoes/scissors to discriminate animacy and
> > cross-validate into bugs/houses, then continue with another pair to
> > take out). And that is what I thought for a moment the question was
> > about ;)
>
> > This all can be (was) done with PyMVPA, although it would require 3-4
> > lines of code instead of 1 to accomplish ATM. If anyone is interested
> > I could provide an example ;)... ?
> --
> Yaroslav O. Halchenko
> Postdoctoral Fellow, Department of Psychological and Brain Sciences
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
> WWW: http://www.linkedin.com/in/yarik
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>