[pymvpa] Alternatives to Cross-validation
Jacob Itzhacki
jitzhacki at gmail.com
Mon Oct 29 08:39:25 UTC 2012
Thank you so much for your response, Dr. Halchenko.
However, I am embarrassed to once again show my inexperience, but I have a
small set of questions.
Let's start off with these lines of code:
NFoldPartitioner(len(ds.sa['superord'].unique),
attr='subord'),
## so it should select only those splits where we took 1 from
## each of the superord categories leaving things in balance
Sifter([('partitions', 2),
('superord',
{ 'uvalues': ds.sa['superord'].unique,
'balanced': True})
]),
], space='partitions')
My question in this regard is whether the attributes file must be different
in any way for this, or whether just pointing to the category in question
(which I'm guessing in this case you simply named "subord") will do the
trick. Is there any way to do this for more than 2 superordinates? Sorry
for this last question, as I acknowledge you wrote:
"
# And with that NFold + Sifter we achieve desired effect that we would get
only
# those splits where into testing we place 3 different subord categories
with 1
# of each superord
"
but I'm just a little lost.
Anyway, I hope I am not being too big a burden. As soon as I have some
feedback to share, I will be sure to do so.
Thanks again! J
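For reference, the balanced splits that the Sifter keeps generalize naturally to any number of superordinate categories: a test set simply takes one subordinate from each superordinate. A rough pure-Python sketch of that combinatorics (the hierarchy and all names below are invented for illustration; this is not the PyMVPA API):

```python
from itertools import product

# Hypothetical hierarchy: 3 superordinate categories, 2 subordinates each
# (names are invented for illustration only)
hierarchy = {
    'super0': ['sub0', 'sub3'],
    'super1': ['sub1', 'sub4'],
    'super2': ['sub2', 'sub5'],
}

# A "balanced" test set takes exactly one subordinate from each
# superordinate -- the property that Sifter's 'balanced': True enforces
balanced_test_sets = [set(combo) for combo in product(*hierarchy.values())]

all_subords = {s for subs in hierarchy.values() for s in subs}
for test in balanced_test_sets:
    train = all_subords - test
    print(sorted(test), '<- test | train ->', sorted(train))

# 3 superordinates with 2 subordinates each give 2**3 = 8 balanced splits
print(len(balanced_test_sets))
```

The same `product` over however many superordinate groups you have works for 2, 3, or more superordinates.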
On Thu, Oct 25, 2012 at 3:23 PM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
>
> On Thu, 25 Oct 2012, Jacob Itzhacki wrote:
>
> > "e.g. �some super-ordinate category (e.g. �animate-vs-inanimate) �you
> > would like to cross-validate not across functional runs BUT across
> > sub-ordinate stimuli categories (e.g. train on
> > humans/reptiles/shoes/scissors to discriminate animacy and
> > cross-validate into bugs/houses, then continue with another pair to
> take
> > out)."
> > BTW, this is exactly what I would like to do, but I still can't figure
> > out how to leave the test trials out of the training trials, so they
> > don't get classified into themselves.
>
> ok then -- the point is to craft such an interesting partitioner. And
> there
> are actually 2 approaches to this. Let's first look into
>
> https://github.com/PyMVPA/PyMVPA/blob/HEAD/mvpa2/tests/test_usecases.py#L50
>
> which I am citing here with some additional comments and omitting import
> statement(s) -- it is a bit more cumbersome since it has 6 subordinate
> categories and 3 superord (not 2, which would make the explanation easier):
>
> # Let's simulate the beast -- 6 categories total, grouped into 3
> # super-ordinate, and actually without any 'superordinate' effect
> # since the subordinate categories are independent
>
> # in your case I hope you would have a true superordinate effect like in
> # the example study I am referring to below
>
> ds = normal_feature_dataset(nlabels=6,
> snr=100, # pure signal! ;)
> perlabel=30,
> nfeatures=6,
> nonbogus_features=range(6),
> nchunks=5)
> ds.sa['subord'] = ds.sa.targets.copy()
>
> # Here I am creating a new 'superord' category as the remainder of
> # division by 3 of the original 6 categories (in 'subord')
>
> ds.sa['superord'] = ['super%d' % (int(i[1])%3,)
> for i in ds.targets] # 3 superord categories
> # let's override the original targets just to be sure that we aren't
> # relying on them
> ds.targets[:] = 0
>
> npart = ChainNode([
> ## so we split based on superord
>
> # So now this NFold partitioner would select 3 subord categories (possibly
> # where we even have multiple samples from the same superord category)
>
> NFoldPartitioner(len(ds.sa['superord'].unique),
> attr='subord'),
> ## so it should select only those splits where we took 1 from
> ## each of the superord categories leaving things in balance
> Sifter([('partitions', 2),
> ('superord',
> { 'uvalues': ds.sa['superord'].unique,
> 'balanced': True})
> ]),
> ], space='partitions')
>
> # And with that NFold + Sifter we achieve the desired effect: we get only
> # those splits where into testing we place 3 different subord categories,
> # with 1 of each superord
>
> # and then do your normal where clf is space='superord'
> clf = LinearCSVMC(space='superord')
>
> cvte_regular = CrossValidation(clf, NFoldPartitioner(),
> errorfx=lambda p,t: np.mean(p==t))
>
> # below we use our NFold + Sifter partitioner instead of a simple
> # NFold on chunks
>
> cvte_super = CrossValidation(clf, npart,
>                              errorfx=lambda p,t: np.mean(p==t))
>
> # apply as usual ;)
>
> accs_regular = cvte_regular(ds)
> accs_super = cvte_super(ds)
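The net effect of the NFold + Sifter chain above can be checked by hand: the NFoldPartitioner over 6 subord categories with cvtype 3 proposes every 3-of-6 test set, and the Sifter keeps only the balanced ones. A plain-Python sketch of that filtering (using the same i % 3 grouping as the example, with invented 'cN' category names; this is not the PyMVPA API):

```python
from itertools import combinations

# 6 subordinate categories grouped into 3 superordinates by i % 3,
# mirroring the 'super%d' % (int(i[1]) % 3) trick in the example above
# (the 'cN' names are invented for illustration)
subord = ['c0', 'c1', 'c2', 'c3', 'c4', 'c5']
superord_of = {c: 'super%d' % (int(c[1]) % 3) for c in subord}

# NFoldPartitioner(3, attr='subord') proposes every 3-of-6 test set...
candidates = list(combinations(subord, 3))

# ...and the Sifter keeps only the balanced ones: one subord per superord
balanced = [c for c in candidates
            if len({superord_of[s] for s in c}) == 3]

print(len(candidates), len(balanced))  # 20 candidates, 8 balanced splits
```

So out of 20 possible 3-category test sets, only 8 place exactly one subordinate from each superordinate into testing.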
>
>
> If you are interested in how that would affect the results -- I would
> invite you to look at my recent poster at SfN 2012:
> http://haxbylab.dartmouth.edu/publications/HGG+12_sfn12_famfaces.png
> 2nd column, scatter plot "Why across identities?"
>
> where on x-axis you have z-scores for CV stats across identities while
> cross-validating searchlights on classification of personal familiarity to
> faces across functional runs, while on y-axis -- across pairs of
> individuals.
> Both results are in high agreement EXCEPT in the "blue areas" -- early
> visual cortex -- where, if we cross-validate across functional runs, the
> classifier might just learn identity information. Since the identity of a
> face (subordinate category) here has a clear association with familiarity
> (superordinate), it would provide significant classification results in
> those areas where there is strong identity information in the stimuli (in
> our case in early visual cortex, since the faces were actually different
> ;) ) but possibly no (strong) superord effects (let's forget for now
> about possible attention/engagement etc. effects). By cross-validating
> across identities (subord), we can easily get rid of those
> subord-specific effects and capture the notion of the superord category
> effects more clearly.
>
> An alternative, even stricter cross-validation scheme would involve
> cross-validation across runs BUT also then bootstrapping additional
> folds for each such split by generating all those splits across
> identities. For that we have ExcludeTargetsCombinationsPartitioner,
> the docs for which are
>
> http://www.pymvpa.org/generated/mvpa2.generators.partition.ExcludeTargetsCombinationsPartitioner.html?highlight=excludetargetscombinationspartitioner
> and unittest
>
> https://github.com/PyMVPA/PyMVPA/blob/HEAD/mvpa2/tests/test_generators.py#L266
>
> This one was used in the original hyperalignment paper
> (http://haxbylab.dartmouth.edu/publications/HGC+11.pdf) to avoid falling
> into the trap of run order effects...
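The bookkeeping of that stricter scheme can be pictured as nesting the two loops: leave-one-run-out on the outside and balanced subord combinations on the inside. A rough plain-Python sketch of the resulting folds (run and category names invented; this is not the ExcludeTargetsCombinationsPartitioner API itself):

```python
from itertools import combinations

# Hypothetical setup: 5 functional runs and 6 subord categories grouped
# into 3 superordinates by i % 3 (all names invented for illustration)
runs = [0, 1, 2, 3, 4]
subord = ['c0', 'c1', 'c2', 'c3', 'c4', 'c5']
superord_of = {c: 'super%d' % (int(c[1]) % 3) for c in subord}

folds = []
for test_run in runs:                          # outer: leave-one-run-out
    for combo in combinations(subord, 3):      # inner: subord combinations
        # keep only balanced combos: one subord per superord, so the
        # test categories never appear in training -- even in other runs
        if len({superord_of[s] for s in combo}) != 3:
            continue
        folds.append({'test_run': test_run, 'test_subord': set(combo)})

print(len(folds))  # 5 runs x 8 balanced combos = 40 folds
```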
>
> I would be glad to see people reporting back comparing these 3 schemes
> (just across runs, across subord, across runs+subord) of cross-validation
> on their data with a hierarchical categories design. Thanks in advance
> for sharing -- it would be great if we got a dialog going instead of my
> one-way blurbing... doh -- sharing! ;)
>
> Cheers,
>
> > On Wed, Oct 24, 2012 at 5:15 PM, Jacob Itzhacki
> > <jitzhacki at gmail.com> wrote:
>
> > Please do!
>
> > and thank you for all the responses :D
> > Don't want to come across as lazy, but I'm not a master coder at all,
> > so sometimes figuring out what one line of code does can be quite the
> > ordeal, in my case.
> > J
> > On Wed, Oct 24, 2012 at 3:54 PM, Yaroslav Halchenko
> > <debian at onerussian.com> wrote:
>
> > On Wed, 24 Oct 2012, MS Al-Rawi wrote:
> > > Cross-validation is fine even in this case, you'll just need to
> > > rearrange your data in a way to leave-a-set-of-stimuli out, instead
> > > of leave-one-run-out. Perhaps PyMVPA has some functionality to do
> > > this.
>
> > now it is getting interesting -- I think you got close to what I
> > thought the question was about: to investigate the conceptual/true
> > effect of e.g. some super-ordinate category (e.g. animate-vs-inanimate)
> > you would like to cross-validate not across functional runs BUT across
> > sub-ordinate stimuli categories (e.g. train on
> > humans/reptiles/shoes/scissors to discriminate animacy and
> > cross-validate into bugs/houses, then continue with another pair to
> > take out). And that is what I thought for a moment the question was
> > about ;)
>
> > This all can be (was) done with PyMVPA, although it would require 3-4
> > lines of code instead of 1 to accomplish ATM. If anyone is interested
> > I could provide an example ;)... ?
> --
> Yaroslav O. Halchenko
> Postdoctoral Fellow, Department of Psychological and Brain Sciences
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
> WWW: http://www.linkedin.com/in/yarik
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>