[pymvpa] same sensitivity values for all dataset splits from cross validation using tutorial script snippet

Meng Liang meng.liang at hotmail.co.uk
Mon Jul 7 12:00:18 UTC 2014


Dear Michael,
Thanks for the clarification and the modified code - it works now!
Best,
Meng
> Date: Mon, 30 Jun 2014 21:33:42 +0200
> From: mih at debian.org
> To: pkg-exppsy-pymvpa at lists.alioth.debian.org
> Subject: Re: [pymvpa] same sensitivity values for all dataset splits from cross validation using tutorial script snippet
> 
> Hi,
> 
> I am sorry you had to go through this...
> 
> On Mon, Jun 30, 2014 at 02:18:07PM +0100, Meng Liang wrote:
> > I am trying to obtain the sensitivity values for all splits of the
> > dataset during leave-one-out cross-validation (classification using
> > SVM). I found in the tutorial "Classification Model Parameters –
> > Sensitivity Analysis" (
> > http://www.pymvpa.org/tutorial_sensitivity.html ) that
> > RepeatedMeasure(sensana, NFoldPartitioner()) should give the
> > sensitivity values for each fold. Here is the code snippet I used in
> > my script, slightly adapted from the tutorial:
> >
> >   clf = LinearNuSVMC()
> >   cv = CrossValidation(clf, NFoldPartitioner(),enable_ca=['stats'])
> >   sensana = clf.get_sensitivity_analyzer()
> >   cv_sensana = RepeatedMeasure(sensana, NFoldPartitioner())
> >   error = cv(ds)
> >   sensmap_cv = cv_sensana(ds)
> >   print(sensmap_cv.shape)
> >
> > gave me: (14L, 87L). 
> >
> > I have 14 subjects and I am using leave-one-subject-out
> > cross-validation, and there are 87 features. So the data structure
> > seems correct. However, when I look at the values of this 14x87 array,
> > all the rows in the array contain exactly the same values (i.e., the
> > first row is identical to all the other rows).
> 
> I am afraid you found a bug in the documentation (more specifically a
> bit of code that has not been properly adjusted when we switched from
> dataset splitters to dataset partitioners -- just mentioning it for
> those who have been around for that long...).
> 
> The reason for the behavior you observe is that, in contrast to what is
> advertised in the tutorial, RepeatedMeasure does not split any dataset.
> It does what it says on the label: it repeats a measure, for whatever
> datasets come out of the provided generator -- in your case
> NFoldPartitioner. However, partitioners only add a sample attribute to a
> dataset that indicates the current partitioning scheme -- they do not
> split a dataset -- hence you are actually computing sensitivities,
> repeatedly, from the identical dataset.
> 
> If you want to compute the sensitivities on the respective training
> samples of each data fold (which I think you do) you need to change that
> line to:
> 
> cv_sensana = RepeatedMeasure(sensana,
>                              ChainNode((NFoldPartitioner(),
>                                        Splitter('partitions',
>                                                 attr_values=(1,)))))
> 
> This change amends the partitioner with a splitter that actually takes
> out the training samples of each fold and feeds them into the
> sensitivity measure.
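
To make the distinction above concrete, here is a minimal pure-Python sketch of "partition only" versus "partition, then split". It is a toy illustration under stated assumptions, not PyMVPA code: the helpers `partition`, `split_training` and `measure` are hypothetical names, and the per-feature mean merely stands in for a real sensitivity analyzer. The labels follow the convention implied by `attr_values=(1,)` in the fix above, with 1 marking the training samples of a fold.

```python
# Toy illustration only: these helpers are hypothetical and are NOT PyMVPA API.

def partition(samples, chunks):
    """Yield the full dataset once per fold, only labelling each sample.

    Label 1 = training, 2 = testing. No sample is removed.
    """
    for held_out in sorted(set(chunks)):
        labels = [2 if c == held_out else 1 for c in chunks]
        yield samples, labels


def split_training(samples, labels):
    """Keep only the samples labelled 1 (the training portion of a fold)."""
    return [s for s, lab in zip(samples, labels) if lab == 1]


def measure(samples):
    """Stand-in for a sensitivity analyzer: the per-feature mean."""
    n = float(len(samples))
    return [sum(col) / n for col in zip(*samples)]


samples = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
chunks = [0, 1, 2]  # one chunk per subject, leave-one-chunk-out

# Partition only: the measure sees all samples in every fold,
# so every fold produces the same result.
unsplit = [measure(s) for s, _ in partition(samples, chunks)]

# Partition, then split: the measure sees only each fold's training samples,
# so the results differ from fold to fold.
split = [measure(split_training(s, lab)) for s, lab in partition(samples, chunks)]

print(unsplit)
print(split)
```

In this sketch every row of `unsplit` is the same, which matches the symptom reported in the original question, while the rows of `split` differ because each fold leaves out a different chunk.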
> 
> > A related question about normalizing the sensitivity values: in the
> > "Closing Words" of the tutorial on the same webpage, it says: "It
> > should also be noted that sensitivities can not be directly compared
> > to each other, even if they stem from the same algorithm and are just
> > computed on different dataset splits. In an analysis one would have to
> > normalize them first." My question is: if we cannot compare the
> > sensitivity values from different data splits without normalizing them
> > first, why can we average them or take the maximum value across data
> > splits without applying any normalization (the example script snippets
> > in the tutorial seem to do so)? I would imagine that the average or
> > the max value would also be affected by the scale of the data. 
> 
> Yes, you are right: they could be normalized even more. The dataset in
> the tutorial, however, is from a single subject and was z-scored upfront,
> so it is not that bad...
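
As a rough sketch of what normalizing per-fold sensitivities before combining them could look like, the pure-Python example below scales each fold's sensitivity vector to unit L2 norm and then averages across folds. The choice of the L2 norm is an assumption made for illustration; the thread does not prescribe a particular normalization, and `l2_normalize` and `mean_map` are hypothetical helper names rather than PyMVPA functions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return list(vec)
    return [v / norm for v in vec]


def mean_map(maps):
    """Average a list of equally long vectors feature by feature."""
    n = float(len(maps))
    return [sum(col) / n for col in zip(*maps)]


# Two folds with the same pattern across features but a tenfold scale difference.
fold_maps = [[3.0, 4.0], [30.0, 40.0]]

# Without normalization the larger-scale fold dominates the average.
raw_mean = mean_map(fold_maps)

# With per-fold normalization both folds contribute equally.
norm_mean = mean_map([l2_normalize(m) for m in fold_maps])

print(raw_mean)
print(norm_mean)
```

Here the raw average is pulled toward the fold with the larger scale, whereas after normalization the two folds, which share the same relative pattern, contribute identically.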
> 
> Sorry for the bug. I filed a bug report and we'll fix it ASAP.
> 
> Michael
> 
> -- 
> J.-Prof. Dr. Michael Hanke
> Psychoinformatik Labor,    Institut für  Psychologie II
> Otto-von-Guericke-Universität Magdeburg,  Universitätsplatz 2, Geb.24
> Tel.: +49(0)391-67-18481 Fax: +49(0)391-67-11947  GPG: 4096R/7FFB9E9B
> 
 		 	   		  