[pymvpa] Cross Validation Output

Mon Apr 15 21:58:50 UTC 2013

print fds.summary()
?

On Mon, 15 Apr 2013, Paul Robinson wrote:

> Hello!

> I am just getting started with PyMVPA, and am a little concerned by
> the output I am seeing of my classifier. Hopefully someone here can
> tell me where I've gone astray. Example code I am executing is as
> follows:

> -------------------------------- Begin python --------------------------------
> > import sys, os, math
> > import numpy as np
> > from mvpa2.tutorial_suite import *

> # Setup the data directory
> > datapath=os.path.join('/home', 'UserName', 'Subject_Data', 'Study')

> # This is just a text file containing the group label for each subject:
> > attr = SampleAttributes(os.path.join(datapath, 'attributes.txt'))

> # Just to see that we do in fact have two unique targets and n
> # independent samples (chunks should be independent)
> > print np.unique(attr.targets)
> > print np.unique(attr.chunks)

> # Import data
> > fds = fmri_dataset(samples=os.path.join(datapath, 'all_masked.nii.gz'), targets=attr.targets, chunks=attr.chunks)
> # Wanna see how big this sucker is:
> > print fds.shape

> # Setup an SVM (or kNN) classifier:
> > clf = LinearCSVMC()
> # > clf = kNN(k=1, dfx=one_minus_correlation, voting='majority')

> # Perform cross validation
> > cvte = CrossValidation(clf, NFoldPartitioner(attr='chunks'), errorfx=lambda p, t: np.mean(p == t), enable_ca=['stats'])
> > cv_results = cvte(fds)

> # Get some output
> > print cvte.ca.stats.as_string(description=True)

> --------------------------------- End python ---------------------------------

> My data are a 4D stack of statistical maps (t-maps, all
> co-registered), and I have an attributes.txt file with the first
> column being the target labels ('control' or 'patient') and the second
> column denoting the chunks (each row is a separate subject, so the
> second column is just 0...[# of subjects]). First, is this the
> appropriate way to set up my attributes file? E.g.:

> Control 0
> Control 1
> ...
> Patient 13
> Patient 14
> ...
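
A quick way to sanity-check a two-column file like this, independent of PyMVPA, is to count samples per target and per chunk. The sketch below is only an illustration: the inline text stands in for attributes.txt and its labels and chunk numbers are made up. Labels are compared as plain strings, so 'Control' and 'control' would count as two different targets.

```python
from collections import Counter

# Hypothetical stand-in for the contents of attributes.txt;
# the real file would have one row per subject.
text = """control 0
control 1
patient 2
patient 3
"""

targets, chunks = [], []
for line in text.splitlines():
    if not line.strip():
        continue  # skip blank lines
    target, chunk = line.split()
    targets.append(target)
    chunks.append(int(chunk))

# Number of samples for each target label
target_counts = Counter(targets)
# Number of samples in each chunk (1 everywhere if each subject is its own chunk)
samples_per_chunk = Counter(chunks)

print(target_counts)
print(samples_per_chunk)
```

If every chunk holds exactly one sample, an n-fold partitioning over chunks amounts to leave-one-subject-out.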

> I thought I'd just start with this before adding volumetric or
> behavioural data, but I noticed what appeared to be a binary output
> for each iteration of cross validation; like this:

> >>> print cv_results.samples
> [[ 0.]
>  [ 0.]
>  [ 1.]
>  [ 0.]
>  [ 1.]
>  [ 1.]
>  [ 0.]
> ...

> From the tutorial I was expecting some number between 0 and 1 on each
> pass, but maybe I shouldn't if each line corresponds to a fold...
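
If each chunk contains a single subject, every fold is tested on exactly one sample, so the fraction of correct predictions in a fold can only be 0 or 1. The sketch below shows this with plain NumPy rather than PyMVPA; the predictions are invented for illustration, and accuracy() simply mirrors the lambda passed as errorfx above.

```python
import numpy as np

def accuracy(p, t):
    # Same computation as the errorfx lambda: fraction of correct predictions
    return np.mean(p == t)

# Made-up labels and predictions, one sample per subject (illustration only)
targets = np.array(['control', 'control', 'patient', 'patient'])
predictions = np.array(['control', 'patient', 'patient', 'control'])

# Leave-one-out: each fold is scored on a single held-out sample
per_fold = [accuracy(predictions[i:i + 1], targets[i:i + 1])
            for i in range(len(targets))]

# Averaging across folds gives the overall accuracy
overall = np.mean(per_fold)

print(per_fold)
print(overall)
```

Under that assumption, the column of 0s and 1s is expected, and the mean across folds is the number to compare against chance.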

> Anyway, if anyone can see anything obviously wrong with this (either
> in setup or approach) I'd be most grateful for pointers. Please let me
> know, too, if there's other information that would be helpful for you
> to know in order to address the question.

> Thanks very much,
> Paul


-- 
Yaroslav O. Halchenko
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Senior Research Associate,     Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik