[pymvpa] Cross Validation Output

Paul Robinson prarobinson at gmail.com
Mon Apr 15 21:55:32 UTC 2013


Hello!

I am just getting started with PyMVPA and am a little concerned by
the output I am seeing from my classifier. Hopefully someone here can
tell me where I've gone astray. The example code I am executing is as
follows:

-------------------------------- Begin python --------------------------------
> import sys, os, math
> import numpy as np
> from mvpa2.tutorial_suite import *

# Setup the data directory
> datapath=os.path.join('/home', 'UserName', 'Subject_Data', 'Study')

# This is just a text file containing the group label for each subject:
> attr = SampleAttributes(os.path.join(datapath, 'attributes.txt'))

# Just to see that we do in fact have two unique targets and n
# independent samples (chunks should be independent)
> print np.unique(attr.targets)
> print np.unique(attr.chunks)

# Import data
> fds = fmri_dataset(samples=os.path.join(datapath, 'all_masked.nii.gz'), targets=attr.targets, chunks=attr.chunks)
# Wanna see how big this sucker is:
> print fds.shape

# Setup an SVM (or kNN) classifier:
> clf = LinearCSVMC()
# > clf = kNN(k=1, dfx=one_minus_correlation, voting='majority')

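# Perform cross validation (note: this errorfx returns the fraction of
# correct predictions, i.e. accuracy rather than error)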
> cvte = CrossValidation(clf, NFoldPartitioner(attr='chunks'), errorfx=lambda p, t: np.mean(p == t), enable_ca=['stats'])
> cv_results = cvte(fds)

# Get some output
> print cvte.ca.stats.as_string(description=True)


-------------------------------- End python ----------------------------------

My data are a 4D stack of statistical maps (t-maps, all
co-registered), and I have an attributes.txt file in which the first
column holds the target labels ('Control' or 'Patient') and the second
column denotes the chunks (each row is a separate subject, so the
second column is just 0...[# of subjects]). First, is this the
appropriate way to set up my attributes file? E.g.:

Control 0
Control 1
...
Patient 13
Patient 14
...
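
As a sanity check on that layout, here is a minimal plain-Python sketch
(it does not use PyMVPA) that parses rows of this form and confirms
there are two unique targets and one unique chunk per row. The four
rows and the helper name parse_attributes are made up for illustration
and are not my real attributes.txt.

```python
# Hypothetical stand-in for attributes.txt: one "<target> <chunk>" row per subject.
example_rows = """Control 0
Control 1
Patient 2
Patient 3"""


def parse_attributes(text):
    """Split each non-empty row into a (target, chunk) pair."""
    targets, chunks = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank rows
        target, chunk = line.split()
        targets.append(target)
        chunks.append(int(chunk))
    return targets, chunks


targets, chunks = parse_attributes(example_rows)

# Two groups are expected, and every subject should sit in its own chunk.
n_targets = len(set(targets))
chunks_unique = len(set(chunks)) == len(chunks)

print(sorted(set(targets)))
print(chunks_unique)
```

If chunks_unique came out False, two rows would share a chunk and would
be held out together during cross validation.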


I thought I'd just start with this before adding volumetric or
behavioural data, but I noticed what appears to be a binary output
for each iteration of cross validation, like this:


>>> print cv_results.samples
[[ 0.]
 [ 0.]
 [ 1.]
 [ 0.]
 [ 1.]
 [ 1.]
 [ 0.]
...


From the tutorial I was expecting some number between 0 and 1 on each
pass, but maybe I shouldn't if each line corresponds to a fold...
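
To illustrate that hunch, a minimal NumPy sketch, assuming each fold
holds out exactly one subject (one chunk per subject, one chunk left
out per fold). The predicted and actual labels are made-up values, not
output from my data. With a single test sample per fold, the same
function I pass as errorfx can only return 0.0 or 1.0, and a fraction
between 0 and 1 appears only after averaging over folds.

```python
import numpy as np

# Same function as the errorfx in the script above (fraction of correct predictions).
accuracy = lambda p, t: np.mean(p == t)

# Hypothetical predictions and true labels for six subjects (made-up values).
predicted = np.array(['Control', 'Patient', 'Patient',
                      'Control', 'Patient', 'Control'])
actual = np.array(['Control', 'Control', 'Patient',
                   'Patient', 'Patient', 'Control'])

# One fold per subject: each fold scores a single held-out sample,
# so the per-fold value is either 0.0 (wrong) or 1.0 (right).
per_fold = np.array([accuracy(predicted[i:i + 1], actual[i:i + 1])
                     for i in range(len(actual))])

# Averaging across folds gives the overall fraction correct.
overall = per_fold.mean()

print(per_fold)
print(overall)
```

If that assumption holds for my setup, the binary column would be
expected, and the mean of cv_results.samples would be the number to
look at.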

Anyway, if anyone can see anything obviously wrong with this (either
in setup or approach) I'd be most grateful for pointers. Please let me
know, too, if there's other information that would be helpful for you
to know in order to address the question.



Thanks very much,
Paul


