[pymvpa] unable to get meaningful SMLR, do I have the wrong data format?
Kinga Laura Dobolyi
kld5r at cs.virginia.edu
Tue May 4 16:44:48 UTC 2010
Hello,
I am new to fMRI and I think you have a great tool - unfortunately I'm
having a bit of trouble getting the SMLR classifier to work for my
dataset. I think the problem is in the format of the input data. When I
run the start_easy.py example, modified with my own data, and print out
the confusion matrix, I get something like this:
----------.
predictions\targets    0.0   1.0   2.0   3.0   4.0
            `------   ----  ----  ----  ----  ----    P'   N'   FP   FN   PPV   NPV   TPR   SPC   FDR   MCC   AUC
        0.0             20    15    27    21    18   101    0   81    0   0.2   nan     1     0   0.8     0  0.54
        1.0              0     0     0     0     0     0   35    0   15   nan  0.57     0     1   nan     0  0.49
        2.0              0     0     0     0     0     0   47    0   27   nan  0.43     0     1   nan     0  0.21
        3.0              0     0     0     0     0     0   41    0   21   nan  0.49     0     1   nan     0  0.47
        4.0              0     0     0     0     0     0   38    0   18   nan  0.53     0     1   nan     0   0.7
Per target:           ----  ----  ----  ----  ----
        P               20    15    27    21    18
        N               81    86    74    80    83
        TP              20     0     0     0     0
        TN               0    20    20    20    20
Summary \ Means:      ----  ----  ----  ----  ----  20.2  32.2  16.2  16.2  nan   nan   0.2   0.8   nan     0  0.48
ACC           0.2
ACC%         19.8
# of sets       2
There are 4 stimuli (labels 1-4) and 1 fixation condition (label 0),
corresponding to the 5 labels. No matter what I do (e.g. removing
fixation from the dataset), the classifier always classifies everything
as the first label it sees. This seems unusual: if the dataset were just
noisy, I would expect it to assign labels roughly at random. Instead it
assigns everything to a single label, regardless of which label that
is - it's almost as if the feature weights are all zero!
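In case it's relevant, this is roughly how I would check that hunch
about the weights (a minimal sketch, assuming the 0.4-style
getSensitivityAnalyzer() API - please correct me if this isn't the
right way; 'dataset' is the NiftiDataset from my modified
start_easy.py):

from mvpa.suite import *
import numpy as N

# train SMLR on the full dataset just to look at the weights
clf = SMLR()
clf.train(dataset)

# pull out the per-feature weights the classifier learned; if these
# are all (near) zero, the model carries no information and just
# votes for one label
sens = clf.getSensitivityAnalyzer()(dataset)
print(N.abs(sens).max())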
My dataset is about 500 files ending in .nii.gz; I import them like this:
dataArr = ['/mvpa/snafps0.nii.gz', '/mvpa/snafps1.nii.gz', ............]
The attributes file has exactly as many entries as there are nii files
in my dataArr, as expected.
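Concretely, my loading code looks roughly like this (paths are
placeholders; the NiftiDataset-takes-a-list part is exactly the
assumption I'd like someone to sanity-check):

from mvpa.suite import *

# hypothetical paths; attributes.txt has one "label chunk" pair per
# line, in the same order as the files in dataArr
dataArr = ['/mvpa/snafps0.nii.gz', '/mvpa/snafps1.nii.gz']  # ...plus ~500 more
attr = SampleAttributes('attributes.txt')

# my assumption is that NiftiDataset will stack a list of 3D volumes
# into one samples-by-features dataset -- please correct me if a list
# of filenames isn't a valid 'samples' argument
dataset = NiftiDataset(samples=dataArr,
                       labels=attr.labels,
                       chunks=attr.chunks)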
To check whether the problem was my dataset being noisy or something
wrong with its format, I tried to get the classifier to do something
useful on the sample Haxby dataset from the Princeton MVPA toolkit.
That dataset has 10 hdr/img pairs, each containing 121 volumes. I saved
each volume as its own .nii file (so 10 x 121 = 1210 separate nifti
files) and created an attributes.txt file with 1210 entries by mapping
the regressors in their dataset into the format pymvpa wants. When I
run the same start_easy.py on this dataset, I see the same
pattern/problem in the confusion matrix, which shouldn't happen, since
the folks at Princeton apparently used this dataset successfully.
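For reference, the per-volume conversion was roughly equivalent to this
sketch (shown with nibabel purely as an illustration; the filenames are
made up):

import nibabel as nib

# split one 121-volume hdr/img pair into single-volume .nii files
img = nib.load('haxby_run01.hdr')      # 4D image: x, y, z, time
data = img.get_fdata()
for t in range(data.shape[-1]):
    vol = nib.Nifti1Image(data[..., t], img.affine)
    nib.save(vol, 'run01_vol%03d.nii' % t)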
The difference between my dataset and the Princeton dataset on the one
hand, and the sample dataset in pymvpa on the other, is that each
volume in my dataset looks like a scan of an entire brain - when I open
one in MRIcroN I can scroll through what looks like the whole brain
with my mouse wheel. Each volume in the example pymvpa bold.nii.gz, by
contrast, is a single slice of one section of a brain (and the
posterior part of the slice at that), rather than a 3D scan I can
scroll through.
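To show what I mean, printing the image shapes (again with nibabel,
just to illustrate; the shapes in the comments are made up) gives
something like:

import nibabel as nib

# one of my converted volumes: a full 3D brain per file
print(nib.load('/mvpa/snafps0.nii.gz').shape)  # e.g. (64, 64, 30)

# the pymvpa example data: 4D, but spatially only a small slab
print(nib.load('bold.nii.gz').shape)           # e.g. (40, 20, 1, 1452)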
Is this why the classifier is getting confused? Does it not know how to
handle these whole-brain nii volumes? Any suggestions as to what I might
be doing wrong? I don't understand why I can't get the Princeton sample
data, which I think might even be from the same Haxby experiment as the
PyMVPA sample data, to do something meaningful with the SMLR classifier
in the start_easy.py example. When I say I don't understand, I mean I'm
sure I'm doing something wrong; it's just not obvious to me, since I
have a computer science background rather than a neuroscience one!
Thanks so much for this great package - hopefully I can make it work! :-)