[pymvpa] unable to get meaningful SMLR, do I have the wrong data format?
Kinga Laura Dobolyi
kld5r at cs.virginia.edu
Tue May 4 16:44:48 UTC 2010
Hello,
I am new to fMRI and I think you have a great tool - unfortunately I'm
having a bit of trouble getting the SMLR classifier to work for my
dataset. I think the problem is in the format of the input data. When I
run the start_easy.py example, modified with my own data, and print out
the confusion matrix, I get something like this:
----------.
predictions\targets    0.0   1.0   2.0   3.0   4.0
            `------   ----  ----  ----  ----  ----    P'   N'   FP   FN   PPV   NPV   TPR   SPC   FDR   MCC   AUC
        0.0             20    15    27    21    18   101    0   81    0   0.2   nan     1     0   0.8     0  0.54
        1.0              0     0     0     0     0     0   35    0   15   nan  0.57     0     1   nan     0  0.49
        2.0              0     0     0     0     0     0   47    0   27   nan  0.43     0     1   nan     0  0.21
        3.0              0     0     0     0     0     0   41    0   21   nan  0.49     0     1   nan     0  0.47
        4.0              0     0     0     0     0     0   38    0   18   nan  0.53     0     1   nan     0   0.7
Per target:           ----  ----  ----  ----  ----
        P               20    15    27    21    18
        N               81    86    74    80    83
        TP              20     0     0     0     0
        TN               0    20    20    20    20
Summary \ Means:      ----  ----  ----  ----  ----  20.2  32.2  16.2  16.2  nan   nan   0.2   0.8   nan     0  0.48
ACC           0.2
ACC%         19.8
# of sets       2
There are 4 stimuli (labels 1-4) and 1 fixation condition (label 0),
corresponding to the 5 labels. No matter what I do (e.g. removing
fixation from the dataset), the classifier always classifies everything
as the first label it sees. This seems unusual: if the dataset were just
noisy, I would expect it to assign labels roughly at random. Instead it
assigns everything to a single label, regardless of which label that
is - it's almost as if the feature weights are all zero!
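In case it's relevant, this is roughly how I would check that hunch
about the weights (a minimal sketch, assuming the 0.4-style
getSensitivityAnalyzer() API - please correct me if this isn't the
right way; 'dataset' is the NiftiDataset from my modified
start_easy.py):

from mvpa.suite import *
import numpy as N

# train SMLR on the full dataset just to look at the weights
clf = SMLR()
clf.train(dataset)

# pull out the per-feature weights the classifier learned; if these
# are all (near) zero, the model carries no information and just
# votes for one label
sens = clf.getSensitivityAnalyzer()(dataset)
print(N.abs(sens).max())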
My dataset is about 500 files ending in .nii.gz; I import them like this:
dataArr = ['/mvpa/snafps0.nii.gz', '/mvpa/snafps1.nii.gz', ............]
The attributes file has exactly as many entries as there are nii files
in my dataArr, as expected.
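Concretely, my loading code looks roughly like this (paths are
placeholders; the NiftiDataset-takes-a-list part is exactly the
assumption I'd like someone to sanity-check):

from mvpa.suite import *

# hypothetical paths; attributes.txt has one "label chunk" pair per
# line, in the same order as the files in dataArr
dataArr = ['/mvpa/snafps0.nii.gz', '/mvpa/snafps1.nii.gz']  # ...plus ~500 more
attr = SampleAttributes('attributes.txt')

# my assumption is that NiftiDataset will stack a list of 3D volumes
# into one samples-by-features dataset -- please correct me if a list
# of filenames isn't a valid 'samples' argument
dataset = NiftiDataset(samples=dataArr,
                       labels=attr.labels,
                       chunks=attr.chunks)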
To check whether the problem was my dataset being noisy or something
wrong with its format, I tried to get the classifier to do something
useful on the sample Haxby dataset from the Princeton MVPA toolkit.
That dataset has 10 hdr/img pairs, each containing 121 volumes. I saved
each volume as its own .nii file (so 10 x 121 = 1210 separate nifti
files) and created an attributes.txt file with 1210 entries by mapping
the regressors in their dataset into the format pymvpa wants. When I
run the same start_easy.py on this dataset, I see the same
pattern/problem in the confusion matrix, which shouldn't happen, since
the folks at Princeton apparently used this dataset successfully.
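For reference, the per-volume conversion was roughly equivalent to this
sketch (shown with nibabel purely as an illustration; the filenames are
made up):

import nibabel as nib

# split one 121-volume hdr/img pair into single-volume .nii files
img = nib.load('haxby_run01.hdr')      # 4D image: x, y, z, time
data = img.get_fdata()
for t in range(data.shape[-1]):
    vol = nib.Nifti1Image(data[..., t], img.affine)
    nib.save(vol, 'run01_vol%03d.nii' % t)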
The difference between my dataset and the Princeton dataset on the one
hand, and the sample dataset in pymvpa on the other, is that each
volume in my dataset looks like a scan of an entire brain - when I open
one in MRIcroN I can scroll through what looks like the whole brain
with my mouse wheel. Each volume in the example pymvpa bold.nii.gz, by
contrast, is a single slice of one section of a brain (and the
posterior part of the slice at that), rather than a 3D scan I can
scroll through.
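To show what I mean, printing the image shapes (again with nibabel,
just to illustrate; the shapes in the comments are made up) gives
something like:

import nibabel as nib

# one of my converted volumes: a full 3D brain per file
print(nib.load('/mvpa/snafps0.nii.gz').shape)  # e.g. (64, 64, 30)

# the pymvpa example data: 4D, but spatially only a small slab
print(nib.load('bold.nii.gz').shape)           # e.g. (40, 20, 1, 1452)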
Is this why the classifier is getting confused? Does it not know how to
handle these whole-brain nii volumes? Any suggestions as to what I might
be doing wrong? I don't understand why I can't get the Princeton sample
data, which I think might even be from the same Haxby experiment as the
PyMVPA sample data, to do something meaningful with the SMLR classifier
in the start_easy.py example. When I say I don't understand, I mean I'm
sure I'm doing something wrong; it's just not obvious to me, since I
have a computer science background rather than a neuroscience one!
Thanks so much for this great package - hopefully I can make it work! :-)