[pymvpa] Training and testing on only 1 run (no cross validation)
Lynda Lin
llin90 at illinois.edu
Fri Oct 6 03:07:15 UTC 2017
Hi pyMVPAers,
I can't seem to figure out what is probably a very simple question: is
there a way to get the individual confusion matrices for each
classification (each run) that result from using the HalfPartitioner
generator? I'm guessing it has something to do with the
attributes/parameters of the generator? I'm trying to do this both for a
whole-brain classification and for a searchlight. I have 2 runs/chunks
(each run has 72 trials, and each trial is associated with an
Ingroup/Outgroup target), but I only want to train on run 1 and test on
run 2. I'm not theoretically interested in the results from training on
run 2 and testing on run 1, so I would just want the confusion matrix for
train run 1 / test run 2 - i.e., an Ingroup/Outgroup classification over
the 72 test trials rather than all 144.
I've tried it 3 different ways and I'm getting different results from
each, so I just wanted to know whether any of these ways is valid:
1) Using the manual split example from the tutorial and reading the
"training_stats" conditional attribute of the classifier.
In the tutorial we can get the individual accuracies for each run through
cv_results.samples, but I'm interested in the TPR (True Positive Rate) for
Ingroup and Outgroup separately, so I want to print the confusion
matrix to calculate those numbers:
ds_split1 = ds[ds.sa.chunks == 1.]
ds_split2 = ds[ds.sa.chunks == 2.]
clf = LinearCSVMC(enable_ca=['training_stats'])
clf.set_postproc(BinaryFxNode(mean_mismatch_error,'targets'))
clf.train(ds_split1)
err = clf(ds_split2)
clf.ca.training_stats.as_string(description=True)
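For reference, the per-class TPRs I'm after can also be computed directly from a pair of prediction/target arrays with plain NumPy, independent of PyMVPA's conditional attributes. A minimal sketch (the arrays here are made up for illustration, not my real data):

```python
import numpy as np

# hypothetical predictions/targets, for illustration only (not real data)
targets = np.array(['I', 'I', 'O', 'O', 'I', 'O'])
predictions = np.array(['I', 'O', 'O', 'O', 'I', 'I'])

labels = ['I', 'O']
# confusion[i, j] = number of trials with true label labels[i]
# that were predicted as labels[j]
confusion = np.array([[np.sum((targets == t) & (predictions == p))
                       for p in labels] for t in labels])

# per-class TPR = correctly classified trials / trials of that class
tpr = confusion.diagonal() / confusion.sum(axis=1).astype(float)
```

So as long as a method exposes the raw predictions for the test run, the confusion matrix and TPRs can be recomputed this way as a sanity check.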
2) Using the HalfPartitioner function's "count" argument
clf = LinearCSVMC(enable_ca=['training_stats'])  # the training_stats
# confusion matrix from this method doesn't match the one above
hpart = HalfPartitioner(count=1, attr='chunks')
cvte = CrossValidation(clf, hpart, errorfx=lambda p, t: np.mean(p == t),
                       enable_ca=['stats'])
cv_results = cvte(ds)
cvte.ca.stats.as_string(description=True)
3) Manually counting predicted vs. actual targets
ds_split1 = ds[ds.sa.chunks == 1.]
ds_split2 = ds[ds.sa.chunks == 2.]
clf = LinearCSVMC()
clf.train(ds_split1)
predictions = clf.predict(ds_split2.samples)
prediction_values = predictions == ds_split2.sa.targets
num_correct_ingroup = 0.0   # counters initialized before the loop
num_correct_outgroup = 0.0
for stimulus, correct in zip(ds_split2.sa.targets, prediction_values):
    print(correct)
    if stimulus == 'I':    # Ingroup
        if correct:
            num_correct_ingroup += 1.0
    elif stimulus == 'O':  # Outgroup
        if correct:
            num_correct_outgroup += 1.0
sensitivity_ingroup = num_correct_ingroup / 36.0   # 36 trials per class
sensitivity_outgroup = num_correct_outgroup / 36.0
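The loop above could presumably also be written with boolean masks, which would sidestep the counter bookkeeping. A sketch (the arrays here are stand-ins; in the real script they would be predictions and ds_split2.sa.targets from method 3):

```python
import numpy as np

# stand-in arrays: in the real script these would be `predictions`
# and `ds_split2.sa.targets` from the snippet above
targets = np.array(['I'] * 36 + ['O'] * 36)
predictions = targets.copy()
predictions[:6] = 'O'     # pretend 6 Ingroup trials were misclassified
predictions[40:44] = 'I'  # ...and 4 Outgroup trials

correct = predictions == targets
# per-class sensitivity = fraction of that class's trials classified correctly
sensitivity_ingroup = np.mean(correct[targets == 'I'])
sensitivity_outgroup = np.mean(correct[targets == 'O'])
```

This also avoids hard-coding the 36 trials per class, since the mask length adapts to however many trials each class actually has.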
I'm getting different Ingroup/Outgroup TPRs from each of these methods,
so I'm wondering which, if any, of them is the correct way to get the
confusion matrices or TPRs for an Ingroup/Outgroup classification trained
on run 1 and tested on run 2. The last method I wouldn't be able to use
for a searchlight, but might it be valid for whole-brain?
The confusion matrix that I get from method 1 has highly accurate
predictions, which makes me doubt it's the confusion matrix I'm looking
for.
Thank you for your help!
Lynda