[pymvpa] Classification training on run 1, testing on run 2 only (no cross-validation)

Lin, Lynda C llin90 at illinois.edu
Wed Aug 2 04:22:06 UTC 2017


Hello,


I'm new to PyMVPA and have read through the tutorial and other documentation, but I can't figure out what is probably a very simple question: is there a way to get the individual confusion matrices for each classification (each run) that result from using the HalfPartitioner generator? I'm guessing it has something to do with the attributes/parameters of the generator? I'm trying to do this both for a whole-brain classification and for a searchlight.

I have 2 runs/chunks; each run has 72 trials, and each trial is associated with an Ingroup/Outgroup target. I only want to train on run 1 and test on run 2. I'm not interested in the reverse direction (train on run 2, test on run 1), so I just want the confusion matrix for train run 1 / test run 2, i.e. an Ingroup/Outgroup classification over 72 test trials rather than 144. For the whole-brain analysis my goal is to calculate the TPR (true positive rate) for the Ingroup and Outgroup targets, training only on run 1 and testing on run 2.
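In case the dataset layout matters: ds has 2 chunks of 72 trials each, with targets 'I' (Ingroup) and 'O' (Outgroup), which I check roughly like this:

print(ds.summary())
print(ds.sa['chunks'].unique)   # should be [1., 2.]
print(ds.sa['targets'].unique)  # should be ['I', 'O']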


I've tried three different approaches and I'm getting different results from each, so I just wanted to check whether any of them is valid:

1) Using the manual-split example from the tutorial and reading the classifier's "training_stats" conditional attribute
In the tutorial we can get the individual accuracies for each run through cv_results.samples, but I'm interested in the TPR (true positive rate) for Ingroup and Outgroup separately, so I'm trying to print the confusion matrix in order to calculate those numbers.

from mvpa2.suite import *

# Manual split: run 1 for training, run 2 for testing
ds_split1 = ds[ds.sa.chunks == 1.]
ds_split2 = ds[ds.sa.chunks == 2.]
clf = LinearCSVMC(enable_ca=['training_stats'])
clf.set_postproc(BinaryFxNode(mean_mismatch_error, 'targets'))
clf.train(ds_split1)
err = clf(ds_split2)  # mean mismatch error on run 2
# 'training_stats' is computed on the training data (run 1), not on run 2
print(clf.ca.training_stats.as_string(description=True))
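Ideally, what I'd like from this first approach is the confusion matrix for the run-2 predictions themselves, something like the sketch below. I'm guessing at the ConfusionMatrix usage here (i.e. that it can be created empty and filled with add()), so this may well be wrong:

from mvpa2.clfs.transerror import ConfusionMatrix

# Hypothetical: train on run 1, predict run 2, build the confusion matrix by hand
clf2 = LinearCSVMC()
clf2.train(ds_split1)
predictions = clf2.predict(ds_split2.samples)

cm = ConfusionMatrix()
cm.add(ds_split2.sa.targets, predictions)
print(cm)  # hoping this shows per-target stats such as TPR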

2) Using the HalfPartitioner's "count" argument

import numpy as np
from mvpa2.suite import *

# The training_stats confusion matrix from this method doesn't match the one above
clf = LinearCSVMC(enable_ca=['training_stats'])
hpart = HalfPartitioner(count=1, attr='chunks')  # count=1: only the first half-split
cvte = CrossValidation(clf, hpart, errorfx=lambda p, t: np.mean(p == t),
                       enable_ca=['stats'])
cv_results = cvte(ds)
print(cvte.ca.stats.as_string(description=True))
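What I was hoping to pull out of this second approach is something along these lines (again, I'm guessing at the attribute names from the ConfusionMatrix documentation):

cm = cvte.ca.stats
print(cm.labels)        # order of the targets in the matrix
print(cm.matrix)        # raw confusion matrix counts (hopefully train run 1 / test run 2)
print(cm.stats['TPR'])  # per-target true positive rates, if this key exists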

3) Manually counting predicted vs. actual targets

from mvpa2.suite import *

ds_split1 = ds[ds.sa.chunks == 1.]
ds_split2 = ds[ds.sa.chunks == 2.]
clf = LinearCSVMC()
clf.train(ds_split1)
predictions = clf.predict(ds_split2.samples)
# Boolean array: True where the prediction matches the actual target
prediction_values = predictions == ds_split2.sa.targets

# Count correct predictions separately for each target class
num_correct_ingroup = 0.0
num_correct_outgroup = 0.0
for stimulus, correct in zip(ds_split2.sa.targets, prediction_values):
    if stimulus == 'I':    # Ingroup
        if correct:
            num_correct_ingroup += 1.0
    elif stimulus == 'O':  # Outgroup
        if correct:
            num_correct_outgroup += 1.0

# 36 Ingroup and 36 Outgroup trials per run (72 total)
sensitivity_ingroup = num_correct_ingroup / 36.0
sensitivity_outgroup = num_correct_outgroup / 36.0
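In case it's useful, here's a more compact numpy version of the same per-class calculation, mostly as a sanity check on the loop above (and to avoid hard-coding the 36):

import numpy as np

targets = np.asarray(ds_split2.sa.targets)
correct = np.asarray(predictions) == targets

# TPR for a class = fraction of that class's test trials predicted correctly
sensitivity_ingroup = correct[targets == 'I'].mean()
sensitivity_outgroup = correct[targets == 'O'].mean()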

I'm getting different results (Ingroup/Outgroup TPRs) from each of these methods, so I'm wondering which, if any, of them is the correct way to get the confusion matrices or TPRs for Ingroup/Outgroup when training only on run 1 and testing on run 2. I realize the last method couldn't be used for a searchlight, but is it valid for the whole-brain analysis?

The confusion matrix that I get from method 1 shows highly accurate predictions, which makes me doubt that it's the confusion matrix I'm looking for.

Thank you for your help!






