[pymvpa] How to build the null-distribution

Tue Aug 26 14:46:44 UTC 2014

Hi all,
I have two tasks that share the same two conditions in an fMRI experiment. I want to train a classifier (LinearCSVMC) on task 1 and then use it to classify task 2. Similarly, I will train a classifier on task 2 and use it to classify task 1. I want to obtain the prediction accuracy of both classifiers and test its significance. My questions are at the end.
My script:
import numpy as np
from mvpa2.suite import *
from mvpa2.mappers.detrend import poly_detrend
from mvpa2.datasets.mri import fmri_dataset
from mvpa2.datasets.miscfx import remove_invariant_features
data = [0,1,2,3,4,5]
for i in data:
     #input
     clf = LinearCSVMC()
     FMRIFile1  = 'data/task1.nii.gz'
     FMRIFile2  = 'data/task2.nii.gz'
     LabelFile1 = 'attributes/task1.txt'
     LabelFile2 = 'attributes/task2.txt'
     maskFile   = 'mask/mask.nii.gz' 
     MotionParameterFile1 = 'headmotion/task1.par' 
     MotionParameterFile2 = 'headmotion/task2.par' 

     #load data
     attrs1 = SampleAttributes(LabelFile1)
     ds1 = fmri_dataset(samples=FMRIFile1, targets=attrs1.targets, chunks=attrs1.chunks, mask = maskFile)
     attrs2 = SampleAttributes(LabelFile2)
     ds2 = fmri_dataset(samples=FMRIFile2, targets=attrs2.targets, chunks=attrs2.chunks, mask = maskFile)

     # Detrend with motion correction parameter
     mc1 = McFlirtParams(MotionParameterFile1)
     for param in mc1:
          ds1.sa['mc_' + param] = mc1[param]
     mc2 = McFlirtParams(MotionParameterFile2)
     for param in mc2:
          ds2.sa['mc_' + param] = mc2[param]

     # detrend each dataset with the motion-correction parameters as additional regressors
     res = poly_detrend(ds1, opt_regs=['mc_x', 'mc_y', 'mc_z', 'mc_rot1', 'mc_rot2', 'mc_rot3'])
     res = poly_detrend(ds2, opt_regs=['mc_x', 'mc_y', 'mc_z', 'mc_rot1', 'mc_rot2', 'mc_rot3'])

     # do chunkswise linear detrending on dataset
     poly_detrend(ds1, polyord=1, chunks_attr='chunks')
     poly_detrend(ds2, polyord=1, chunks_attr='chunks')

     # z-score each feature within each chunk (no param_est is given, so this is not relative to a 'rest' baseline)
     zscore(ds1, chunks_attr='chunks')
     zscore(ds2, chunks_attr='chunks')

     # select classes for mvpa analysis
     ds1 = ds1[np.array([l%6 == i for l in ds1.sa.time_indices], dtype='bool')]
     ds1 = remove_invariant_features(ds1)
     ds2 = ds2[np.array([l%6 == i for l in ds2.sa.time_indices], dtype='bool')]
     ds2 = remove_invariant_features(ds2) 

     # do classification
     clf.train(ds1)
     predictions1 = clf.predict(ds2.samples)
     acc1 = np.mean(predictions1 == ds2.sa.targets) 
     clf.train(ds2)
     predictions2 = clf.predict(ds1.samples)
     acc2 = np.mean(predictions2 == ds1.sa.targets)

My Questions:
(1) My first goal is to train a classifier on one task and then use it to classify the other task. Does the script above achieve this?
(2) My second goal is to test the significance of the prediction accuracy, so I first need to estimate the null distribution of these accuracies with permutation testing. However, the permutation-testing example in the PyMVPA manual covers the case of a single task, where a classifier is trained on some runs and then used to predict the remaining run. I have two independent tasks and two independent datasets, so I cannot use the script from the manual directly. I am new to machine learning and do not know how to write the appropriate script. How should I modify my script to build the null distribution by permutation testing for the current prediction accuracies? Any advice would be appreciated!
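
To make question (2) concrete, here is a rough sketch of the kind of permutation scheme being asked about, written in plain Python rather than with PyMVPA calls. Everything in it is an assumption for illustration: the function names are hypothetical, the toy 1-D data and the nearest-neighbour classifier only stand in for ds1, ds2 and LinearCSVMC, and the choice to shuffle only the training labels (keeping the test labels intact) is one common option, not necessarily what PyMVPA does internally.

```python
import random

def nearest_neighbour_accuracy(train_x, train_y, test_x, test_y):
    """Toy 1-NN classifier on 1-D samples (stand-in for clf.train / clf.predict)."""
    hits = 0
    for x, y in zip(test_x, test_y):
        # index of the training sample closest to this test sample
        nearest = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        if train_y[nearest] == y:
            hits += 1
    return hits / float(len(test_x))

def permutation_null(train_x, train_y, test_x, test_y, n_perm=1000, seed=0):
    """Accuracies on the test task after shuffling the training-task labels."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        shuffled = list(train_y)
        rng.shuffle(shuffled)  # breaks the sample/label link in the training task only
        null.append(nearest_neighbour_accuracy(train_x, shuffled, test_x, test_y))
    return null

def permutation_p_value(observed, null):
    """Share of null accuracies >= observed, with an add-one correction."""
    n_ge = sum(1 for acc in null if acc >= observed)
    return (n_ge + 1.0) / (len(null) + 1.0)

# Made-up data: "task 1" is used for training, "task 2" for testing.
task1_x = [0.0, 0.1, 0.2, 0.3, 1.0, 1.1, 1.2, 1.3]
task1_y = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']
task2_x = [0.01, 0.11, 0.21, 0.31, 1.01, 1.11, 1.21, 1.31]
task2_y = ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B']

observed = nearest_neighbour_accuracy(task1_x, task1_y, task2_x, task2_y)
null = permutation_null(task1_x, task1_y, task2_x, task2_y)
p = permutation_p_value(observed, null)
print(observed, p)
```

The same loop would be run a second time with the roles of the two tasks swapped to get the null distribution for the other classifier. With real fMRI data the shuffling would probably also have to respect the chunk structure (for example, shuffling labels only within each chunk), which this sketch does not attempt.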
