[pymvpa] Train and test on different classes from a dataset
Jan
j.derrfuss at donders.ru.nl
Thu Jan 31 11:29:49 UTC 2013
Michael,
Thank you very much for your help!!! The first step seems to give plausible
results (in the sense that the accuracy maps for training and testing on A/B,
and for training on A/B and testing on C/D look similar, but with the latter
giving somewhat lower accuracies - exactly what I would expect).
Now, I would like to adapt the permutation analysis accordingly. The first thing
I'm not sure about is
a) whether I should permute only C/D labels (such that the classifier is trained
on the real labels, but tested on permuted labels), or
b) whether I should permute only A/B labels, or
c) both.
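To make the options concrete, this is roughly how I would express them with
AttributePermutator (just a sketch; I'm not certain I read the 'limit' argument
correctly, and for c) I would simply chain the two permutators):
# a) shuffle only the test labels (condC/condD); training labels stay intact
perm_a = AttributePermutator(attr='targets', assure=True,
                             limit={'targets': ['condC', 'condD']})
# b) shuffle only the training labels (condA/condB)
perm_b = AttributePermutator(attr='targets', assure=True,
                             limit={'targets': ['condA', 'condB']})
# c) shuffle both groups, each within itself (chaining the two permutators)
perm_c = ChainNode([perm_a, perm_b])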
In the code below, I tried to set up the permutator as described in a). However,
it gives implausibly good results: the p-values are very low even for voxels with
rather low accuracies. So there seems to be something wrong with how I set up the
permutation analysis. Any suggestions on where the problem lies would be very
welcome!
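I guess I could sanity-check the p-value computation itself (at the end of the
code below) in isolation with random numbers, along these lines (sketch, with
made-up sizes):
# with pure-noise "accuracies", the permutation p-values should be roughly
# uniform between 0 and 1
rng = np.random.RandomState(0)
toy_obs = rng.rand(50)           # pretend observed accuracies for 50 voxels
toy_perm = rng.rand(100, 50)     # pretend accuracies from 100 permutations
toy_p = (np.sum(toy_perm >= toy_obs, axis=0) + 1.) / (100 + 1.)
print(toy_p.min(), toy_p.mean(), toy_p.max())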
Best,
Jan
#### relevant code snippets ####
# (snippets assume the usual "from mvpa2.suite import *")
train = ['condA', 'condB']   # conditions used for training
test = ['condC', 'condD']    # conditions used for testing
labelsList = train + test
# restricts the permutator below to shuffling only the condC/condD labels (variant a)
labelsDict = {"targets": ['condC', 'condD']}
ds = fmri_dataset(samples=os.path.join(path, infile),
                  targets=attr.targets,
                  chunks=attr.chunks,
                  mask=os.path.join(mask_path, mask_file))
# preprocessing
(...)
# Selecting labels
ds = ds[np.array([l in labelsList for l in ds.sa.targets], dtype="bool")]
partitioner = NFoldPartitioner(cvtype=1)
class MyFilter(Node):
    """Adds a 'filtered_partitions' sample attribute that blanks out the
    test conditions in the training portion and the training conditions
    in the testing portion."""
    def __init__(self, target_groups, part_attr, target_attr,
                 space='filtered_partitions', **kwargs):
        self._target_groups = target_groups
        self._target_attr = target_attr
        self._part_attr = part_attr
        Node.__init__(self, space=space, **kwargs)

    def generate(self, ds):
        # binary masks for the training (1) and testing (2) portion
        train_part = ds.sa[self._part_attr].value == 1
        test_part = ds.sa[self._part_attr].value == 2
        # binary masks for the first (training) and second (testing) target group
        match_1st_group = [t in self._target_groups[0] for t in
                           ds.sa[self._target_attr].value]
        match_2nd_group = [t in self._target_groups[1] for t in
                           ds.sa[self._target_attr].value]
        # (I removed the variant that blanks out group 1 in the training set and
        # group 2 in the testing set, as I'm only interested in the one below.)
        # Blank out group 2 in the training set and group 1 in the testing set,
        # i.e. train on condA/condB and test on condC/condD.
        new_part = ds.sa[self._part_attr].value.copy()
        new_part[np.logical_and(train_part, match_2nd_group)] = 0
        new_part[np.logical_and(test_part, match_1st_group)] = 0
        ds.sa[self.get_space()] = new_part
        yield ds
chain = ChainNode([partitioner,
                   MyFilter((train, test),
                            partitioner.get_space(),
                            'targets')])
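# (Quick check, sketch only: print which conditions end up in the training
# (filtered_partitions == 1) and testing (== 2) portion of each fold, to make
# sure the filter blanks out the intended samples.)
for part in chain.generate(ds):
    fp = part.sa.filtered_partitions
    print('train:', np.unique(part.sa.targets[fp == 1]),
          'test:', np.unique(part.sa.targets[fp == 2]))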
clf = LinearCSVMC(C=1)
# use the chained generator (partitioner + filter) and split on the filtered
# partitions, so that training uses only condA/condB and testing only condC/condD
cv = CrossValidation(clf, chain,
                     splitter=Splitter(attr='filtered_partitions', attr_values=[1, 2]),
                     errorfx=lambda p, t: np.mean(p == t),
                     enable_ca=["stats"])
sl = sphere_searchlight(cv, radius=3, postproc=mean_sample(),
                        space='voxel_indices', nproc=n_cpus)
result = sl(ds) # result of searchlight analysis
sphere_acc = result.samples[0]
map2nifti(ds, sphere_acc).to_filename(<filename>)
n_permutations = 100
res_permutation = np.zeros((n_permutations, ds.nfeatures))  # permutations x features
# permute only the condC/condD labels (variant a); 'limit' restricts the shuffling
permutator = AttributePermutator(attr='targets', count=n_permutations,
                                 assure=True, limit=labelsDict)
# run the searchlight once for each of the n_permutations permuted datasets
for i, ds_perm in enumerate(permutator.generate(ds)):
    result_p = sl(ds_perm)
    # store this permutation's accuracy map
    res_permutation[i, :] = result_p.samples[0]
# permutation stats: for each voxel, count how many permutation accuracies
# reach or exceed the observed accuracy
exceedances = res_permutation >= result.samples[0]
p_values = (np.sum(exceedances, axis=0) + 1.) / (n_permutations + 1.)
inv_p_values = 1. - p_values   # "1 - p" map, in case higher-is-better is preferred
# save p-value map
map2nifti(ds, p_values).to_filename(<filename>)
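PS: To convince myself that 'limit' really restricts the shuffling to the
condC/condD labels, I suppose a quick check along these lines would do (sketch):
# draw a single permutation and list which conditions actually received new labels;
# if 'limit' works as I think it does, only condC and condD should show up here
ds_check = AttributePermutator(attr='targets', assure=True, limit=labelsDict)(ds)
changed = ds_check.sa.targets != ds.sa.targets
print(np.unique(ds.sa.targets[changed]))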