[pymvpa] Train and test on different classes from a ds

Charlotte Murphy cem552 at york.ac.uk
Thu Nov 20 10:27:47 UTC 2014


Dear all,

I am trying to train a classifier on classes A vs. B and test it on classes C
vs. D from the same dataset. My dataset consists of 4 conditions (A, B, C,
D) and 4 chunks. I wish to use a cross-validation approach: train the
classifier on A vs. B in chunks 1-3, test it on C vs. D in chunk 4, and so
on for the remaining folds. I found some emails in the archive that address
this issue
(http://lists.alioth.debian.org/pipermail/pkg-exppsy-pymvpa/2013q1/002369.html),
but I am unsure whether I am implementing that approach correctly in my
current analysis.
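The intended fold scheme (hold out one chunk, train only on A/B from the remaining chunks, test only on C/D from the held-out chunk) can be sketched in plain Python. This is a minimal illustration assuming four chunks numbered 1 to 4 with one sample per condition per chunk; `cross_decoding_folds` is a hypothetical helper, not part of PyMVPA.

```python
def cross_decoding_folds(samples, train_labels, test_labels):
    """Yield one (train, test) pair per held-out chunk.

    `samples` is a list of (chunk, target) tuples. Training keeps only
    `train_labels` from the other chunks; testing keeps only `test_labels`
    from the held-out chunk.
    """
    chunks = sorted({chunk for chunk, _ in samples})
    for held_out in chunks:
        train = [s for s in samples if s[0] != held_out and s[1] in train_labels]
        test = [s for s in samples if s[0] == held_out and s[1] in test_labels]
        yield train, test


# Toy dataset (assumption): one sample per condition in each of 4 chunks.
samples = [(chunk, target) for chunk in (1, 2, 3, 4) for target in "ABCD"]

for train, test in cross_decoding_folds(samples, {"A", "B"}, {"C", "D"}):
    print("train:", train)
    print("test: ", test)
```

The last fold corresponds to "train on A vs. B in chunks 1-3, test on C vs. D in chunk 4"; the other three folds rotate the held-out chunk.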

The attached script appears to work and produces sensible output. However,
to make sure the script partitions the data correctly, I ran a sanity check
by modifying the same script in three ways:
1) Train on (A vs. B) and test on (A vs. B) - sanity check
2) Train on (C vs. D) and test on (C vs. D) - sanity check
3) Train on (A vs. B) and test on (C vs. D) - classifier of interest

The part of the script I modified is the ChainNode (highlighted below).
For (1) I ran

  MyFilter((train, train), partitioner.get_space(), 'targets')

For (2) I ran

  MyFilter((test, test), partitioner.get_space(), 'targets')

For (3) I ran

  MyFilter((train, test), partitioner.get_space(), 'targets')

However, I get identical output for all three versions of the script. Could
anyone point me in the right direction? I'm using PyMVPA 2.

Cheers,
Charlotte


################## I have attached the relevant part of my script below ##################

import os

import numpy as np
from mvpa2.suite import *

train = ['A', 'B']
test = ['C', 'D']

labelsList = train + test

# Load in dataset
ds = fmri_dataset(samples=os.path.join(path, infile),
                  mask=os.path.join(mask_path, mask_file),
                  chunks=sa['chunks'], targets=sa['targets'])

# preprocessing
(...)

# Selecting labels
ds = ds[np.array([l in labelsList for l in ds.sa.targets], dtype="bool")]

print ds.sa.targets
print train
print test
print labelsList

partitioner = NFoldPartitioner(cvtype=1)

for p in partitioner.generate(ds):
    # convert to string to let it align in the output
    print 'PA', p.sa.partitions.astype('str')


class MyFilter(Node):

    def __init__(self, target_groups, part_attr, target_attr,
                 space='filtered_partitions', **kwargs):
        self._target_groups = target_groups
        self._target_attr = target_attr
        self._part_attr = part_attr
        Node.__init__(self, space=space, **kwargs)

    def generate(self, ds):

        # binary mask for training and testing portion
        train_part = ds.sa[self._part_attr].value == 1
        test_part = ds.sa[self._part_attr].value == 2

        # binary mask for the first and second target group
        match_1st_group = [t in self._target_groups[0]
                           for t in ds.sa[self._target_attr].value]
        match_2nd_group = [t in self._target_groups[1]
                           for t in ds.sa[self._target_attr].value]

        # I removed the variant that blanks out group 1 in the training set
        # and group 2 in the testing set, as I'm only interested in the
        # variant below.

        # Blank out group 2 in the training set and group 1 in the testing
        # set by setting their partition value to 0
        new_part = ds.sa[self._part_attr].value.copy()
        new_part[np.logical_and(train_part, match_2nd_group)] = 0
        new_part[np.logical_and(test_part, match_1st_group)] = 0
        ds.sa[self.get_space()] = new_part
        yield ds

# Modified part: the ChainNode definition
chain = ChainNode([partitioner,
                   MyFilter((train, test),
                            partitioner.get_space(),
                            'targets')
                  ])

for p in chain.generate(ds):
    # convert to string to let it align in the output
    print 'PA', p.sa.filtered_partitions.astype('str')

# Set up classifier
clf = LinearCSVMC(C=1)

cv = CrossValidation(clf, partitioner, errorfx=mean_match_accuracy,
                     enable_ca=["stats"])

# Set up searchlight
sl = sphere_searchlight(cv, radius=3, postproc=mean_sample(),
                        space='voxel_indices')

# Results of the searchlight analysis
sl_results = sl(ds)
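Two aspects of this script can be checked independently of PyMVPA. First, `CrossValidation` receives `partitioner` rather than `chain`, so it iterates over the unfiltered `partitions` attribute whatever arguments are passed to `MyFilter`; `chain` is only used in the print loop, which would be consistent with identical cross-validation results for all three settings. Second, a classifier trained on A vs. B can only predict A or B, so testing it on C vs. D with a match-based error function such as `mean_match_accuracy` requires deciding which test label corresponds to which training label. The sketch below assumes the hypothetical correspondence C to A and D to B, and uses a nearest-centroid classifier as a simple stand-in for `LinearCSVMC`; it illustrates the bookkeeping only, not what the script computes.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Return class labels and their mean patterns (stand-in classifier)."""
    labels = np.unique(y)
    centroids = np.array([X[y == label].mean(axis=0) for label in labels])
    return labels, centroids


def nearest_centroid_predict(model, X):
    labels, centroids = model
    # Squared Euclidean distance of each sample to each class centroid.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]


def cross_decode(X, targets, partitions, label_map):
    """Train on partition 1, test on partition 2, ignore partition 0.

    `label_map` (hypothetical) translates test labels into training labels
    so that predictions and targets can be compared.
    """
    train = partitions == 1
    test = partitions == 2
    model = nearest_centroid_fit(X[train], targets[train])
    predicted = nearest_centroid_predict(model, X[test])
    expected = np.array([label_map.get(t, t) for t in targets[test]])
    return np.mean(predicted == expected)


targets = np.array(list("ABCD") * 4)
# Toy patterns (assumption): C looks like A and D looks like B.
X = np.array([[1.0, 0.0] if t in "AC" else [0.0, 1.0] for t in targets])
# Filtered partitions for one fold: A/B in chunks 1-3 train, C/D in chunk 4 test.
partitions = np.array([1, 1, 0, 0] * 3 + [0, 0, 2, 2])

print(cross_decode(X, targets, partitions, {"C": "A", "D": "B"}))
print(cross_decode(X, targets, partitions, {}))
```

With the mapping, the toy data give an accuracy of 1.0; with an empty mapping, the predictions (A, B) never match the targets (C, D) and the accuracy is 0.0.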


-- 
Charlotte Murphy
PhD Student
Department of Psychology
University of York,
Heslington, York, YO10 5DD, UK
Email: cem552 at york.ac.uk