[pymvpa] clf generalization over a different set of classes
Matthias Ekman
Matthias.Ekman at nf.mpg.de
Tue Aug 4 11:21:02 UTC 2009
Hi,
I just want to check whether a classifier trained on two classes A & B could
also discriminate between two different classes C & D,
e.g. "is a classifier trained on face vs. house samples also able to make
correct predictions for cat vs. scissors?"
Here is my code:
from mvpa.suite import *

attr = SampleAttributes(os.path.join(pymvpa_dataroot, 'attributes.txt'))
ds = NiftiDataset(samples=os.path.join(pymvpa_dataroot, 'bold.nii.gz'),
                  labels=attr.labels,
                  chunks=attr.chunks,
                  mask=os.path.join(pymvpa_dataroot, 'mask.nii.gz'))
detrend(ds, perchunk=True, model='regress', polyord=2)
zscore(ds, perchunk=True, targetdtype='float64', baselinelabels=[0])

# train ds: face vs house
ds1 = ds.select(labels=[1, 2])
# predict ds: cat vs scissors
ds2 = ds.select(labels=[4, 5])

# no. of samples in each ds
s_ds1 = len(ds1.samples)
s_ds2 = len(ds2.samples)

# check for equal no. of samples
if s_ds1 < s_ds2:
    ds2 = ds2.getRandomSamples(s_ds1 / 2)
elif s_ds1 > s_ds2:
    ds1 = ds1.getRandomSamples(s_ds2 / 2)
elif s_ds1 == s_ds2:
    print ' > equal no. of samples'

# just check the mean of the two classes for each ds
for i in ds1.uniquelabels:
    ds_tmp = ds1.select(labels=[i])
    print ' > mean of ds1 with label', i, N.mean(ds_tmp.samples)
for i in ds2.uniquelabels:
    ds_tmp = ds2.select(labels=[i])
    print ' > mean of ds2 with label', i, N.mean(ds_tmp.samples)

tmp_labels = ds2.labels.copy()
# fake labels: map cat/scissors onto the house/face label values
new_labels = tmp_labels.copy()
for l in xrange(tmp_labels.shape[0]):
    if tmp_labels[l] == 4.0:
        new_labels[l] = 2.0
    elif tmp_labels[l] == 5.0:
        new_labels[l] = 1.0
# assign new labels to predict ds
ds2.labels = new_labels

# setup clf
clf = LinearNuSVMC(nu=0.5, probability=0)
# setup validation procedure
terr = TransferError(clf)
terr.states.enable('confusion')
terr(ds1, ds1)
error = terr(ds1, ds2)
print terr.confusion
This shows an accuracy of about 40% ... however, changing the assignment
of the faked labels to:

for l in xrange(tmp_labels.shape[0]):
    if tmp_labels[l] == 4.0:
        new_labels[l] = 1.0  # instead of = 2.0
    elif tmp_labels[l] == 5.0:
        new_labels[l] = 2.0  # instead of = 1.0

leads to an accuracy of about 60%. I am not sure if this makes sense,
or if I missed something here. To make a statement like "a clf trained
on labels A/B is also able to discriminate between classes C/D", would you
suggest running both analyses above and calculating the mean of both
prediction errors... or how is this "usually" done?
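
To make that "mean of both prediction errors" idea concrete, here is a minimal
sketch of what I have in mind. It reuses ds1, ds2, tmp_labels and terr from the
code above, and assumes that TransferError returns the misclassification rate
(so accuracy = 1 - error):

# evaluate both possible assignments of cat/scissors onto the
# house/face label values and average the resulting accuracies
mappings = [{4.0: 2.0, 5.0: 1.0},   # cat -> house, scissors -> face
            {4.0: 1.0, 5.0: 2.0}]   # cat -> face,  scissors -> house
accuracies = []
for mapping in mappings:
    relabeled = tmp_labels.copy()
    for old, new in mapping.iteritems():
        relabeled[tmp_labels == old] = new
    ds2.labels = relabeled
    # same call as above, just repeated for each label assignment;
    # assumes terr() returns the error rate, i.e. 1 - accuracy
    err = terr(ds1, ds2)
    accuracies.append(1.0 - err)
print ' > accuracy per mapping:', accuracies
print ' > mean accuracy:', N.mean(accuracies)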
Best regards,
Matthias