[pymvpa] Dataset with multidimensional feature vector per voxel
Ulrike Kuhl
kuhl at cbs.mpg.de
Mon Nov 9 20:42:20 UTC 2015
Hello again!
I'm still stuck on my analysis using multiple diffusion-derived parameter values for the same voxel.
Thanks to your help, I think I've got the dataset together, but I run into an error when attempting the final searchlight step.
This is what I have so far for setting up the dataset (defined as a function):
from mvpa2.suite import *  # fmri_dataset, hstack, vstack, Dataset

def ds_setup(sub_list, param_list, data_path):
    """Set up the dataset: load the data and attach attributes."""
    # make space:
    dss = []
    modality_all = []
    modality_idx_all = []
    voxel_idx_all = []
    subject_all = []
    targets_all = []
    chunks_all = []
    for sub_index, sub in enumerate(sub_list):
        if sub.startswith('sub1'):
            learn = 1
        elif sub.startswith('sub0'):
            learn = 0
        else:
            raise ValueError("Do not know what to do with %s" % sub)
        # ds_param will contain datasets for different features for the same subject
        ds_param = []
        # idea: collect the feature attributes and glue it all together in the end
        modality_param = []
        modality_idx_param = []
        subject_param = []
        targets_param = []
        chunks_param = []
        voxel_idx_param = []
        for suf_index, suf in enumerate(param_list):
            ds = fmri_dataset(data_path + '/%s/%s_%s.nii.gz' % (sub, sub, suf))
            ds.fa['modality'] = [suf]  # for each feature, set the modality attribute
            ds.fa['modality_index'] = [suf_index]  # numeric version may come in handy for searchlights later
            ds.fa['subject'] = [sub]  # set explicitly -- it is read back just below
            ds.fa['targets'] = [learn]
            ds.fa['chunks'] = [sub_index]
            modality_param.extend(ds.fa['modality'].value)
            modality_idx_param.extend(ds.fa['modality_index'].value)
            subject_param.extend(ds.fa['subject'].value)
            targets_param.extend(ds.fa['targets'].value)
            chunks_param.extend(ds.fa['chunks'].value)
            voxel_idx_param.extend(ds.fa['voxel_indices'].value)
            ds_param.append(ds)
        ds_subj = hstack(ds_param)
        # collect future feature attributes:
        modality_all.append(modality_param)
        modality_idx_all.append(modality_idx_param)
        voxel_idx_all.append(voxel_idx_param)
        # collect future sample attributes:
        targets_all.append(learn)
        chunks_all.append(sub_index)
        subject_all.append(sub)
        dss.append(ds_subj)
    dsall = vstack(dss)
    # create the actual dataset
    DS = Dataset(dsall)
    # feature attributes are identical across subjects, so take the first set:
    DS.fa['modality'] = modality_all[0]
    DS.fa['modality_idx'] = modality_idx_all[0]
    DS.fa['voxel_indices'] = voxel_idx_all[0]
    DS.sa['targets'] = targets_all
    DS.sa['chunks'] = chunks_all
    DS.sa['subject'] = subject_all
    return DS
##################################
This gives me the following: DS is a dataset with dimensions 'no_subjects' * ('no_voxels' * 'no_parameters'). Importantly, each feature has the corresponding voxel indices associated with it -- since the different parameters are simply concatenated, the sequence of these index arrays repeats once per parameter.
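For instance, a quick sanity check on the toy set confirms that structure (just a sketch; it assumes param_list is still in scope):

print DS.shape                  # (n_subjects, n_voxels * n_parameters)
print DS.fa['modality'].unique  # one label per parameter
# the voxel index array repeats once per parameter:
nv = DS.nfeatures / len(param_list)
print (DS.fa.voxel_indices[:nv] == DS.fa.voxel_indices[nv:2 * nv]).all()  # True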
I've simulated a small set of perfect two-group toy data (i.e. 100% SNR) and fed the corresponding dataset into an SVM leave-one-out cross-validation like this:
clf = LinearCSVMC()
cvte = CrossValidation(clf, NFoldPartitioner(),
                       errorfx=lambda p, t: np.mean(p == t),
                       enable_ca=['stats'])
cv_results = cvte(DS)
The result is indeed perfect - classification performance is 100%. :-) So far, so good.
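Since 'stats' is enabled, the confusion matrix pooled across the leave-one-out folds can be inspected as well (as in the PyMVPA tutorial):

print cvte.ca.stats.as_string(description=True)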
However, when I want to use this dataset for a searchlight approach, I run into an error. Here's the code:
sl = sphere_searchlight(cvte, radius=3, postproc=mean_sample())
res = sl(DS)
print res
ValueError: IndexQueryEngine has insufficient information about the dataset spaces. It is required to specify an ROI generator for each feature space in the dataset (got: ['voxel_indices'], #describable: 125000, #actual features: 375000).
So the code picks up on the fact that there are multiple instances of the same voxel index array within one sample (three parameters = three repetitions of the same index array). I had wrongly expected that the searchlight would take all values at the corresponding indices into account for the MVPA.
How do I "glue" these multiple values per voxel index together so that the searchlight algorithm works? Am I on the right track at all, or did I get carried away? The closest I've come to an idea is the untested sketch below.
Thanks so much for your help!
Best,
Ulrike
----- Original Message -----
From: "Yaroslav Halchenko" <debian at onerussian.com>
To: "pkg-exppsy-pymvpa" <pkg-exppsy-pymvpa at lists.alioth.debian.org>
Sent: Thursday, 5 November, 2015 15:52:55
Subject: Re: [pymvpa] Dataset with multidimensional feature vector per voxel
On Thu, 05 Nov 2015, Ulrike Kuhl wrote:
> Thanks again for your super quick and super helpful reply!
> There is still one thing that I don't quite get at the moment:
> As expected, 'dsall' is a numpy.ndarray of dimensions 1*(numberVoxels*numberSubjects*numberParameters) - so just a really long vector.
> Since this is 'only' a vector, it does not have any attributes or other information associated with it. This information is still contained in 'dss', which is a list of size (numberSubjects*numberParameters) containing the respective datasets.
> I don't see right now how to bridge this information to the 'dsall'-array. After all, 'sphere_searchlight()' should also be called with a dataset, using this dataset's coordinates to determine the local neighborhoods, right?
> So I guess I still need the 'glue' to get it all together... ;-)
> Can you give me a hint on that?
my bad -- I overlooked this by-subject construct... you need to vstack those
across subjects while hstacking features within subject:
sub_list = [list of subjects]
param_list = [list of parameters]
dss = []
for sub_index, sub in enumerate(sub_list):
    # yoh: will contain datasets for different features for the same subject
    dss_subj = []
    for suf_index, suf in enumerate(param_list):
        ds = fmri_dataset('/path/to/image/file', mask='/path/to/mask/file')
        # yoh: no need to broadcast -- should do automagically
        ds.fa['modality'] = [suf]  # for each feature, set the modality attribute
        ds.fa['modality_index'] = [suf_index]  # this as numeric might come handy for searchlights later
        dss_subj.append(ds)
    ds_subj = hstack(dss_subj)
    if sub.startswith('L'):
        learn = 1
    elif sub.startswith('N'):
        learn = 0
    else:
        # yoh: make it explicit, otherwise you would assign previous learn to next one not L or N
        raise ValueError("Do not know what to do with %s" % sub)
    # yoh: seems it was incorrectly indented into elif
    ds_subj.sa['targets'] = [learn]
    ds_subj.sa['chunks'] = [sub_index]
    dss.append(ds_subj)
dsall = vstack(dss)
####################################
> Cheers,
> Ulrike
> ----- Original Message -----
> From: "Yaroslav Halchenko" <debian at onerussian.com>
> To: "pkg-exppsy-pymvpa" <pkg-exppsy-pymvpa at lists.alioth.debian.org>
> Sent: Thursday, 5 November, 2015 15:14:24
> Subject: Re: [pymvpa] Dataset with multidimensional feature vector per voxel
> On Thu, 05 Nov 2015, Ulrike Kuhl wrote:
> > Dear Yaroslav,
> > thanks a lot for your reply.
> > With your snippet it was really easy for me to set up the matrix as you described it. :-)
> > For those interested, this is the code that works for me:
> > sub_list = [list of subjects]
> > param_list = [list of parameters]
> > dss = []
> > for sub_index, sub in enumerate(sub_list):
> >     for suf_index, suf in enumerate(param_list):
> >         ds = fmri_dataset('/path/to/image/file', mask='/path/to/mask/file')
> >         ds.fa['modality'] = [suf] * ds.nfeatures  # for each feature, set the modality attribute
> >         ds.fa['modality_index'] = [suf_index] * ds.nfeatures  # this as numeric might come handy for searchlights later
> >         if sub.startswith('L'):
> >             learn = 1
> >         elif sub.startswith('N'):
> >             learn = 0
> >         ds.sa['targets'] = [learn]
> >         ds.sa['chunks'] = [sub_index]
> >         dss.append(ds)
> > dsall = hstack(dss)
> Here is my tune up with # yoh: comments
> sub_list = [list of subjects]
> param_list = [list of parameters]
> dss = []
> for sub_index, sub in enumerate(sub_list):
>     for suf_index, suf in enumerate(param_list):
>         ds = fmri_dataset('/path/to/image/file', mask='/path/to/mask/file')
>         # yoh: no need to broadcast -- should do automagically
>         ds.fa['modality'] = [suf]  # for each feature, set the modality attribute
>         ds.fa['modality_index'] = [suf_index]  # this as numeric might come handy for searchlights later
>         if sub.startswith('L'):
>             learn = 1
>         elif sub.startswith('N'):
>             learn = 0
>         else:
>             # yoh: make it explicit, otherwise you would assign previous learn to next one not L or N
>             raise ValueError("Do not know what to do with %s" % sub)
>         # yoh: seems it was incorrectly indented into elif
>         ds.sa['targets'] = [learn]
>         ds.sa['chunks'] = [sub_index]
>         dss.append(ds)
> dsall = hstack(dss)
> cheers
> cheers
--
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik
--
Max Planck Institute for Human Cognitive and Brain Sciences
Department of Neuropsychology (A219)
Stephanstraße 1a
04103 Leipzig
Phone: +49 (0) 341 9940 2625
Mail: kuhl at cbs.mpg.de
Internet: http://www.cbs.mpg.de/staff/kuhl-12160