[pymvpa] Dataset with multidimensional feature vector per voxel
Yaroslav Halchenko
debian at onerussian.com
Mon Nov 9 21:26:45 UTC 2015
A few fixups listed below, inline in the code
On Mon, 09 Nov 2015, Ulrike Kuhl wrote:
> Hello again!
> I'm still stuck on my analysis using multiple diffusion-derived parameter values for the same voxel.
> Thanks to your help, I think I've got the dataset together, but I run into an error when attempting the final searchlight approach.
> This is what I've got by now for setting up the dataset (defined as a function):
> def ds_setup(sub_list, param_list, data_path):
>     "Setting up the dataset, loading data etc."
>     # make space:
>     dss = []
>     modality_all = []
>     modality_idx_all = []
>     voxel_idx_all = []
>     subject_all = []
>     targets_all = []
>     chunks_all = []
>     for sub_index, sub in enumerate(sub_list):
>         if sub.startswith('sub1'):
>             learn = 1
>         elif sub.startswith('sub0'):
>             learn = 0
>         else:
>             raise ValueError("Do not know what to do with %s" % sub)
>         # dss_subj will contain datasets for different features for the same subject
>         ds_param = []
>         # idea: collect the feature attributes and glue it all together in the end
>         modality_param = []
>         modality_idx_param = []
>         subject_param = []
>         targets_param = []
>         chunks_param = []
>         voxel_idx_param = []
>         for suf_index, suf in enumerate(param_list):
>             ds = fmri_dataset(data_path + '/%s/%s_%s.nii.gz' % (sub, sub, suf))
wasn't there some kind of a mask?
>             ds.fa['modality'] = [suf]  # for each feature, set the modality attribute
>             ds.fa['modality_index'] = [suf_index]  # this as numeric might come handy for searchlights later
>             ds.fa['targets'] = [learn]
>             ds.fa['chunks'] = [sub_index]
those targets and chunks should go into '.sa' (sample attributes),
not '.fa' (feature attributes)... and you don't need all those manually
constructed _param lists -- everything should simply be contained in the
dataset(s). This is somewhat important, since then the necessary checks
would be done during hstack/vstacking those datasets, and if some fa or sa
values don't match (i.e. you don't have exactly aligning features or
samples) -- those fa/sa would be dropped... is that maybe what you have
observed?
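e.g. a minimal sketch of the simplified inner loop (untested; it assumes
one volume per file, and relies on fmri_dataset() accepting targets/chunks
directly and on hstack() concatenating .fa while keeping matching .sa):

    for suf_index, suf in enumerate(param_list):
        ds = fmri_dataset(data_path + '/%s/%s_%s.nii.gz' % (sub, sub, suf),
                          targets=learn,     # sample attribute, not .fa
                          chunks=sub_index)  # sample attribute, not .fa
        ds.fa['modality'] = [suf] * ds.nfeatures
        ds.fa['modality_index'] = [suf_index] * ds.nfeatures
        ds.sa['subject'] = [sub]  # one sample per file
        ds_param.append(ds)
    ds_subj = hstack(ds_param)  # .fa get concatenated, matching .sa preserved
    # ... and at the end just DS = vstack(dss) -- no manual *_all lists needed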
>             modality_param.extend(ds.fa['modality'].value)
>             modality_idx_param.extend(ds.fa['modality_index'].value)
>             subject_param.extend(ds.fa['subject'].value)
>             targets_param.extend(ds.fa['targets'].value)
>             chunks_param.extend(ds.fa['chunks'].value)
>             voxel_idx_param.extend(ds.fa['voxel_indices'].value)
>             ds_param.append(ds)
>         ds_subj = hstack(ds_param)
>         # collect future feature attributes:
>         modality_all.append(modality_param)
>         modality_idx_all.append(modality_idx_param)
>         voxel_idx_all.append(voxel_idx_param)
>         # collect future sample attributes
>         targets_all.append(learn)
>         chunks_all.append(sub_index)
>         subject_all.append(sub)
>         dss.append(ds_subj)
>     dsall = vstack(dss)
>     # create the actual dataset
>     DS = Dataset(dsall)
>     DS.fa['modality'] = modality_all[0]
>     DS.fa['modality_idx'] = modality_idx_all[0]
>     DS.fa['voxel_indices'] = voxel_idx_all[0]
>     DS.sa['targets'] = targets_all
>     DS.sa['chunks'] = chunks_all
>     DS.sa['subject'] = subject_all
>     return DS
> This gives me the following: DS is a dataset with dimensions 'no_subjects' * ('no_voxels' * 'no_parameters'). Importantly, each feature has the corresponding voxel indices associated with it - since the different parameters are just concatenated, the sequence of these index arrays repeats for each parameter.
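as a quick sanity check of that layout (a sketch, assuming 3 parameters
and the dataset from the function above):

    print DS.shape                   # (n_subjects, n_voxels * 3)
    print DS.fa.voxel_indices.shape  # (n_voxels * 3, 3) -- each index triple repeats per parameter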
> I've simulated a little set of perfect two-group toy data (= 100% SNR) and fed the corresponding dataset into an SVM leave-one-out cross-validation like this:
> clf = LinearCSVMC()
> cvte = CrossValidation(clf, NFoldPartitioner(),
you could say explicitly here what you would like to cross-validate
over -- it makes the code easier to read (and then you don't need to
define the overgeneric 'chunks'):

    NFoldPartitioner(attr='subject')
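i.e. the full call might then look like (an untested sketch, reusing the
errorfx/enable_ca from your code below):

    cvte = CrossValidation(clf, NFoldPartitioner(attr='subject'),
                           errorfx=lambda p, t: np.mean(p == t),
                           enable_ca=['stats'])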
> ...                        errorfx=lambda p, t: np.mean(p == t),
> ...                        enable_ca=['stats'])
> cv_results = cvte(DS)
> The result is indeed perfect - classification performance is 100%. :-) So far, so good.
> However, when I want to use this dataset for a searchlight approach, I run into an error. Here's the code:
> sl = sphere_searchlight(cvte, radius=3, postproc=mean_sample())
> res = sl(DS)
> print res
> ValueError: IndexQueryEngine has insufficient information about the dataset spaces. It is required to specify an ROI generator for each feature space in the dataset (got: ['voxel_indices'], #describable: 125000, #actual features: 375000).
> So the code picks up on the fact that there are multiple instances of the same voxel index array within one sample (three parameters = three times the same index array). I wrongly expected that it would take all values at the corresponding indices into account for MVPA.
> How do I "glue" these multiple values per sample for the same voxel index array together such that the searchlight algorithm works? Am I on the right track at all, or did I get carried away?
that is why I defined that 'modality_index' ... Let me briefly explain
what is happening, in the hope that it makes sense ;-) -- our "search for
the neighborhood" by default relies only on .fa.voxel_indices and assumes
that those are somewhat unique. Now that you have 3 different values for
each physical voxel, it gets confused -- hence the error above: only
125000 voxel locations are describable, but there are 3 * 125000 = 375000
actual features. So we just need to tell it more about what the
neighborhood really is, and not rely on the sphere_searchlight helper:
sl = Searchlight(
    cvte,
    IndexQueryEngine(
        voxel_indices=Sphere(3),     # that is pretty much what sphere_searchlight(radius=3) does
        modality_index=Sphere(10)),  # 10 is just big enough to cover all 3 values you have
    postproc=mean_sample())
if you run this sl(DS), it will work, BUT it would pretty much do the same
analysis 3 times. So we could restrict it to search for "modality_index"
neighbors only when we are looking at modality_index == 0:
sl = Searchlight(
    cvte,
    IndexQueryEngine(                # so we look for neighbors both in space and across modality_index
        voxel_indices=Sphere(3),     # that is pretty much what sphere_searchlight(radius=3) does
        modality_index=Sphere(10)),  # 10 is just big enough to cover all 3 values you have
    postproc=mean_sample(),
    roi_ids=np.where(DS.fa.modality_index == 0)[0])
This should produce results of the necessary "1 brain" dimensionality ;)
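a quick way to verify that (a sketch; the searchlight returns one feature
per roi_id, and mean_sample() leaves a single sample):

    res = sl(DS)
    print res.shape  # (1, n_voxels) -- one value per modality_index == 0 feature
    assert res.nfeatures == (DS.fa.modality_index == 0).sum()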
I know -- this all is a bit unnatural. Primarily such functionality was
created to carry out "hybrid" spatio-temporal searchlighting, where you
could explore both space and time. But it seems that this use case is
becoming popular enough that we had better implement it without requiring
so much code/magic ;)
Let me know how it goes
--
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik