[pymvpa] Dataset with multidimensional feature vector per voxel

Ulrike Kuhl kuhl at cbs.mpg.de
Mon Nov 9 20:42:20 UTC 2015


Hello again!

I'm still stuck on my analysis using multiple diffusion-derived parameter values for the same voxel.
Thanks to your help I think I've got the dataset together, but I run into an error when attempting the final searchlight analysis.


This is what I've got so far for setting up the dataset (defined as a function):


from mvpa2.suite import fmri_dataset, Dataset, hstack, vstack

def ds_setup(sub_list, param_list, data_path):
    "Set up the dataset: load the data, attach attributes, stack everything."

    # make space:
    dss = []
    modality_all = []
    modality_idx_all = []
    voxel_idx_all = []
    subject_all = []
    targets_all = []
    chunks_all = []

    for sub_index, sub in enumerate(sub_list):
        if sub.startswith('sub1'):
            learn = 1
        elif sub.startswith('sub0'):
            learn = 0
        else:
            raise ValueError("Do not know what to do with %s" % sub)

        # ds_param will contain datasets for different parameters of the same subject
        ds_param = []
        # idea: collect the feature attributes and glue it all together in the end
        modality_param = []
        modality_idx_param = []
        voxel_idx_param = []

        for suf_index, suf in enumerate(param_list):
            ds = fmri_dataset(data_path + '/%s/%s_%s.nii.gz' % (sub, sub, suf))
            ds.fa['modality'] = [suf]             # for each feature, set the modality attribute
            ds.fa['modality_index'] = [suf_index] # numeric variant might come handy for searchlights later
            modality_param.extend(ds.fa['modality'].value)
            modality_idx_param.extend(ds.fa['modality_index'].value)
            voxel_idx_param.extend(ds.fa['voxel_indices'].value)
            ds_param.append(ds)

        ds_subj = hstack(ds_param)
        # collect future feature attributes:
        modality_all.append(modality_param)
        modality_idx_all.append(modality_idx_param)
        voxel_idx_all.append(voxel_idx_param)

        # collect future sample attributes (one value per subject);
        # targets/chunks/subject belong into .sa, not .fa:
        targets_all.append(learn)
        chunks_all.append(sub_index)
        subject_all.append(sub)

        dss.append(ds_subj)

    dsall = vstack(dss)

    # create the actual dataset; the feature attributes are identical across
    # subjects, so take them from the first one:
    DS = Dataset(dsall.samples)
    DS.fa['modality'] = modality_all[0]
    DS.fa['modality_idx'] = modality_idx_all[0]
    DS.fa['voxel_indices'] = voxel_idx_all[0]
    DS.sa['targets'] = targets_all
    DS.sa['chunks'] = chunks_all
    DS.sa['subject'] = subject_all

    return DS

##################################
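For completeness, this is how I call it (the subject and parameter names here are just made-up placeholders standing in for my actual naming scheme):

DS = ds_setup(['sub01', 'sub02', 'sub11', 'sub12'],  # 'sub0*' = non-learners, 'sub1*' = learners
              ['FA', 'MD', 'RD'],                    # three diffusion parameters per voxel
              '/path/to/data')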

This gives me the following: DS is a dataset of shape 'no_subjects' x ('no_voxels' * 'no_parameters'). Importantly, each feature has the corresponding voxel indices associated with it - and since the different parameters are simply concatenated, the sequence of these index arrays repeats once per parameter.
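As a quick sanity check of that structure (the concrete numbers are from my toy data: 125000 voxels, three parameters):

import numpy as np
print DS.shape                    # (no_subjects, 375000) = (no_subjects, 3 * 125000)
print DS.fa['modality'].unique    # the three parameter names
# the block of voxel indices repeats once per parameter:
nvox = DS.nfeatures / len(DS.fa['modality'].unique)
print np.all(DS.fa['voxel_indices'].value[:nvox] ==
             DS.fa['voxel_indices'].value[nvox:2 * nvox])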

I've simulated a small set of perfect two-group toy data (= 100% SNR) and fed the corresponding dataset into an SVM with leave-one-out cross-validation like this:

clf = LinearCSVMC()
cvte = CrossValidation(clf, NFoldPartitioner(),
                       errorfx=lambda p, t: np.mean(p == t),
                       enable_ca=['stats'])
cv_results = cvte(DS)

The result is indeed perfect - classification performance is 100%. :-) So far, so good. 
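For the record, this is how I looked at the result:

print np.mean(cv_results)                        # 1.0, i.e. 100% accuracy
print cvte.ca.stats.as_string(description=True)  # full confusion matrix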



However, when I want to use this dataset for a searchlight approach, I run into an error. Here's the code:

sl = sphere_searchlight(cvte, radius=3, postproc=mean_sample())
res = sl(DS)
print res

ValueError: IndexQueryEngine has insufficient information about the dataset spaces. It is required to specify an ROI generator for each feature space in the dataset (got: ['voxel_indices'], #describable: 125000, #actual features: 375000).

So the code picks up on the fact that there are multiple instances of the same voxel-index array within one sample (three parameters = three times the same index array): via 'voxel_indices' alone the query engine can only describe 125000 features, while the dataset actually contains 375000 = 3 * 125000 of them. I had wrongly expected that the searchlight would simply take all values at the corresponding indices into account for the MVPA.

How do I "glue" these multiple values per voxel-index array together such that the searchlight algorithm works? Am I on the right track at all, or did I get carried away?
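Reading the error message again, my current guess is that I have to hand the searchlight an IndexQueryEngine that knows about the 'modality_index' space as well, something along these lines (just a sketch pieced together from the docs, so the exact call may well be wrong):

from mvpa2.misc.neighborhood import IndexQueryEngine, Sphere
from mvpa2.measures.searchlight import Searchlight

# sphere in voxel space, plus a "sphere" over the modality indices that is
# large enough to always include all three parameters of a voxel:
qe = IndexQueryEngine(voxel_indices=Sphere(3),
                      modality_index=Sphere(len(param_list)))
sl = Searchlight(cvte, queryengine=qe, postproc=mean_sample())
res = sl(DS)

Would that be the right direction?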

Thanks so much for your help!

Best,
Ulrike




----- Original Message -----
From: "Yaroslav Halchenko" <debian at onerussian.com>
To: "pkg-exppsy-pymvpa" <pkg-exppsy-pymvpa at lists.alioth.debian.org>
Sent: Thursday, 5 November, 2015 15:52:55
Subject: Re: [pymvpa] Dataset with multidimensional feature vector per voxel

On Thu, 05 Nov 2015, Ulrike Kuhl wrote:

> Thanks again for your super quick and super helpful reply! 

> There is still one thing that I don't quite get at the moment: 

> As expected, 'dsall' is a numpy.ndarray of dimensions 1*(numberVoxels*numberSubjects*numberParameters) - so just a really long vector.
> Since this is 'only' a vector, it does not have any attributes or other information associated with it. This information is still contained in 'dss', which is a list of size (numberSubjects*numberParameters) containing the respective datasets.
> I don't see right now how to bridge this information to the 'dsall'-array. After all, 'sphere_searchlight()' should also be called with a dataset, using this dataset's coordinates to determine the local neighborhoods, right?
> So I guess I still need the 'glue' to get it all together... ;-)
> Can you give me a hint on that?

my bad -- I overlooked this by-subject construct...  you need to vstack those
across subjects while hstacking features within each subject

sub_list = [list of subjects]
param_list = [list of parameters]
dss = []

for sub_index, sub in enumerate(sub_list):
    # yoh: will contain datasets for different features for the same subject
    dss_subj = []
    for suf_index, suf in enumerate(param_list):
        ds = fmri_dataset('/path/to/image/file',mask='/path/to/mask/file')
        # yoh: no need to broadcast -- should do automagically
        ds.fa['modality'] = [suf]             # for each feature, set the modality attribute
        ds.fa['modality_index'] = [suf_index] # this as numeric might come handy for searchlights later
        dss_subj.append(ds)
    ds_subj = hstack(dss_subj) 
    if sub.startswith('L'):
        learn = 1
    elif sub.startswith('N'):
        learn = 0
    else:
        # yoh: make it explicit, otherwise you would assign previous learn to next one not L or N
        raise ValueError("Do not know what to do with %s" % sub)
    # yoh: seems it was incorrectly indented into elif
    ds_subj.sa['targets'] = [learn]
    ds_subj.sa['chunks'] = [sub_index]
    dss.append(ds_subj)
dsall = vstack(dss)

####################################


> Cheers,
> Ulrike

> ----- Original Message -----
> From: "Yaroslav Halchenko" <debian at onerussian.com>
> To: "pkg-exppsy-pymvpa" <pkg-exppsy-pymvpa at lists.alioth.debian.org>
> Sent: Thursday, 5 November, 2015 15:14:24
> Subject: Re: [pymvpa] Dataset with multidimensional feature vector per voxel

> On Thu, 05 Nov 2015, Ulrike Kuhl wrote:

> > Dear Yaroslav,

> > thanks a lot for your reply.

> > With your snippet it was really easy for me to set up the matrix as you described it. :-)
> > For those interested, this is the code that works for me:


> > sub_list = [list of subjects]
> > param_list = [list of parameters]
> > dss = []

> > for sub_index, sub in enumerate(sub_list):
> >     for suf_index, suf in enumerate(param_list):
> >         ds = fmri_dataset('/path/to/image/file',mask='/path/to/mask/file')
> >         ds.fa['modality'] = [suf] * ds.nfeatures             # for each feature, set the modality attribute
> >         ds.fa['modality_index'] = [suf_index] * ds.nfeatures # this as numeric might come handy for searchlights later
> >     if sub.startswith('L'):
> >         learn = 1
> >     elif sub.startswith('N'):
> >         learn = 0
> >         ds.sa['targets'] = [learn]                           
> >         ds.sa['chunks'] = [sub_index]
> >         dss.append(ds)
> > dsall = hstack(dss)

> Here is my tune up with # yoh: comments


> sub_list = [list of subjects]
> param_list = [list of parameters]
> dss = []

> for sub_index, sub in enumerate(sub_list):
>     for suf_index, suf in enumerate(param_list):
>         ds = fmri_dataset('/path/to/image/file',mask='/path/to/mask/file')
>         # yoh: no need to broadcast -- should do automagically
>         ds.fa['modality'] = [suf]             # for each feature, set the modality attribute
>         ds.fa['modality_index'] = [suf_index] # this as numeric might come handy for searchlights later
>     if sub.startswith('L'):
>         learn = 1
>     elif sub.startswith('N'):
>         learn = 0
>     else:
>         # yoh: make it explicit, otherwise you would assign previous learn to next one not L or N
>         raise ValueError("Do not know what to do with %s" % sub)
>     # yoh: seems it was incorrectly indented into elif
>     ds.sa['targets'] = [learn]
>     ds.sa['chunks'] = [sub_index]
>     dss.append(ds)
> dsall = hstack(dss)



> cheers
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        

-- 
Max Planck Institute for Human Cognitive and Brain Sciences 
Department of Neuropsychology (A219) 
Stephanstraße 1a 
04103 Leipzig 

Phone: +49 (0) 341 9940 2625 
Mail: kuhl at cbs.mpg.de 
Internet: http://www.cbs.mpg.de/staff/kuhl-12160


