[pymvpa] Concatenate Nifti data sets

Nick Oosterhof n.n.oosterhof at googlemail.com
Fri Mar 3 13:10:23 UTC 2017


> On 3 Mar 2017, at 07:20, Anaelia Ovalle <anaeliaovalle at gmail.com> wrote:
> 
> Hi, this is my first time working with PyMVPA and I'm looking to build one fmri dataset from 10 individual fmri runs. I was wondering how I would go about this? As a brute force approach, I was thinking to import the first run, then looping through the rest of the runs and appending the data. However, how would I go about adding the attribute targets and chunks to it?? Below is an attempt to do this....
> 
> Any help would be much appreciated! 
> 
> ************************************************
> import os
> import numpy as np
> from mvpa2.suite import fmri_dataset, vstack
> 
> def loadData():
>     
>     data_path = "/Users/..."
>     
>     
>     #start with run1 and append runs 2-10
>     
>     bold_fname = os.path.join(data_path, 'run1_mask+orig.nii.gz')
> 
>     #upload BOLD data with its mask
>     ds = fmri_dataset(samples = bold_fname)
> 
>     #iterate through runs 2-10 and append each fmri dataset to ds
>     for i in range(2,11):
>         path = 'run'+str(i)+'_mask+orig.nii.gz'
>         bold_fname = os.path.join(data_path, path)
>         sub = fmri_dataset(samples = bold_fname)
> 
>         ds = vstack((ds,sub))
> 
>     
>     #add the attribute targets/chunks to the data
>     ds.sa.targets = attr.targets
>     ds.sa.chunks = np.array(attr.chunks)

In fMRI, chunks are usually assigned so that the i-th run has chunk value i. So you could set .sa.chunks to a value between 1 and 10 depending on the run. 
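A minimal sketch of that chunk assignment, using plain NumPy (the per-run sample counts here are made up; with PyMVPA you would take `run_ds.nsamples` from each loaded run and assign the result to `ds.sa['chunks']`):

```python
import numpy as np

# Hypothetical per-run sample counts; in PyMVPA each value would be
# run_ds.nsamples for the corresponding loaded run.
nsamples_per_run = [121, 121, 130]

# Run i (1-based) gets chunk value i for all of its samples.
chunks = np.concatenate(
    [np.full(n, i + 1) for i, n in enumerate(nsamples_per_run)]
)
# With PyMVPA you would then do: ds.sa['chunks'] = chunks
```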

For .sa.targets, it would be useful if you told a bit more about the contents of the datasets corresponding to the fMRI runs. Is this timecourse data? Or data from a GLM? Beta estimates, t-statistics, something else? Can your experimental design be described by a limited number of conditions (e.g. different trials with presentation of pictures of fluffy cats or dogs dancing on a rainbow) or not (e.g. continuous movie data about fluffy pets dancing)?

As a side note, your approach of stacking inside the loop should work, but it is inefficient: each vstack call reallocates and copies the entire dataset accumulated so far. Better is to collect the datasets for each run in a list and then call vstack once on the whole list.
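The collect-then-stack pattern can be sketched like this; plain NumPy arrays stand in for the per-run datasets (with PyMVPA you would build the list with `fmri_dataset()` and combine it with mvpa2's `vstack()` instead):

```python
import numpy as np

# Stand-in for 10 per-run datasets of 5 volumes x 100 voxels each;
# in PyMVPA this would be:
#   runs = [fmri_dataset(fname) for fname in run_filenames]
runs = [np.random.rand(5, 100) for _ in range(10)]

# Stack once at the end -- one allocation for the final array,
# instead of copying the growing dataset on every iteration.
ds = np.vstack(runs)
```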




More information about the Pkg-ExpPsy-PyMVPA mailing list