[pymvpa] additional data shuffling/cleaning after loading up data using fmri_dataset

Sat Sep 9 12:03:55 UTC 2017

> On 9 Sep 2017, at 06:28, Regina Lapate <lapate at gmail.com> wrote:
> 
> --Can one do regular python operations such as shuffling trials (or excluding trials with extreme outlier values) after loading up a dataset (nifti & targets) using mvpa2.datasets.mri.fmri_dataset?
> 
> I assumed so, but upon trying to shuffle trials of a given condition using numpy:
> np.random.shuffle(ds[ds.targets==1])

Shuffling can mean at least two things in this context:

1) randomly re-order the samples together with their associated sample attributes; this can be achieved by simple indexing. For example, if a dataset ds has 4 samples, then ds[[3,2,1,0]] would reverse the order of the samples and the associated sample attributes in .sa. Similarly, ds[[0,2]] would select only the first and third samples.
2) randomly change condition labels (targets), for example to generate a null distribution. AttributePermutator in mvpa2.generators.permutation may be helpful for this.
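
To illustrate option 1 for the case in the question (shuffling only the trials of one condition), here is a minimal sketch using plain NumPy arrays as stand-ins for the dataset. The names samples, order and positions are made up for the illustration. In NumPy, boolean-mask indexing returns a copy, so shuffling the selection in place does not touch the original; I assume, but have not checked, that the same applies to ds[ds.targets==1].

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-ins for a dataset: 6 samples x 2 features, plus per-sample targets.
samples = np.arange(12).reshape(6, 2)
targets = np.array([1, 2, 1, 2, 1, 2])

# Boolean-mask indexing returns a copy, so shuffling that copy in place
# leaves the original array untouched.
subset = samples[targets == 1]
rng.shuffle(subset)
unchanged = np.array_equal(samples, np.arange(12).reshape(6, 2))

# Instead, build an index that permutes only the positions where targets == 1.
order = np.arange(len(targets))
positions = np.flatnonzero(targets == 1)
order[positions] = rng.permutation(positions)

# Applying the same index to samples and targets keeps them aligned.
shuffled_samples = samples[order]
shuffled_targets = targets[order]
```

An index built this way could then be used as ds[order], in the same way as ds[[3,2,1,0]] above.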

Which one applies to your question?

For the second option: my preferred strategy would be to split the dataset by unique chunks, randomly re-assign the targets within each sub-dataset, and then stack the sub-datasets back into one big dataset. This seems better than the 'simple' strategy - at least in an fMRI context - because permuting targets across chunks can break independence assumptions.
However, I did not find this option available (using strategy='chunks' gave an error). Maybe I missed it - or if not, we may consider adding it.
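
A rough sketch of the within-chunk idea, written with plain NumPy rather than any PyMVPA API. The function permute_targets_within_chunks is hypothetical (it is not part of mvpa2), and it works on the label arrays directly instead of splitting and re-stacking datasets:

```python
import numpy as np

def permute_targets_within_chunks(targets, chunks, rng=None):
    """Return a copy of targets with labels shuffled separately inside each chunk.

    Hypothetical helper for illustration; not a PyMVPA function.
    """
    rng = np.random.RandomState() if rng is None else rng
    targets = np.asarray(targets)
    chunks = np.asarray(chunks)
    permuted = targets.copy()
    for chunk in np.unique(chunks):
        # Positions of the samples belonging to this chunk.
        idx = np.flatnonzero(chunks == chunk)
        # Re-assign this chunk's labels among its own samples only.
        permuted[idx] = targets[rng.permutation(idx)]
    return permuted

targets = np.array(['a', 'b', 'a', 'b', 'a', 'a', 'b', 'b'])
chunks = np.array([0, 0, 0, 0, 1, 1, 1, 1])
permuted = permute_targets_within_chunks(targets, chunks,
                                         np.random.RandomState(0))
```

With a dataset one would presumably pass ds.targets and ds.chunks (assuming the chunk labels are stored in ds.sa.chunks) and assign the result to the targets of a copy of the dataset. Each chunk keeps the same number of samples per condition; only the assignment of labels to samples within the chunk changes.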