[pymvpa] additional data shuffling/cleaning after loading up data using fmri_dataset
Nick Oosterhof
n.n.oosterhof at googlemail.com
Sat Sep 9 12:03:55 UTC 2017
> On 9 Sep 2017, at 06:28, Regina Lapate <lapate at gmail.com> wrote:
>
> --Can one do regular python operations such as shuffling trials (or excluding trials with extreme outlier values) after loading up a dataset (nifti & targets) using mvpa2.datasets.mri.fmri_dataset?
>
> I assumed so, but upon trying to shuffle trials of a given condition using numpy:
> np.random.shuffle(ds[ds.targets==1])
Shuffling can mean at least two things in this context:
1) randomly reorder the samples together with their associated sample attributes. This can be achieved by simple indexing. For example, if a dataset ds has 4 samples, then ds[[3,2,1,0]] reverses the order of the samples and of the associated sample attributes in .sa, and ds[[0,2]] selects the first and third samples. Indexing with a random permutation of the sample indices, such as np.random.permutation(ds.nsamples), gives a random order.
2) randomly reassign the condition labels (targets) while keeping the samples in place, for example to generate a null distribution. AttributePermutator in mvpa2.generators.permutation may be helpful for this.
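The indexing idea in option 1 can be sketched with plain NumPy arrays standing in for ds.samples and ds.sa.targets. This is an assumption made so the example runs without PyMVPA; on a Dataset the same index array or boolean mask would be passed as ds[idx] or ds[mask]. It also suggests why np.random.shuffle(ds[ds.targets==1]) has no visible effect: boolean or integer indexing returns a copy, so only the copy gets shuffled. The helper names and the z-score outlier rule are hypothetical illustrations, not PyMVPA functions.

```python
import numpy as np

def shuffle_within_condition(targets, label, rng):
    """Hypothetical helper: index array that randomly reorders only the samples
    whose target equals `label`; every other sample keeps its position."""
    targets = np.asarray(targets)
    idx = np.arange(len(targets))
    sel = np.flatnonzero(targets == label)  # positions of the chosen condition
    idx[sel] = rng.permutation(sel)         # permute those positions among themselves
    return idx

def outlier_mask(samples, z_thresh=3.0):
    """Hypothetical helper: boolean mask keeping samples whose mean signal lies
    within `z_thresh` standard deviations of the across-sample mean."""
    means = np.asarray(samples).mean(axis=1)
    std = means.std()
    if std == 0:                            # all samples identical: keep everything
        return np.ones(len(means), dtype=bool)
    z = (means - means.mean()) / std
    return np.abs(z) <= z_thresh

rng = np.random.default_rng(0)
samples = np.arange(12, dtype=float).reshape(6, 2)  # toy data: 6 samples x 2 features
targets = np.array([1, 0, 1, 0, 1, 0])

# Shuffling a boolean-indexed selection only shuffles a copy.
subset = samples[targets == 1]
rng.shuffle(subset)
assert np.array_equal(samples[targets == 1], [[0, 1], [4, 5], [8, 9]])

# Reorder through an index array instead (with PyMVPA: ds = ds[idx]).
idx = shuffle_within_condition(targets, 1, rng)
samples, targets = samples[idx], targets[idx]

# Exclude outlying samples with a boolean mask (with PyMVPA: ds = ds[keep]).
keep = outlier_mask(samples)
samples, targets = samples[keep], targets[keep]
```

Because samples and targets are always indexed with the same array, they stay aligned, which is what indexing a Dataset does for .samples and .sa in one step.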
Which one applies to your question?
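Option 2 leaves every sample in place and shuffles only the labels. A minimal NumPy sketch of turning that into a null distribution follows. It uses a toy statistic (difference between condition means of one feature) in place of a cross-validated classifier accuracy. The data, statistic, and function names are hypothetical and not part of PyMVPA, and the permutation here runs over all samples at once.

```python
import numpy as np

def mean_difference(values, targets):
    """Toy statistic: mean of condition 1 minus mean of condition 0."""
    return values[targets == 1].mean() - values[targets == 0].mean()

def permutation_null(values, targets, n_perm, rng):
    """Recompute the statistic after randomly permuting the targets across all
    samples; the sample values themselves are never moved."""
    return np.array([mean_difference(values, rng.permutation(targets))
                     for _ in range(n_perm)])

rng = np.random.default_rng(0)
values = np.array([2.0, 0.1, 1.8, 0.3, 2.2, -0.2, 1.9, 0.0])  # toy data
targets = np.array([1, 0, 1, 0, 1, 0, 1, 0])

observed = mean_difference(values, targets)
null = permutation_null(values, targets, n_perm=1000, rng=rng)
# One-sided p-value, counting the observed labelling as one of the permutations.
p = (np.sum(null >= observed) + 1) / (len(null) + 1)
```

Each permutation keeps the number of samples per condition fixed, so every permuted labelling is a valid relabelling of the same data.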
For the second option, my preferred strategy would be to split the dataset by unique chunks, randomly re-assign the targets for each sub-dataset, and then stack these sub-datasets back into one big dataset. At least in an fMRI context, this seems better than the 'simple' strategy (a plain permutation of all targets across the dataset), because the simple strategy can break independence assumptions.
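One possible reading of the split/re-assign/stack strategy, sketched with NumPy arrays standing in for ds.sa.targets and ds.sa.chunks. The post does not say exactly how targets are re-assigned within a sub-dataset, so shuffling labels among the samples of each chunk is an assumption here. Rather than physically splitting and restacking, which would also regroup the samples by chunk, the hypothetical helper below permutes the labels in place per chunk. This gives each chunk the same multiset of labels while keeping the original sample order.

```python
import numpy as np

def permute_targets_within_chunks(targets, chunks, rng):
    """Hypothetical helper: shuffle targets only among samples sharing a chunk,
    so no label ever moves across a chunk boundary."""
    targets = np.asarray(targets)
    chunks = np.asarray(chunks)
    permuted = targets.copy()
    for chunk in np.unique(chunks):
        idx = np.flatnonzero(chunks == chunk)          # samples in this chunk
        permuted[idx] = targets[rng.permutation(idx)]  # reshuffle their labels
    return permuted

rng = np.random.default_rng(0)
targets = np.array(['face', 'house', 'face', 'house',
                    'face', 'face', 'house', 'house'])
chunks = np.array([0, 0, 0, 0, 1, 1, 1, 1])
perm = permute_targets_within_chunks(targets, chunks, rng)
# With PyMVPA, the result would presumably be written back to ds.sa.targets.
```

Keeping the sample order unchanged means any other sample attributes stay aligned without extra bookkeeping.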
However, I did not find this option (using strategy='chunks' gave an error). Maybe I missed it; if not, we may consider adding it.