[pymvpa] Random choosing samples within each fold

Mon Dec 1 15:58:47 UTC 2014

I'm not sure whether you can do this automatically, but I've done it manually
by creating 'pseudo-runs' and using that attribute as the basis for
partitioning.  This gives you an arbitrary amount of control: for instance, I
wanted each pseudo-run to incorporate elements from every real run.  If you
really don't care about that, you could just take the number of events,
divide it by the number of samples you want per fold to get the number of
pseudo-runs, build an array that labels each event with a pseudo-run, permute
that array, assign it as a sample attribute, and tell NFoldPartitioner to use
that attribute for the folds, i.e.:

In [1]: samples = 20

In [2]: folds = 4

In [3]: samples_per_fold = samples // folds

In [7]: pseudo_runs = [fold for fold in range(folds) for _ in range(samples_per_fold)]

In [8]: pseudo_runs

Out[8]: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]

In [10]: import random

In [11]: random.shuffle(pseudo_runs)

In [12]: pseudo_runs

Out[12]: [2, 2, 3, 1, 3, 2, 0, 0, 0, 1, 1, 0, 0, 1, 3, 2, 3, 1, 2, 3]

Then do my_dataset.sa['pseudo_runs'] = pseudo_runs, etc.  Note, however, that
some people will object vehemently to this method of cross-validation
because of temporal correlation and the general non-independence of the
samples.
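
For what it's worth, here is a rough sketch of both variants as plain Python
functions: a fully random assignment, and one where each pseudo-run draws
from every real run.  The function names, the seed argument, and the
even-split check are my own assumptions for illustration, not anything from
PyMVPA; the PyMVPA lines are left as comments and assume the 2.x API.

```python
import random


def make_pseudo_runs(n_samples, n_folds, seed=None):
    """Return a shuffled list of pseudo-run labels, one per sample."""
    if n_samples % n_folds:
        raise ValueError("n_samples must divide evenly into n_folds")
    samples_per_fold = n_samples // n_folds
    # One block of identical labels per fold, e.g. [0, 0, ..., 1, 1, ...]
    labels = [fold for fold in range(n_folds) for _ in range(samples_per_fold)]
    # A private Random instance keeps the shuffle reproducible via `seed`
    random.Random(seed).shuffle(labels)
    return labels


def make_stratified_pseudo_runs(real_runs, n_folds, seed=None):
    """Deal the samples of each real run out across the pseudo-runs."""
    rng = random.Random(seed)
    labels = [None] * len(real_runs)
    for run in sorted(set(real_runs)):
        # Indices of the samples that belong to this real run
        idx = [i for i, r in enumerate(real_runs) if r == run]
        rng.shuffle(idx)
        # Round-robin assignment: sample k of this run goes to fold k % n_folds
        for k, i in enumerate(idx):
            labels[i] = k % n_folds
    return labels


pseudo_runs = make_pseudo_runs(20, 4, seed=0)

# Hypothetical layout: 5 real runs with 4 samples each
real_runs = [r for r in range(5) for _ in range(4)]
stratified = make_stratified_pseudo_runs(real_runs, 4, seed=0)

# With a PyMVPA dataset (assumed 2.x API, not executed here):
# my_dataset.sa['pseudo_runs'] = pseudo_runs
# partitioner = NFoldPartitioner(attr='pseudo_runs')
```

The stratified version only guarantees that every pseudo-run sees every real
run when each real run has at least n_folds samples.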

S