[pymvpa] Appending datasets

Shane Hoversten shanusmagnus at gmail.com
Wed Oct 1 03:15:06 UTC 2014


Hi -

I have some code that creates a dataset from a bunch of ornate processing
(figuring out which volumes to censor based on subject performance, subject
motion, other scanning params; creating aggregate event types for certain
events; etc.)  Dataset creation has so far been done per session:
subjects were scanned twice, each session is processed separately, and
the ornate processing differs from session to session.

Now I want to aggregate these two sessions and do MVPA things after
throwing all the data into the hopper.  Specifically, instead of using
NFoldPartitioner on a day's worth of runs (there are 4 runs per day) and
leaving one run out, I want to run it on the combined runs from both days
(8 runs) and leave two out.
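
The partitioning described above can be sketched in plain Python, assuming "leave two out" means holding out every possible pair of the 8 runs. The run labels, the toy value of 3 volumes per run, and the helper name leave_n_out are all hypothetical; this is not PyMVPA code. (NFoldPartitioner takes a cvtype argument that should give the same kind of folds when set to 2, provided the run labels are unique across both days.)

```python
from itertools import combinations

import numpy as np

# Hypothetical run labels: 4 runs per day, with day 2 renumbered to 5-8
# so its runs do not collide with day 1's runs 1-4.
runs_per_day = 4
vols_per_run = 3  # toy value, not the real number of volumes
day1_chunks = np.repeat(np.arange(1, runs_per_day + 1), vols_per_run)
day2_chunks = day1_chunks + runs_per_day
chunks = np.concatenate([day1_chunks, day2_chunks])


def leave_n_out(chunks, n=2):
    """Yield (train_mask, test_mask) for every combination of n held-out runs."""
    for held_out in combinations(np.unique(chunks), n):
        test_mask = np.isin(chunks, held_out)
        yield ~test_mask, test_mask


folds = list(leave_n_out(chunks, n=2))
print(len(folds))  # number of distinct pairs of runs
```

Without the renumbering step, runs 1-4 from day 2 would be treated as the same chunks as runs 1-4 from day 1, and each fold would hold out four runs instead of two.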

PyMVPA is awesome and so it would be clear how to do this if all the data
were aggregated together; but as I mentioned, to get the dataset in the
right format I have to do a bunch of processing, and it would be a pain to
combine all the various files to make this into one single aggregate set.
What I'd rather do is just glue together two separate datasets, which have
already been processed in the ways they require, so that the new dataset
just has the samples, targets, and associated attributes from the second
session's dataset appended to the first session's dataset.

The ds.samples variable reports as being a numpy.ndarray, so I figured I
could just stuff them together with array operations, for instance:

combined_ds = np.append(ds_1.samples, ds_2.samples)

and so on for the targets, sample attributes, etc.  But nope, this gets
screwed up immediately:

In [57]: ds_1 = m.MVPAMaster("tp101", 1, "dc", "new_temporal_tp101_day1.nii")

In [58]: len(ds_1.ds)

Out[58]: 290

In [57]: ds_2 = m.MVPAMaster("tp101", 2, "dc", "new_temporal_tp101_day2.nii")

In [58]: len(ds_2.ds)

Out[58]: 290

In [64]: combined = np.append(ds_1.ds.samples, ds_2.ds.samples)

In [65]: len(combined)

Out[65]: 16960360

I'm thinking this is a mapper unrolling everything behind the scenes,
maybe?  I could beat my head against this for a while, but I figured first
I'd ask and see if there's a straightforward way to extend a dataset
in this fashion?
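
One likely explanation for the number above, shown as a minimal NumPy-only sketch: np.append flattens both inputs when no axis is given, so the result is a 1-D array rather than 580 stacked rows. The arrays here are toy stand-ins (290 volumes to match the sessions above, 5 features as an arbitrary choice), not real data. PyMVPA also provides a dataset-level vstack helper in mvpa2.base.dataset, which is probably the more direct route because it carries targets and other sample attributes along; it is not used here so the snippet runs without PyMVPA.

```python
import numpy as np

# Toy stand-ins for the two sessions' sample arrays.
n_vols, n_feats = 290, 5  # 290 matches the sessions above; 5 is arbitrary
day1 = np.zeros((n_vols, n_feats))
day2 = np.ones((n_vols, n_feats))

flat = np.append(day1, day2)             # no axis: inputs are flattened first
stacked = np.append(day1, day2, axis=0)  # axis=0: day2 rows follow day1 rows

print(flat.shape)     # 1-D, length 2 * 290 * 5
print(stacked.shape)  # 2-D, 580 rows by 5 columns

# The reported length is consistent with flattening alone:
# 16960360 values / 580 volumes = 29242 features per volume.
n_features = 16960360 // (290 + 290)
print(n_features)
```

Stacking the samples this way still leaves targets, chunks, and any other sample attributes to be concatenated separately and in the same order.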

Thanks,

Shane