[pymvpa] Chunks for structural analysis

Yaroslav Halchenko debian at onerussian.com
Tue Dec 16 14:19:16 UTC 2014


On Tue, 16 Dec 2014, Nick Oosterhof wrote:


> On 16 Dec 2014, at 14:37, Thomas Nickson <thomas.nickson at gmail.com> wrote:

> > The chunks are supposed to define sets that are independent but I'm not sure how to encode this. If I'm doing structural all of my scans are independent of each other right? So my chunks should just be an array of distinct numbers of the length of the number of subjects I have?

> Indeed.

> > Or is independence the same as the diagnosis groupings?

> If you have different groups (e.g. patients and healthy controls) and you want to see if you can discriminate between these groups, you can encode this in the .sa.targets attribute.

The trick here, though, is still to "keep balance": in most scenarios you
have a relatively small number of samples, so you would still like to have
balanced groups (e.g. patients and controls) in both training and testing.
That avoids a possible "winner takes it all" situation in training, where
the classifier simply favors the larger group.
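
To make the "winner takes it all" risk concrete, here is a small pure-Python
sketch. It is not PyMVPA code, and the group sizes are made up purely for
illustration. A "classifier" that only ever predicts the majority training
label looks well above chance on an imbalanced test set, but sits exactly at
chance on a balanced one:

```python
from collections import Counter

# Hypothetical group sizes, chosen only for illustration.
train_targets = ['control'] * 16 + ['patient'] * 4
test_targets = ['control'] * 4 + ['patient'] * 1


def majority_label(targets):
    """Return the most frequent label (the 'winner')."""
    return Counter(targets).most_common(1)[0][0]


def accuracy(predicted_label, targets):
    """Accuracy obtained by always predicting the same single label."""
    hits = sum(t == predicted_label for t in targets)
    return hits / float(len(targets))


winner = majority_label(train_targets)

# Imbalanced test set: the trivial predictor looks deceptively good.
imbalanced_acc = accuracy(winner, test_targets)

# Balanced test set (1 patient, 1 control): the same predictor is at chance.
balanced_acc = accuracy(winner, ['control', 'patient'])
```

With balanced testing, any accuracy above 50% has to come from something
other than the group sizes.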

Here is, e.g., a possible partitioning beast which would do something like
that, provided you assign a different chunk to every sample (since they are
independent):

import mvpa2
from mvpa2.suite import *  # ChainNode, NFoldPartitioner, Sifter, Balancer

mvpa2.seed(1)  # reproducible balancing
partitioner = ChainNode([NFoldPartitioner(cvtype=2),  # leave 2 chunks out
                         # keep only test sets with 1 patient + 1 control
                         Sifter([('partitions', 2),
                                 ('targets',
                                  dict(uvalues=['patient', 'control'],
                                       balanced=True))]),
                         Balancer(attr='targets',
                                  count=1,  # can set > 1 if you don't have "enough"
                                  limit='partitions',
                                  apply_selection=True
                                  )],
                        space='partitions')

It would grab all possible patient/control pairs for testing, ensure that
they are balanced (1 patient, 1 control), and balance the number of samples
across groups through subselection (useful if you have e.g. more controls
than patients; otherwise, remove the Balancer).
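
The logic described above can be sketched in plain Python. This only
illustrates the idea and is not how PyMVPA implements it: the helper names
(balanced_test_pairs, balance_training) and the toy targets are hypothetical,
and the sketch balances only the training portion, since each test pair is
already balanced by construction.

```python
from itertools import combinations
import random

# Toy data: one sample per subject, so every sample gets its own chunk.
targets = ['patient', 'patient', 'control', 'control', 'control']
chunks = list(range(len(targets)))  # distinct chunk id per subject


def balanced_test_pairs(targets, chunks):
    """Leave-2-chunks-out test sets, keeping only 1 patient + 1 control."""
    pairs = []
    for a, b in combinations(chunks, 2):
        # chunk ids double as sample indices in this toy setup
        if {targets[a], targets[b]} == {'patient', 'control'}:
            pairs.append((a, b))
    return pairs


def balance_training(train_chunks, targets, rng):
    """Randomly subselect so every group matches the smallest group's size."""
    by_group = {}
    for c in train_chunks:
        by_group.setdefault(targets[c], []).append(c)
    n = min(len(members) for members in by_group.values())
    selected = []
    for label in sorted(by_group):
        selected.extend(rng.sample(by_group[label], n))
    return sorted(selected)


rng = random.Random(1)  # fixed seed for reproducible balancing
folds = []
for test in balanced_test_pairs(targets, chunks):
    train = [c for c in chunks if c not in test]
    folds.append((balance_training(train, targets, rng), test))
```

With 2 patients and 3 controls this gives 2 x 3 = 6 folds, each training on
1 patient and 1 control after subselection.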

A related issue, which I hope one of us will tackle soon (or you are welcome
to contribute), is https://github.com/PyMVPA/PyMVPA/issues/261, which should
provide a "merged" NFoldPartitioner + Sifter to avoid the inefficiency of the
current setup and its cumbersome construct ;)

-- 
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist,            Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


