[pymvpa] cross-validation

Yaroslav Halchenko debian at onerussian.com
Tue Mar 16 15:06:34 UTC 2010


On Tue, 16 Mar 2010, J.A. Etzel wrote:
> - "balancing" the number of examples in the training data (i.e. so
> equal numbers of examples in each of the classes you are
> classifying) is critical, as Yaroslav already mentioned. My usual
> practice is to randomly subset the examples in the bigger class to
> achieve balance, then do cross-validation on the balanced data set.

Just wondering: are you doing it manually, or taking advantage of the
nperlabel='equal', nrunspersplit=...
options we have in the splitters? ;)
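
For anyone doing it by hand, here is a minimal NumPy sketch of the random
subsetting described in the quoted paragraph. It is not the PyMVPA splitter
code: the helper name balance_indices, its seed argument, and the toy labels
are all made up for illustration.

```python
import numpy as np

def balance_indices(labels, seed=None):
    """Return sorted indices of a random subset with equal counts per class.

    Hypothetical helper (not part of PyMVPA): every class is cut down to
    the size of the smallest class by sampling without replacement.
    """
    labels = np.asarray(labels)
    rng = np.random.RandomState(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    keep = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        keep.append(rng.choice(idx, size=n_keep, replace=False))
    return np.sort(np.concatenate(keep))

# Toy example: 12 'face' examples vs. 8 'house' examples
labels = np.array(['face'] * 12 + ['house'] * 8)
idx = balance_indices(labels, seed=0)
balanced_labels = labels[idx]
```

Repeating the draw with different seeds and averaging the cross-validation
results is one way to avoid depending on a single random subset.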

> the number of tests that are performed rather drastically, though,
> and requires careful planning for significance testing (permutation
> tests should be performed using precisely the same structure as the
> true labeling).
Yep, so it seems to be a logical procedure to swap labels across chunks
(runs), not just to permute labels within or across chunks.  I have yet to
demonstrate that in the tutorial... heh heh

> - The consensus seems to be that partitioning on the runs is
> "safest" for fMRI data, since it avoids potentially inflating the
> performance. The inflation can occur because volumes from within the
> same run will probably be more similar than volumes from other runs,
> because of scanner drift, subject fatigue, subject movement,
> physiological changes, etc.
and run-order effects...
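
To make "partitioning on the runs" concrete, here is a small sketch of
leave-one-run-out splits in plain NumPy. The function name leave_one_run_out
and the toy chunks array are assumptions for illustration only, not PyMVPA
API.

```python
import numpy as np

def leave_one_run_out(chunks):
    """Yield (train_idx, test_idx) pairs, holding out one run at a time.

    Hypothetical helper: `chunks` gives the run number of every volume.
    """
    chunks = np.asarray(chunks)
    for run in np.unique(chunks):
        test_idx = np.flatnonzero(chunks == run)
        train_idx = np.flatnonzero(chunks != run)
        yield train_idx, test_idx

# Toy example: 3 runs with 4 volumes each
chunks = np.repeat([0, 1, 2], 4)
splits = list(leave_one_run_out(chunks))
```

No volume from the held-out run ever ends up in the training set, which is
what keeps within-run similarity from inflating the test accuracy.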

> But sometimes it's not very practical.
unless the experimental design had that in mind from the beginning
(whenever possible, of course) ;)

> What I do in this case is compare the performance of partitioning on
> the runs with the performance on random "runs". In other words, I
> permute the run labels within each class and perform partitioning on
> the runs again. If this procedure makes a difference on
> classification accuracy, I know that there is a problem, and look at
> the design, preprocessing, temporal compression, etc. But if
> performance with these random "runs" is about the same as
> partitioning on the actual runs I feel safer moving to a
> non-run-partitioning scheme.
that sounds like a good idea!
Now supported by PyMVPA (in the development branch) -- coding in ...
;-)
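
A rough NumPy sketch of the random "runs" check described in the quoted
paragraph, in case it helps. This is not the development-branch code; the
function name shuffle_runs_within_class and the toy data are hypothetical.

```python
import numpy as np

def shuffle_runs_within_class(labels, chunks, seed=None):
    """Permute run (chunk) labels separately within each class.

    Hypothetical helper: every run keeps the same number of examples of
    each class, but which examples belong to which "run" becomes random.
    """
    labels = np.asarray(labels)
    chunks = np.asarray(chunks)
    rng = np.random.RandomState(seed)
    shuffled = chunks.copy()
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        shuffled[idx] = rng.permutation(chunks[idx])
    return shuffled

# Toy example: 3 runs, each with 2 examples of class 'a' and 2 of class 'b'
labels = np.tile(['a', 'a', 'b', 'b'], 3)
chunks = np.repeat([0, 1, 2], 4)
random_runs = shuffle_runs_within_class(labels, chunks, seed=1)
```

Cross-validating once on chunks and once on random_runs and comparing the
accuracies gives the comparison: a clear gap suggests run-specific structure
that the partitioning needs to respect.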
-- 
                                  .-.
=------------------------------   /v\  ----------------------------=
Keep in touch                    // \\     (yoh@|www.)onerussian.com
Yaroslav Halchenko              /(   )\               ICQ#: 60653192
                   Linux User    ^^-^^    [175555]

