[pymvpa] unbalanced datasets

J.A. Etzel jetzel at artsci.wustl.edu
Wed Aug 8 19:37:26 UTC 2012


What you describe is one option. I talked about those types of schemes 
and when they can be ok (in my opinion!) in 
http://dx.doi.org/10.1016/j.neuroimage.2010.08.050

As general advice, it seems best to partition so that the number of 
examples of each case in each cross-validation fold is roughly equal. 
Sometimes that's just plain not possible. For example, I have a dataset 
with a large number of runs, but only trials the person gets correct are 
analyzed, so for some people the number of examples in some runs varies 
drastically. In this case we partitioned on groups of runs, so one fold 
leaves out runs 1, 2, 3, and 4. This scheme equalized the number of 
examples somewhat (though I still subsetted examples to make them exactly 
equal), and seemed to help reduce the variation.
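The group-of-runs scheme described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not PyMVPA's own partitioner: the helper names `grouped_run_folds` and `balance_indices`, the run grouping, and the toy data are all hypothetical. Each fold leaves out one group of runs, and each side of the split is then randomly subsetted so every condition has exactly the same number of examples.

```python
import numpy as np


def grouped_run_folds(runs, groups):
    """Yield (train_idx, test_idx) pairs, leaving out one group of runs per fold.

    `groups` is a hypothetical run grouping, e.g. [[1, 2, 3, 4], [5, 6, 7, 8]].
    """
    runs = np.asarray(runs)
    for group in groups:
        test_mask = np.isin(runs, group)  # True for samples whose run is in this group
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def balance_indices(idx, targets, rng):
    """Randomly subsample `idx` so every condition keeps the same number of samples."""
    targets = np.asarray(targets)
    idx = np.asarray(idx)
    conditions = np.unique(targets)  # all conditions in the dataset, not just in idx
    n_min = min(np.sum(targets[idx] == c) for c in conditions)
    if n_min == 0:
        raise ValueError("a condition has no samples in this subset; regroup the runs")
    keep = [rng.choice(idx[targets[idx] == c], size=n_min, replace=False)
            for c in conditions]
    return np.sort(np.concatenate(keep))


# Toy data (hypothetical): 8 runs of 4 trials each, 3 of condition A and 1 of B per run.
runs = np.repeat(np.arange(1, 9), 4)
targets = np.array(["A", "A", "A", "B"] * 8)
rng = np.random.default_rng(0)
for train, test in grouped_run_folds(runs, [[1, 2, 3, 4], [5, 6, 7, 8]]):
    train_bal = balance_indices(train, targets, rng)
    test_bal = balance_indices(test, targets, rng)
    print(sorted(set(runs[test].tolist())), len(train_bal), len(test_bal))
```

Raising an error when a condition is missing from a fold makes the "empty run" problem visible immediately, which is the cue to regroup the runs.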

Jo


On 8/7/2012 10:52 AM, Edmund Chong wrote:
> Hi all,
>
> I recently asked a question on dealing with unbalanced datasets and
> here's a follow-up question.
> So let's say I have empty runs, or runs where there are zero samples for
> one of the conditions. This leads to problems if that run happens to be
> the test run in a leave-one-run-out cross-validation procedure.
>
> My workaround was this: if I had one such run with empty
> conditions, then I would set NFoldPartitioner(cvtype=2), together with
> Balancer(), so that any combination of two runs would have at least one
> sample per condition. If I had two such runs with empty
> conditions, then I would set cvtype=3, and so on. However, this means I
> have less training data on each classification fold.
>
> Is there any other possible solution for this? In fact, is it possible
> to do leave-n-samples-out classification? That is, on each fold I would
> randomly select n samples per condition to test on, and use the remaining
> samples (after balancing) for training, disregarding the chunks structure.
>
> Thanks!
> -Edmund
>
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>
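The leave-n-samples-out scheme asked about above could look roughly like the following plain NumPy sketch. The function name `leave_n_per_condition_out` and its parameters are hypothetical, not a PyMVPA API. As the question proposes, it ignores the chunk (run) structure: each fold randomly draws `n_test` samples per condition for testing, and the training set is the remaining samples, randomly subsetted so each condition has the same count.

```python
import numpy as np


def leave_n_per_condition_out(targets, n_test, n_folds, seed=None):
    """Yield (train_idx, test_idx) for repeated random splits that ignore chunks."""
    targets = np.asarray(targets)
    rng = np.random.default_rng(seed)
    conditions = np.unique(targets)
    for _ in range(n_folds):
        test_parts, remaining = [], []
        for cond in conditions:
            shuffled = rng.permutation(np.flatnonzero(targets == cond))
            if len(shuffled) <= n_test:
                raise ValueError(f"condition {cond!r} has too few samples for n_test={n_test}")
            test_parts.append(shuffled[:n_test])  # n_test random samples to test on
            remaining.append(shuffled[n_test:])   # the rest are training candidates
        # Balance training: keep the same (smallest) number of samples per condition.
        # The arrays are already shuffled, so the first n_train form a random subset.
        n_train = min(len(r) for r in remaining)
        train = np.concatenate([r[:n_train] for r in remaining])
        yield np.sort(train), np.sort(np.concatenate(test_parts))


# Toy data (hypothetical): 10 samples of condition A, 6 of B.
targets = np.array(["A"] * 10 + ["B"] * 6)
for train, test in leave_n_per_condition_out(targets, n_test=2, n_folds=3, seed=0):
    print(len(train), len(test))  # prints "8 4" for each fold
```

Because folds are drawn independently, this is repeated random subsampling rather than a partition: the same sample can land in the test set of more than one fold.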

-- 
Joset A. Etzel, Ph.D.
Research Analyst
Cognitive Control & Psychopathology Lab
Washington University in St. Louis
http://mvpa.blogspot.com/
