[pymvpa] Balancing with searchlight and statistical issues.

Tue Mar 1 23:02:31 UTC 2016

On 3/1/2016 4:54 PM, Richard Dinga wrote:
>> I balanced the dataset within runs, so if I have 8A and 2B, after
>> balancing I will have 2A and 2B chosen randomly (by pymvpa), since I
>> could have some highly unbalanced runs (2A vs 2B)
>
> I don't see any benefit in balancing the data within each run. You will
> just lose lots of valuable data. Depending on your approach, you might
> want to balance the data within the whole training set.

Quick reply to this part: yes, I totally agree that the critical thing
is to have the *training set* balanced. I started referring to runs
since the original question involved cross-validation on the runs, but I
didn't mean to imply that balancing should always be within each run;
the cross-validation scheme really matters.
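
To make the difference concrete, here is a minimal sketch in plain Python
(not the PyMVPA API) of leave-one-run-out cross-validation where the pooled
training set is balanced by randomly undersampling the larger class. The
function name balance_training_set, the toy labels, and the run layout are
all made up for illustration.

```python
import random
from collections import defaultdict


def balance_training_set(labels, runs, test_run, seed=0):
    """Hypothetical helper: split for leaving out `test_run`.

    Returns (train_idx, test_idx). The training indices are undersampled
    so that every class has the same count across the pooled training
    runs, rather than within each run separately.
    """
    rng = random.Random(seed)
    test_idx = [i for i, r in enumerate(runs) if r == test_run]

    # Group the training samples by class, pooling over all training runs.
    by_class = defaultdict(list)
    for i, (lab, r) in enumerate(zip(labels, runs)):
        if r != test_run:
            by_class[lab].append(i)

    # Keep as many samples per class as the smallest class provides.
    n_keep = min(len(idx) for idx in by_class.values())
    train_idx = []
    for lab in sorted(by_class):
        train_idx.extend(rng.sample(by_class[lab], n_keep))
    return sorted(train_idx), test_idx


# Toy design (assumed): run 1 has 8 A / 2 B, run 2 has 2 A / 8 B,
# run 3 has 5 A / 5 B.
labels = ["A"] * 8 + ["B"] * 2 + ["A"] * 2 + ["B"] * 8 + ["A"] * 5 + ["B"] * 5
runs = [1] * 10 + [2] * 10 + [3] * 10

for test_run in sorted(set(runs)):
    train_idx, test_idx = balance_training_set(labels, runs, test_run)
    counts = {lab: sum(labels[i] == lab for i in train_idx) for lab in ("A", "B")}
    print("test run", test_run, "training counts", counts)
```

In this toy design, when run 3 is the test run the pooled training set is
already balanced (10 A and 10 B), so all 20 training samples are kept;
balancing within each run would keep only 2 A and 2 B from each of runs 1
and 2, i.e. 8 samples.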

Jo