[pymvpa] Balancing with searchlight and statistical issues.

Tue Mar 1 17:43:23 UTC 2016

Thank you, Jo, for your response, and sorry that I didn't explain my
strategy clearly.

I balanced the dataset within runs, so if a run has 8 A and 2 B, after
balancing it has 2 A and 2 B, chosen at random (by PyMVPA). Since some
highly unbalanced runs end up with only 2 A vs 2 B, I decided to use
leave-two-runs-out cross-validation in order to have more samples in the
testing set, and thus a less biased accuracy (with 2 samples per class, the
accuracy for each class can only be 0, 0.5, or 1). However, I did not
repeat the balancing process, because that would definitely increase the
computation time (all the more so with leave-two-runs-out
cross-validation).
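
To make the within-run balancing concrete, here is a minimal sketch in plain
Python. It is not PyMVPA's own implementation; the function name
balance_within_runs and the toy labels are made up for illustration. It only
shows the idea: in each run, randomly keep as many examples of each class as
the rarest class has.

```python
import random
from collections import defaultdict

def balance_within_runs(labels, runs, rng):
    """Return sorted indices of a subset with equal class counts in every run.

    Hypothetical helper for illustration only (not a PyMVPA function).
    """
    classes = sorted(set(labels))
    by_run_class = defaultdict(list)
    for i, (lab, run) in enumerate(zip(labels, runs)):
        by_run_class[(run, lab)].append(i)
    keep = []
    for run in sorted(set(runs)):
        # size of the rarest class in this run (0 if a class is absent)
        n_min = min(len(by_run_class[(run, c)]) for c in classes)
        for c in classes:
            # random subsample without replacement
            keep.extend(rng.sample(by_run_class[(run, c)], n_min))
    return sorted(keep)

# Toy data: run 1 has 8 A and 2 B, run 2 has 5 A and 5 B.
labels = ["A"] * 8 + ["B"] * 2 + ["A"] * 5 + ["B"] * 5
runs = [1] * 10 + [2] * 10
keep = balance_within_runs(labels, runs, random.Random(0))

# Count the examples kept per (run, class).
counts = defaultdict(int)
for i in keep:
    counts[(runs[i], labels[i])] += 1
```

In this toy case run 1 is reduced to 2 A and 2 B, while run 2 keeps all of
its examples. A run in which one class is missing entirely contributes
nothing after balancing.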

So do you suggest using more balanced replications of the dataset together
with leave-one-run-out cross-validation?
And what do you think about data-driven balancing (e.g. removing the beta
images that are least similar to the average image)? Would that introduce
some other bias?
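
For reference, this is roughly how I understand the replication scheme:
balance the whole dataset first, then cross-validate, and repeat with a
different random subset each time, averaging at the end. The sketch below is
plain Python under my own assumptions. The names balanced_indices,
replicated_accuracy and majority_vote are hypothetical, and majority_vote is
only a stand-in for a real classifier, used as a sanity check.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import mean

def balanced_indices(labels, runs, rng):
    """Randomly keep the same number of examples per class within each run."""
    classes = sorted(set(labels))
    keep = []
    for run in sorted(set(runs)):
        groups = [[i for i, (l, r) in enumerate(zip(labels, runs))
                   if l == c and r == run] for c in classes]
        n_min = min(len(g) for g in groups)
        for g in groups:
            keep.extend(rng.sample(g, n_min))
    return sorted(keep)

def replicated_accuracy(labels, runs, evaluate, n_rep=10, n_out=1, seed=0):
    """Average evaluate(train_idx, test_idx) over folds, then over replications."""
    rng = random.Random(seed)
    rep_means = []
    for _ in range(n_rep):
        keep = balanced_indices(labels, runs, rng)  # balance first ...
        fold_acc = []
        for test_runs in combinations(sorted(set(runs)), n_out):  # ... then CV
            test = [i for i in keep if runs[i] in test_runs]
            train = [i for i in keep if runs[i] not in test_runs]
            if test:  # skip folds whose held-out runs were emptied by balancing
                fold_acc.append(evaluate(train, test))
        rep_means.append(mean(fold_acc))
    return mean(rep_means), rep_means

# Toy labels: three runs with 8/2, 5/5 and 3/7 examples of A/B.
labels = ["A"] * 8 + ["B"] * 2 + ["A"] * 5 + ["B"] * 5 + ["A"] * 3 + ["B"] * 7
runs = [1] * 10 + [2] * 10 + [3] * 10

def majority_vote(train, test):
    """Stand-in 'classifier': always predicts the most common training label."""
    pred = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(labels[i] == pred for i in test) / len(test)

overall, per_rep = replicated_accuracy(labels, runs, majority_vote, n_rep=10)
```

Because every fold is balanced, the stand-in scores exactly 0.5 in each
replication, which is what a chance-level classifier should give. With a real
classifier, the spread of per_rep would show how much the result depends on
which examples were removed.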

OT: I always thought that SVMs were not so sensitive to imbalance, because
they use only a few samples as support vectors!

Thank you,
Roberto

On 1 March 2016 at 16:00, Jo Etzel <jetzel at wustl.edu> wrote:

> Here's a response to the second part of your question:
>
> On 2/29/2016 11:30 AM, Roberto Guidotti wrote:
>
>>     Also, you say the dataset is unbalanced, but has 12 runs, each with
>>     10 trials, half A and half B. That sounds balanced to me
>>
>> I classified the motor response in a few subjects with good accuracies,
>> but now I would like to decode the decision, since it is a decision task,
>> which is the main reason why my dataset is unbalanced. Stimuli are
>> balanced, since the subject views half A and half B, but he has to respond
>> whether the stimulus is A or B, thus I could have runs with unbalanced
>> conditions (e.g. 8 A vs 2 B, etc.).
>>
>
> I see; you're classifying decisions, not stimuli, and the people's
> decisions were unbalanced. (As far as the classifier is concerned, the
> balanced stimuli are totally irrelevant; it's the labels (decisions, here)
> that matter.)
>
> Classifying with an imbalanced training set is not at all a good idea in
> most cases; you'll need to balance it so that you have equal numbers of
> each class. I'll try to get a demo up with more explanation, but the short
> version is that linear SVMs (and many other common MVPA algorithms) are
> exquisitely sensitive to imbalance: a training set with 21 of one class and
> 20 of the other can produce seriously skewed results.
>
> While there are ways to adjust example weighting, etc, with fMRI datasets
> I generally recommend subsetting examples for balance instead. Since you
> have 12 runs, you might find that the balance is a bit closer if you do
> leave-two-runs-out (or even three or four) instead of leave-one-run-out
> cross-validation.
>
> Say you have 21 of one class and 20 of the other in a training set. You'll
> then want to remove one of the larger class (at random), so that there are
> 20 examples of both classes. To make sure you didn't happen to remove a
> "weird" example (and so your results were totally dependent on which
> example was removed), the balancing process should be repeated several
> times (e.g. 5, 10, depending on how serious the imbalance is) and results
> averaged over those replications.
>
> I don't know how to set it up in pymvpa, but when dealing with imbalanced
> datasets my usual practice is to look at how many examples are present for
> each person, and figure out a cross-validation scheme that will minimize
> the imbalance as much as possible. Then I precalculate which examples will
> be omitted in each person for each replication (e.g., in the first
> replication, leave out the 3rd "A" in run 2; in the second replication,
> omit the 5th).
> Ideally, I omit examples before classifying, so that all cross-validation
> folds will be fully balanced, then do the classification with that balanced
> dataset. (This parallels the idea of dataset-wise permutation testing -
> first balance the dataset, then do the cross-validation.)
>
> hope this makes sense,
> Jo
>
> --
> Joset A. Etzel, Ph.D.
> Research Analyst
> Cognitive Control & Psychopathology Lab
> Washington University in St. Louis
> http://mvpa.blogspot.com/
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa