[pymvpa] Number of data points per condition: what are your guidelines?

Tue May 8 20:39:29 UTC 2012

Sorry for the late reply. My comments are inline below.

On 05/02/2012 09:21 PM, J.A. Etzel wrote:
> On 5/1/2012 11:27 AM, Vadim Axel wrote:
>> Hi experts,
>>
>> I am talking about basic pattern classification (e.g. no feature
>> selection etc). SVM algorithm (with built-in regularization).
>>
>> 1. A small number of data points with large dimension (ROI size)  can
>> cause overfitting, which is  high prediction on training set and bad
>> test set. Now, suppose, I have a beyond chance classification on test
>> set, which was validated using within subject permutation test and
>> across subjects t-test vs. chance. Can my results be still unreliable?
>> If so, how can I test it?

Overfitting is a different issue from assessing accuracy on the test set.
The first concerns training, the second testing. You can train in a fairly
suboptimal way and still be highly confident that the predictions on the
test set are better than chance. Or you can train very soundly and still get
a chance-level accuracy estimate on the test set. Of course, poor training is
less likely to yield a classifier that is highly accurate on the test set :-)
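
To make the distinction concrete, here is a minimal sketch (not PyMVPA code, and
not an SVM: a hand-written nearest-centroid classifier is swapped in so the
example needs only the standard library). It trains on pure noise with 10
examples and 100 features, the sizes discussed in this thread; all function
names and the test-set size are assumptions made for the illustration.

```python
import random

def centroid(rows):
    """Feature-wise mean of a list of equally long feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def predict(centroids, x):
    """Label of the closest class centroid (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )

def accuracy(centroids, xs, ys):
    """Fraction of examples whose predicted label matches the given label."""
    return sum(predict(centroids, x) == y for x, y in zip(xs, ys)) / len(xs)

def noise_experiment(n_train=10, n_test=1000, n_features=100, seed=0):
    """Train and test on pure noise: the labels carry no information at all."""
    rng = random.Random(seed)

    def sample(n):
        ys = [i % 2 for i in range(n)]  # balanced, arbitrary labels
        xs = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in ys]
        return xs, ys

    train_x, train_y = sample(n_train)
    test_x, test_y = sample(n_test)
    centroids = {
        label: centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in (0, 1)
    }
    return (accuracy(centroids, train_x, train_y),
            accuracy(centroids, test_x, test_y))

train_acc, test_acc = noise_experiment()
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

Training accuracy should typically come out close to 1 here even though there
is nothing to learn, while accuracy on the independent test set stays close to
0.5. The training score says something about the fit; only the test score says
something about better-than-chance prediction.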

You have a small test set and claim to be beyond chance both at the
subject level (permutation test) and at the group level (t-test). Are
your results reliable?

* The permutation test is reliable when "the data distribution is adequately
represented by the sample data" (Golland and Fischl, IPMI 2003 - a paper that
promotes the use of the permutation test). Are "a small number of data points"
enough to represent your high-dimensional space? Maybe, or maybe not. Checking
this assumption seems to be an open problem in itself.

* The t-test assumes a Gaussian distribution of the statistic of interest,
i.e. the accuracy or error rate of each single subject. Note that
accuracy/error lies in [0,1], so the Gaussian distribution, which has infinite
support, cannot describe it properly. I wouldn't use it.
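
For readers unfamiliar with the within-subject permutation test from the first
point, the procedure can be sketched as follows. This is only an illustration
under stated assumptions: it uses a hand-written nearest-centroid classifier
with leave-one-out cross-validation instead of an SVM, synthetic data, and
hypothetical function names; it is not PyMVPA's implementation. It does not
check the assumption quoted from Golland and Fischl, it only shows the mechanics.

```python
import random

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [row for row, y in zip(train_x, train_y) if y == label]
        center = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(x, center))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(xs, ys):
    """Leave-one-out cross-validated accuracy."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Share of label shufflings that reach at least the observed accuracy."""
    rng = random.Random(seed)
    observed = loo_accuracy(xs, ys)
    shuffled = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_accuracy(xs, shuffled) >= observed:
            count += 1
    # The +1 terms count the observed labeling itself as one permutation.
    return observed, (count + 1) / (n_perm + 1)

# Synthetic example: 10 examples (5 per class), 20 features, strong class shift.
rng = random.Random(1)
labels = [0, 1] * 5
data = [[rng.gauss(2.0 * y, 1.0) for _ in range(20)] for y in labels]
observed_acc, p_value = permutation_p_value(data, labels)
print(f"LOO accuracy: {observed_acc:.2f}, permutation p-value: {p_value:.3f}")
```

Note that with 5 examples per class there are only 252 distinct ways to assign
the labels, so the null distribution obtained by permutation is necessarily
coarse when the number of data points is this small.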

>>
>> 2. Practically, is 10 independent data points (averaged block value or
>> beta values) with the ROI of 100 voxels is safe enough?
> I don't know about "safe", but this is in the range of reasonable things to try. I 
> currently have a dataset that works well with a few hundred voxels and only 6 examples, 
> and others that have more examples and fewer voxels.

Would you be sufficiently confident that a coin is not fair after tossing it
10 times (or 6 times) and observing heads every time? I guess even
the binomial test would disagree. To assess the evidence
in the data in support of the better-than-chance hypothesis, I would
proceed in a different way, as in the reference I mention below.
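
For reference, the numbers behind the coin analogy can be computed with an exact
one-sided binomial test. The sketch below assumes independent test predictions
and a chance level of 0.5; the function name is made up for the illustration.

```python
from math import comb

def binomial_tail_p(n_correct, n_total, chance=0.5):
    """P(at least n_correct successes in n_total trials) under the chance level."""
    return sum(
        comb(n_total, k) * chance ** k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )

for n_correct, n_total in [(10, 10), (8, 10), (6, 6), (5, 6)]:
    p = binomial_tail_p(n_correct, n_total)
    print(f"{n_correct}/{n_total} correct: p = {p:.4f}")
# 10/10 correct: p = 0.0010
# 8/10 correct: p = 0.0547
# 6/6 correct: p = 0.0156
# 5/6 correct: p = 0.1094
```

The all-correct outcome is the extreme case. With so few trials every single
error moves the p-value a lot: two errors out of 10, or one out of 6, already
put it above 0.05.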

Again, the size of the ROI is of little importance here, in my opinion.

>
>>
>> 3. Do you know about any imaging papers which tested / discussed this issue?
> Mukherjee, S., Golland, P., Panchenko, D.: Permutation Tests for Classification. AI Memo 
> 2003-019. Massachusetts Institute of Technology Computer Science and Artificial 
> Intelligence Laboratory (2003)
>
> Klement, S., Madany Mamlouk, A., Martinetz, T., Kurková, V., Neruda, R., Koutník, J.: 
> Reliability of Cross-Validation for SVMs in High-Dimensional, Low Sample Size Scenarios 
> Artificial Neural Networks - ICANN 2008. Vol. 5163. Springer Berlin / Heidelberg (2008) 
> 41-50
>
>

I would suggest this one to support my points:
Emanuele Olivetti, Sriharsha Veeramachaneni, Ewa Nowakowska, Bayesian hypothesis testing 
for pattern discrimination in brain decoding, Pattern Recognition, 45, 2012. 
http://dx.doi.org/10.1016/j.patcog.2011.04.025
I know self-citations suck, but I haven't found a more convincing one.

Best,

Emanuele