[pymvpa] Cross validation model selection.

Fri Jul 4 13:32:34 UTC 2014

Hi,

On Thu, Jul 03, 2014 at 05:49:10PM +0200, Roberto Guidotti wrote:
> I have a classifier inside my cross-validation procedure. After the cv(ds) call
> I have an error rate for each fold. If I then run clf.predict(brand_new_ds),
> will I get predictions based on the last cross-validated classifier, on the
> best cross-validated classifier, or on something else?

It will be the classifier trained on the last fold.
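To illustrate why, here is a minimal pure-Python sketch. It is not PyMVPA code: `NearestCentroid` and `cross_validate` are hypothetical names made up for this example. A single classifier instance is retrained in every fold, so once the loop finishes it simply holds whatever it learned from the last training split:

```python
# Hypothetical minimal classifier, NOT PyMVPA's API. Like any stateful
# classifier object, it keeps only the result of its most recent training.
class NearestCentroid:
    def train(self, samples, labels):
        # Compute one mean vector (centroid) per label.
        sums, counts = {}, {}
        for x, y in zip(samples, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}
        # Remember the training samples so we can inspect which fold was last.
        self.training_samples = list(samples)

    def predict(self, samples):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda y: dist2(x, self.centroids[y]))
                for x in samples]


def cross_validate(clf, samples, labels, chunks):
    """Leave-one-chunk-out CV that reuses ONE classifier instance.

    Returns the error rate per fold; clf is left trained on the last fold.
    """
    errors = []
    for test_chunk in sorted(set(chunks)):
        train = [i for i, c in enumerate(chunks) if c != test_chunk]
        test = [i for i, c in enumerate(chunks) if c == test_chunk]
        clf.train([samples[i] for i in train], [labels[i] for i in train])
        pred = clf.predict([samples[i] for i in test])
        wrong = sum(p != labels[i] for p, i in zip(pred, test))
        errors.append(wrong / len(test))
    return errors
```

With three chunks, calling `clf.predict(...)` after `cross_validate` uses the model trained on chunks 0 and 1 (the fold that held out chunk 2), regardless of which fold had the lowest error.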

> Does It make sense to look for the "best" cross-validated classifier, to
> make predictions on an unseen and unlabeled dataset (like resting-state)?

I don't think it would make much sense. "Best" here only means higher
accuracy on one particular test set than the other folds achieved on
theirs. It doesn't necessarily mean that this particular model fit is
really better; it could just as well be a "better/cleaner" test set.

You could explore this with more complicated data-folding schemes and
evaluate each trained classifier on multiple different test sets. But
I am not sure what you would gain from doing so...
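One such scheme could look like the following pure-Python sketch (not PyMVPA code; `train_centroids`, `predict` and `cross_test_matrix` are hypothetical helpers, and nearest-centroid is just a stand-in classifier). It trains one model per chunk and tests it on every other chunk, so each trained model gets scored on several test sets:

```python
from statistics import mean


def train_centroids(samples, labels):
    """Hypothetical nearest-centroid trainer: one mean vector per label."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}


def predict(centroids, samples):
    """Assign each sample to the label with the closest centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda y: dist2(x, centroids[y])) for x in samples]


def cross_test_matrix(samples, labels, chunks):
    """Train one model per chunk and test it on every OTHER chunk.

    Returns {(train_chunk, test_chunk): accuracy}.
    """
    ids = sorted(set(chunks))

    def subset(c):
        idx = [i for i, ch in enumerate(chunks) if ch == c]
        return [samples[i] for i in idx], [labels[i] for i in idx]

    result = {}
    for tr in ids:
        model = train_centroids(*subset(tr))
        for te in ids:
            if te == tr:
                continue
            xs, ys = subset(te)
            pred = predict(model, xs)
            result[(tr, te)] = sum(p == y for p, y in zip(pred, ys)) / len(ys)
    return result
```

Reading the result per training chunk (one model, several test sets) and per test chunk (one test set, several models) shows whether a high score follows the model or the test set. In a toy dataset where chunk 2 is noisy, the models trained on chunks 0 and 1 each score 1.0 on one test chunk and 0.0 on chunk 2, so a single-fold "best" score mostly reflects which test set a model happened to get.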

Cheers,

Michael

-- 
Michael Hanke
http://mih.voxindeserto.de