[pymvpa] Question about new Nested_CV example
Yaroslav Halchenko
debian at onerussian.com
Mon Oct 18 21:03:36 UTC 2010
Dear Arteaga,
Thank you for spotting the bug in the nested_cv.py example. I have
fixed it the way you suggested, and to disambiguate it even
further I have used the dataset_ argument name within select_best_clf to
avoid any possible confusion ;) I also tuned it up a bit to make
the example more lightweight and faster.
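
For readers without the example at hand, the slip can be reproduced in a
minimal plain-Python sketch. This is not the actual nested_cv.py code:
mean_error is a hypothetical stand-in for np.mean(cv(...)), the data are
made-up numbers, and the two function names are invented for illustration.
Only the names dstrain, dataset and dataset_ follow the discussion.

```python
def mean_error(data):
    # Hypothetical stand-in for np.mean(cv(data)) in the real example.
    return sum(data) / len(data)


full_dataset = [1, 2, 3, 6]

# An outer loop over "training splits". In Python the loop variable
# dstrain stays defined at module level after the loop has finished.
for dstrain in (full_dataset[:2], full_dataset[2:]):
    pass


def select_best_buggy(dataset):
    # Bug: the argument is ignored; the function silently picks up the
    # module-level dstrain left over from the loop above.
    return mean_error(dstrain)


def select_best_fixed(dataset_):
    # Fix: only the function's own argument is used. The trailing
    # underscore keeps it distinct from any outer name "dataset".
    return mean_error(dataset_)


buggy = select_best_buggy(full_dataset)  # computed on the last split only
fixed = select_best_fixed(full_dataset)  # computed on the data passed in
print(buggy, fixed)
```

Because the buggy variant never reads its argument, passing the full
dataset for the "cheating" estimate changes nothing, which matches the
behaviour described in the report quoted below.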
The fix was pushed into the maint/0.5 and master (main development toward
0.6) branches of our main repository:
http://github.com/PyMVPA/PyMVPA
Cheers,
Yarik
On Mon, 18 Oct 2010, Arteaga, Dan (NIH/NINDS) [F] wrote:
> Hi Yaroslav,
> In the function: def select_best_clf(dataset, clfs), it is my
> understanding that only the training splits of each CV fold are used
> to calculate the average NCV transfer error for each classifier across
> the NCV folds via:
>     try:
>         error = np.mean(cv(dstrain))
> However, to get the cheating classifier results, the entire dataset is
> sent to the select_best_clf function via the line:
>     cheating_clf, cheating_error = select_best_clf(dataset,
>                                                    clfswh['!gnpp'])
> Yet, it still uses the error = np.mean(cv(dstrain)) to calculate the
> cheating CV transfer errors as well. Replacing it with error =
> np.mean(cv(dataset)) gives different results for the cheating errors,
> but not for the original NCV errors.
> In addition, select_best_clf only takes the dataset argument,
> whereas dstrain is only defined in the for loop?
> If I am misunderstanding a fundamental programming concept I apologize
> and you can ignore this if you want. And I am extremely appreciative
> of this new example.
> Best,
> Dan
--
Yaroslav O. Halchenko
Postdoctoral Fellow, Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik