[pymvpa] Question about new Nested_CV example

Mon Oct 18 21:03:36 UTC 2010

Dear Arteaga,

Thank you for spotting the bug in the nested_cv.py example.  I have
fixed it the way you suggested, and to disambiguate it even
further I have used the dataset_ argument name within select_best_clf to
avoid any possible confusion ;)  I also tuned it up a bit to make the
example more lightweight and faster.
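
For the record, here is a minimal sketch of the scoping problem (not the
actual example code).  fake_cv, the toy data, and the *_buggy/*_fixed
function names are hypothetical stand-ins; the real select_best_clf also
loops over classifiers and returns the best one, which is omitted here.
The point is only that the buggy version ignores its argument and picks
up whatever dstrain the caller's loop left behind:

```python
def mean(values):
    return sum(values) / float(len(values))


def fake_cv(data):
    """Hypothetical stand-in for a cross-validation run: one 'error' per sample."""
    return [x / 10.0 for x in data]


def select_best_clf_buggy(dataset, clfs):
    # Bug: `dataset` is never used.  `dstrain` is not defined in this function,
    # so Python looks it up in the module scope and finds the caller's variable.
    return mean(fake_cv(dstrain))


def select_best_clf_fixed(dataset_, clfs):
    # Fix: use the argument; the trailing underscore keeps it visibly distinct
    # from any module-level `dataset`.
    return mean(fake_cv(dataset_))


dataset = [1, 2, 3, 4]
dstrain = dataset[:2]  # a training split left over from an outer CV loop

# The "cheating" call passes the full dataset in both cases.
buggy_error = select_best_clf_buggy(dataset, clfs=None)  # computed on dstrain
fixed_error = select_best_clf_fixed(dataset, clfs=None)  # computed on dataset
print(buggy_error, fixed_error)
```

With the buggy version the "cheating" error is silently computed on the
last training split, which is why swapping in the full dataset changed
only the cheating results.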

The fix was pushed to the maint/0.5 and master (main development toward
0.6) branches of our main repository:

http://github.com/PyMVPA/PyMVPA

Cheers,
Yarik

On Mon, 18 Oct 2010, Arteaga, Dan (NIH/NINDS) [F] wrote:

>    Hi Yaroslav,

>    In the function def select_best_clf(dataset, clfs), it is my
>    understanding that only the training split of each CV fold is used
>    to calculate the average NCV transfer error for each classifier across
>    the NCV folds via:

>        try:
>            error = np.mean(cv(dstrain))

>    However, to get the cheating classifier results, the entire dataset is
>    sent to the select_best_clf function via the line:

>        cheating_clf, cheating_error = select_best_clf(dataset, clfswh['!gnpp'])

>    Yet it still uses error = np.mean(cv(dstrain)) to calculate the
>    cheating CV transfer errors as well. Replacing it with
>    error = np.mean(cv(dataset)) gives different results for the cheating
>    errors, but not for the original NCV errors.

>    In addition, select_best_clf only introduces the dataset variable,
>    whereas dstrain is only used in the for loop?

>    If I am misunderstanding a fundamental programming concept I apologize
>    and you can ignore this if you want.  And I am extremely appreciative
>    of this new example.

>    Best,

>    Dan

-- 
Yaroslav O. Halchenko
Postdoctoral Fellow,   Department of Psychological and Brain Sciences
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik