[pymvpa] mvpa2.clfs.transerror.chisquare

Fri May 12 15:44:57 UTC 2017

Hi Marco,

you are absolutely right, and I was wrong about the type of chi-square being run :-) Thanks for the example! Looking more carefully at the code, especially at the line that sets the degrees of freedom (https://github.com/PyMVPA/PyMVPA/blob/e57f92af83004d40c0944d75ee4e4c9cae758102/mvpa2/misc/stats.py#L83), it is indeed using a goodness-of-fit (1-dim) chi-square, where df = ncells - 1; in a test of independence it would be df = (nrow - 1)*(ncol - 1).
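
To make the difference in degrees of freedom concrete, here is a minimal sketch using your 2x2 example matrix. It assumes scipy is available; the variable names are only for illustration and are not taken from the PyMVPA code.

```python
import numpy as np
from scipy.stats import chi2

# Example confusion matrix from this thread
mat = np.array([[22, 19],
                [26, 29]])
nrow, ncol = mat.shape

# Goodness of fit: every cell is treated as one category
df_gof = mat.size - 1                  # 4 - 1 = 3

# Test of independence on the same table
df_indep = (nrow - 1) * (ncol - 1)     # 1 * 1 = 1

# The chi-square value reported in this thread, evaluated with df = 3
stat = 2.416667
p_gof = chi2.sf(stat, df_gof)
print('df_gof=%d, df_indep=%d, p (df=%d): %.6f' % (df_gof, df_indep, df_gof, p_gof))
```

With df = 3 the p-value comes out at about 0.4905, which matches the value printed by cvte.ca.stats in your message.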

We should look at the original reference (Yarik, do you remember which one it was?), but after some thought, here’s what I think, and I look forward to hearing your comments. While we do have a contingency table (Predicted vs. Target), we do not necessarily need to run a test of independence: we are testing the performance of the classifier against chance level (that is, against the confusion matrix generated by chance performance), so we have a precise prediction of what the confusion matrix should look like if the classifier were guessing at random. Indeed, with your example confusion matrix, these are the marginals:

In [46]: mat
Out[46]:
array([[22, 19],
       [26, 29]])

In [47]: print('Marginal across rows: {0}; Marginal across columns: {1}'.format(mat.sum(axis=0), mat.sum(axis=1)))
Marginal across rows: [48 48]; Marginal across columns: [41 55]

so given that this is Predicted (rows) vs Targets (columns), it shows that the classification is balanced (same number of target samples in each class, that is 48). A classifier performing at random, predicting each class equally often, would have 50% accuracy, resulting in this confusion matrix:
[[24, 24], [24, 24]]

so we should be testing our confusion matrix against this “chance” confusion matrix.
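
A minimal sketch of that comparison, assuming scipy is available and that the design is balanced (otherwise a uniform "chance" matrix would not be the right expectation). The name `chance` is only illustrative.

```python
import numpy as np
from scipy.stats import chisquare

# Observed confusion matrix from this thread
mat = np.array([[22, 19],
                [26, 29]])

# "Chance" confusion matrix for a balanced design:
# the total count spread evenly over all cells (here 96 / 4 = 24)
chance = np.full(mat.shape, mat.sum() / float(mat.size))

# One-way (goodness-of-fit) chi-square over the flattened cells, df = ncells - 1
stat, p = chisquare(mat.ravel(), f_exp=chance.ravel())
print('Chi^2: %.6f (p=%.6f)' % (stat, p))
# Chi^2: 2.416667 (p=0.490540)
```

This reproduces the CHI^2 value and p-value that you reported from cvte.ca.stats.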

Note that if we instead run a test of independence, the expected confusion matrix would be

In [49]: x, p, df, exp = chi2_contingency(mat, correction=False)
In [50]: print(exp)
[[ 20.5  20.5]
 [ 27.5  27.5]]

which is not correct under the assumption that the classifier is performing at chance level, because it weights one class more than the other, based on the predictions.
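
To show where those expected values come from, here is a small sketch (assuming numpy and scipy; the variable names are only illustrative) that rebuilds them from the marginals and sets them next to the uniform chance matrix.

```python
import numpy as np
from scipy.stats import chi2_contingency

mat = np.array([[22, 19],
                [26, 29]])

row_tot = mat.sum(axis=1)    # totals per row:    [41 55]
col_tot = mat.sum(axis=0)    # totals per column: [48 48]
n = float(mat.sum())         # 96 observations

# Expected counts under independence: row total * column total / n
exp_indep = np.outer(row_tot, col_tot) / n

# Expected counts under chance for a balanced design: n / ncells everywhere
exp_chance = np.full(mat.shape, n / mat.size)

# scipy computes the same independence expectation
_, _, dof, exp_scipy = chi2_contingency(mat, correction=False)
print(np.allclose(exp_indep, exp_scipy), dof)
print(exp_indep)
print(exp_chance)
```

The independence expectation inherits the uneven row totals (41 vs 55), while the chance matrix does not, which is the difference described above.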

Hope it makes sense,
Matteo

> On May 12, 2017, at 01:12, marco tettamanti <mrctttmnt at gmail.com> wrote:
> 
> Dear Matteo,
> thank you for your reply!
> This was also my understanding when I looked at the mvpa2.clfs.transerror.chisquare function's help.
> However, the reality seems to be somewhat different, at least in my hands.
> 
> Given any confusion matrices as yielded by the pymvpa2 toolbox:
>         print cvte.ca.stats.matrix
>         [[22 19]
>         [26 29]]
> 
> The calculated chi square:
>         print 'Chi^2: %.6f (p=%.6f)' % cvte.ca.stats.stats["CHI^2"]
>         Chi^2: 2.416667 (p=0.490540)
> 
> Produces the same result as the 1-dimensional chisquare test:
>         from scipy.stats import chisquare
>         chisquare([22, 26, 19, 29], f_exp=[24, 24, 24, 24])
>         Power_divergenceResult(statistic=2.416666666666667, pvalue=0.49053966890385647)
> 
>         help(chisquare)
>         Help on function chisquare in module scipy.stats.stats:
>         chisquare(f_obs, f_exp=None, ddof=0, axis=0) Calculates a one-way chi square test.
> 
> And not as would be with the 2-dimensional independence test:
>         from scipy.stats import chi2_contingency
>         obs = np.array([[22, 26], [19, 29]])
>         g, p, dof, expctd = chi2_contingency(obs, correction=False)
>         print g, p, dof, expctd
>         0.383148558758 0.535922979308 1 [[ 20.5  27.5] [ 20.5  27.5]]
>  
>         help(chi2_contingency)
>         Help on function chi2_contingency in module scipy.stats.contingency:
>         chi2_contingency(observed, correction=True, lambda_=None)
>         Chi-square test of independence of variables in a contingency table.
> 
> 
> Am I doing something wrong?
> Thank you and best wishes,
> Marco
> 
> > Matteo Visconti di Oleggio Castello <matteo.visconti at gmail.com>
> > Thu May 11 17:04:46 UTC 2017
> >
> > Hi Marco,
> > 
> > looking at the code, the chi-square being run is a test of
> > independence, and not goodness-of-fit. The actual confusion matrix is
> > tested against the values expected if rows and columns were independent.
> > In the case of balanced classes (i.e., marginal row count is equal to
> > marginal column count for each row and column), the expected value
> > will be a matrix with identical values (row marginal * column
> > marginal / number of observations; or, as it is computed in the code,
> > number of observations / number of cells).
> > 
> > Hope this helps,
> > Matteo
> > > On May 11, 2017, at 08:53, marco tettamanti <mrctttmnt at gmail.com> wrote:
> > >
> > > Dear Yaroslav,
> > > thank you for your reply.
> > > I might be wrong in the specific case of MVPA, but I think the
> > > 1-dimension Goodness-of-fit test is appropriate in case you have
> > > something like a single die and you are expecting each of the 6 sides
> > > to occur with equal frequencies.
> > > The N x N confusion matrix rather reflects the case in which you have
> > > a variable with N classes (targets) and you measure how frequently
> > > these classes distribute across the levels of a different variable
> > > (predictions). In such a case, a 2-dimension Pearson's test seems
> > > more appropriate.
> > >
> > > Best,
> > > Marco
> > >
> > > On Thu, 11 May 2017, marco tettamanti wrote:
> > >
> > > >    Dear all,
> > > >    I apologize if this has been asked before, or else is too trivial.
> > > >
> > > >    I have been trying to understand how the pymvpa2 toolbox calculates
> > > >    the chi-square test of a confusion matrix.
> > > >
> > > >    In a cross-validation (e.g., cvte.ca.stats), it seems that by default
> > > >    this is done by means of a one-dimensional Goodness-of-fit chi-square
> > > >    test with expected uniform frequency distribution.
> > > >
> > > >    I was wondering whether the bi-dimensional Pearson's chi square
> > > >    wouldn't be more appropriate, as it seems to me that this would more
> > > >    closely reflect the "predictions vs targets N x N" matrix structure.
> > >
> > > Hi Marco,
> > >
> > > might as well be -- I would need to read on/check... IIRC we were just
> > > following instructions on chi-square test to be done on contingency
> > > tables.
> > >
> > > --
> > > Yaroslav O. Halchenko
> > > Center for Open Neuroscience     http://centerforopenneuroscience.org
> > > Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> > > Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> > > WWW:   http://www.linkedin.com/in/yarik
> > >
> > > --
> > > Marco Tettamanti, Ph.D.
> > > Nuclear Medicine Department & Division of Neuroscience
> > > IRCCS San Raffaele Scientific Institute
> > > Via Olgettina 58
> > > I-20132 Milano, Italy
> > > Phone ++39-02-26434888
> > > Fax ++39-02-26434892
> > > Email: tettamanti.marco at hsr.it
> > > Skype: mtettamanti
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa

--
Matteo Visconti di Oleggio Castello
Ph.D. Candidate in Cognitive Neuroscience
Dartmouth College

+1 (603) 646-8665
mvdoc.me || linkedin.com/in/matteovisconti || github.com/mvdoc
