[pymvpa] Pkg-ExpPsy-PyMVPA Digest, Vol 64, Issue 3

Emanuele Olivetti emanuele at relativita.com
Thu Jul 11 09:06:03 UTC 2013


Hi Marco,

Sorry, I missed your reply because of the change in the subject.

The posterior probabilities have to sum up to 1. If that is not the case,
then we should dig into the details.
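
As a quick check (a minimal sketch; I assume your result dataset is called
Baycvsvm_res, as in your snippet quoted below):

    import numpy as np

    # the 2nd column of the output holds log(p(H|C));
    # exponentiating and summing over all hypotheses should give ~1.0
    log_post = Baycvsvm_res.samples[:, 1]
    print(np.exp(log_post).sum())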

There might be numerical instabilities in the computation of the likelihoods
and posteriors, because of the extremely low values involved, but I believe
this is unlikely, because I did my best to avoid that problem. So my current
best guess is that the problem derives from the use of cross-validation. In
my original formulation [0] I did not consider cross-validation (it is now
work in progress). Perhaps Michael (Hanke), who implemented the glue between
the algorithms [1] and PyMVPA, can comment on cross-validation.

With respect to the snippet you sent, and according to
https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/clfs/transerror.py
I confirm that you are getting the log-likelihoods and the log of the
posteriors, as you said.
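
For example, to pull out the most likely hypothesis (a sketch; I assume the
default space name, so that the partition labels end up in sa.hypothesis;
check the .sa of your output):

    import numpy as np

    log_lik = Baycvsvm_res.samples[:, 0]     # log(p(C|H_i))
    log_post = Baycvsvm_res.samples[:, 1]    # log(p(H_i|C))
    best = np.argmax(log_post)
    print(Baycvsvm_res.sa.hypothesis[best])  # most likely partition
    print(np.exp(log_post[best]))            # its posterior probability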

About the posterior probability of the most likely hypothesis being just
0.014, consider that you have many hypotheses, i.e. 203 (so a problem with 6
classes). If you adopted the uniform prior probability over all hypotheses,
i.e. p(H_i) = 1/203 = 0.00493, then the posterior probability of the most
likely one increased almost 3 times over the prior: 0.014 / 0.00493 = 2.84.
This means that the data support that hypothesis more than you believed in
your prior. I don't have the full results of your analysis, but you should
check whether you have a similar increase with the other hypotheses or not.
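
In code, the comparison is simply (a trivial check, using your numbers):

    prior = 1.0 / 203         # uniform prior over the 203 hypotheses
    posterior = 0.014         # posterior of your most likely hypothesis
    print(posterior / prior)  # ~2.84, i.e. an almost 3x increase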

About the Bayes factor >1, consider that different values of the Bayes
factor have different interpretations. In Kass and Raftery (JASA 1995), or
here
   http://en.wikipedia.org/wiki/Bayes_factor#Interpretation
you can find commonly accepted guidelines for interpreting that value, so
you should judge your Bayes factors accordingly. If, for example, you have
values not much greater than 1, then the evidence supporting your most
likely hypothesis is weak.
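
As a rough reference, this is my paraphrase of the Kass and Raftery
categories (please double-check the thresholds against the paper):

    def interpret_bf(bf):
        # Kass & Raftery (1995) guidelines for a Bayes factor of H_a over H_b
        if bf < 1:
            return "supports H_b instead"
        elif bf < 3:
            return "barely worth mentioning"
        elif bf < 20:
            return "positive"
        elif bf < 150:
            return "strong"
        else:
            return "very strong"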

Best,

Emanuele

[0]: http://dx.doi.org/10.1109/prni.2012.14
[1]: https://github.com/emanuele/inference_with_classifiers

PS: yes the docstring may be improved. Consider submitting a pull request ;)

On 07/05/2013 12:30 PM, marco tettamanti wrote:
> Dear Emanuele,
> sorry for the late reply, it took me a while until I could get back to the data.
>
> Thank you very much for the very helpful clarifications!
> Shouldn't the BayesConfusionHypothesis documentation be updated to mention
> that the log posterior probabilities are also calculated?
>
> Can you just please confirm that given:
>
> clfsvm = LinearCSVMC()
> Baycvsvm = CrossValidation(clfsvm, NFoldPartitioner(), errorfx=None,
>                            postproc=ChainNode((Confusion(labels=fds.UT),
>                                                BayesConfusionHypothesis())))
> Baycvsvm_res = Baycvsvm(fds)
>
> the 2 columns of values in 'Baycvsvm_res.samples' indeed correspond,
> respectively, to the log likelihoods (1st column) and to the log posterior
> probabilities (2nd column), as in:
>
> print Baycvsvm_res.fa.stat
>     ['log(p(C|H))' 'log(p(H|C))']
>
>
> I have a couple of further questions:
> I thought from your reply that the sum of all p(H_i | CM) should give 1, but
> this does not seem to be the case for the exponentiated values of the 2nd
> column. Or is it rather that the sum of all p(H_i) should give 1?
>
> Also, if the above is correct, and regarding my data specifically: over 203 
> possible partitions, the most likely hypothesis has a Bayes factor >1 over all 
> competing hypotheses, which I guess should constitute sufficient evidence to 
> support it.
> However, the posterior probability of the most likely hypothesis seems quite 
> small (0.014). Is this something to be expected?
>
> Thank you a lot again and best wishes,
> Marco
>
>
>> Date: Tue, 25 Jun 2013 08:48:34 -0400
>> From: Emanuele Olivetti<emanuele at relativita.com>
>> To: pkg-exppsy-pymvpa at lists.alioth.debian.org
>> Subject: Re: [pymvpa] BayesConfusionHypothesis
>>
>> Dear Marco,
>>
>> Sorry for the late reply, I'm traveling during these days.
>>
>> BayesConfusionHypothesis, by default, computes the posterior probability of
>> each hypothesis tested on the confusion matrix. As you correctly report,
>> there is one hypothesis for each possible partition of the set of class
>> labels. For example, for three class labels (A,B,C) there are 5 possible
>> partitions: H_1=((A),(B),(C)), H_2=((A,B),(C)), H_3=((A,C),(B)),
>> H_4=((A),(B,C)), H_5=((A,B,C)).
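>>
>> To make the enumeration concrete, this is a standard recursive way to list
>> all partitions of a set (just a sketch, not PyMVPA's internal code):
>>
>>     def partitions(items):
>>         # yield every partition of a list of labels
>>         if len(items) == 1:
>>             yield [items]
>>             return
>>         first, rest = items[0], items[1:]
>>         for smaller in partitions(rest):
>>             # put 'first' into each existing group in turn...
>>             for i, group in enumerate(smaller):
>>                 yield smaller[:i] + [[first] + group] + smaller[i + 1:]
>>             # ...or give 'first' a group of its own
>>             yield [[first]] + smaller
>>
>>     for p in partitions(['A', 'B', 'C']):
>>         print(p)  # prints the 5 partitions listed above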
>>
>> The posterior probability of each hypothesis is computed in the usual way
>> (let CM be the confusion matrix):
>>
>>     p(H_i | CM) = p(CM | H_i) * p(H_i) / (sum_j p(CM | H_j) * p(H_j))
>>
>> where p(H_i) is the prior probability of each hypothesis and p(CM | H_i) is
>> the (integrated) likelihood of each hypothesis.
>> The default prior is the uniform one, p(H_i) = 1/(number of hypotheses),
>> i.e. no hypothesis is preferred. You can specify a different one through
>> the "prior_Hs" parameter of BayesConfusionHypothesis.
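>>
>> Numerically, that formula is best evaluated in log space (a sketch of the
>> computation, not the actual PyMVPA code):
>>
>>     import numpy as np
>>
>>     def log_posteriors(log_lik, log_prior):
>>         # Bayes rule in log space; np.logaddexp.reduce computes the log of
>>         # the sum in the denominator without underflow
>>         unnorm = log_lik + log_prior
>>         return unnorm - np.logaddexp.reduce(unnorm)
>>
>> With a uniform prior, log_prior is just -np.log(number_of_hypotheses), and
>> the exponentiated result sums to 1.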
>>
>> The values returned by BayesConfusionHypothesis, i.e. the posterior
>> probabilities of the hypotheses, quantify how likely each hypothesis is in
>> light of the data and of the priors that you assumed. So those values
>> should be what you are looking for.
>>
>> If you set "postprob=False" in BayesConfusionHypothesis, you will get the
>> likelihoods of each model/hypothesis, i.e. p(CM | H_i), instead of the
>> posterior probabilities. This is a different quantity. Note that,
>> differently from p(H_i | CM), if you sum all the p(CM | H_i) you will not
>> get one. These likelihoods (each an "integrated", or Bayesian, likelihood)
>> are useful for comparing hypotheses in pairs. For example, if you want to
>> know how much evidence the data provide in favor of discriminating all
>> classes, i.e. H_1=((A),(B),(C)), compared to not discriminating any class,
>> i.e. H_5=((A,B,C)), then you can look at the ratio
>> B_15 = p(CM|H_1) / p(CM|H_5), which is called the Bayes factor (similar to
>> the likelihood ratio of the frequentist approach, but note that these
>> likelihoods are not frequentist likelihoods). If that number is >1, then
>> the evidence of the data supports H_1 more than H_5. More detailed
>> guidelines for interpreting the value of the Bayes factor can be found, for
>> example, in Kass and Raftery (JASA 1995).
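>>
>> In code, the Bayes factor is just a difference of log-likelihoods (a
>> sketch; the arguments are the log p(CM|H) values of the two hypotheses you
>> want to compare):
>>
>>     import numpy as np
>>
>>     def bayes_factor(log_lik_a, log_lik_b):
>>         # evidence in the data for hypothesis a relative to hypothesis b
>>         return np.exp(log_lik_a - log_lik_b)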
>>
>> In the paper Olivetti et al (PRNI 2012) I presented the Bayes factor
>> approach, but I believe that looking at the posterior probabilities - which
>> is the PyMVPA default I proposed - is simpler and clearer, especially in
>> the case of many hypotheses/partitions. I am describing these things in an
>> article in preparation.
>>
>> The parameters "space" and "hypotheses" of BayesConfusionHypothesis have
>> the following meaning:
>>
>> - "space" stores the string of the dataset's field where the posterior 
>> probabilities
>> are stored. That dataset is the output of BayesConfusionHypothesis. You might
>> want to change the default name "hypothesis". Or not :).
>
> oops, sorry! I should have read the documentation a bit further and seen
> that this is just a name string....
>
>> - "hypotheses" may be useful if you want to define your own set of
>> hypotheses/partitions
>> instead of relying on all possible partitions of the set of classes. The default
>> value "None" triggers the internal computation of all possible partitions. If 
>> you
>> do not have strong reasons to change this default behavior, I guess your should
>> stick with the default value.
>>
>> Best,
>>
>> Emanuele
>> Olivetti
>>
>> On 06/21/2013 08:47 AM, marco tettamanti wrote:
>>> Dear all,
>>> first of all, I take this first chance to thank the authors for making
>>> such great software as pymvpa available!
>>>
>>> I have some (beginner) questions regarding the BayesConfusionHypothesis
>>> algorithm for multiclass pattern discrimination.
>>>
>>> If I understand it correctly, the algorithm compares all possible
>>> partitions of the classes and then reports the most likely partitioning
>>> hypothesis to explain the confusion matrix (i.e. the one with the highest
>>> log likelihood among all possible hypotheses, as stored in the .samples
>>> attribute).
>>>
>>> Apart from being happy to see my hypothesis confirmed that all classes are
>>> discriminable from each other, is there any way to obtain or calculate
>>> some measure of how likely it is that the most likely hypothesis is truly
>>> strongly/weakly superior to some or all of the alternative hypotheses?
>>> For instance, Olivetti et al (PRNI 2012) state that a BF>1 is sufficient
>>> to support H1 over H0, and report Bayes factor and binomial tests in
>>> tables.
>>>
>>> I assume I should know the answer, so forgive me for my poor statistics.
>>>
>>> On a related matter: I see from the BayesConfusionHypothesis
>>> documentation that there should be parameters to define a hypothesis
>>> space (space=) or some specific hypotheses (hypotheses=).
>>> Could anybody please provide some examples of how to fill in these
>>> parameters?
>>>
>>> Thank you and all the best,
>>> Marco
>>>
>>



