[pymvpa] Pkg-ExpPsy-PyMVPA Digest, Vol 64, Issue 3 (Emanuele Olivetti)
marco tettamanti
tettamanti.marco at hsr.it
Thu Jul 11 14:07:19 UTC 2013
Sorry Emanuele,
and sorry all for the many emails.
There was indeed something I was doing wrong: when inverting the log values, I
was treating them as base-10 logarithms instead of natural logarithms...
Now the posterior probabilities sum up to 1, as they should!
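
For the record, here is the minimal check I used (assuming numpy, and that the
second column of Baycvsvm_res.samples holds the natural-log posteriors, as
discussed below in the thread):

import numpy as np

log_post = Baycvsvm_res.samples[:, 1]   # natural logs of p(H_i | CM)

print np.sum(10 ** log_post)    # wrong: treats them as base-10 logs (my mistake)
print np.sum(np.exp(log_post))  # right: natural exponentiation, sums to 1
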
Sorry and thank you again for your help!
Best,
Marco
On 07/11/2013 02:30 PM, marco tettamanti wrote:
> Dear Emanuele,
> thank you again for all the very helpful clarifications!
>
> As for the posterior probabilities not summing up to 1, I am afraid I cannot
> help much, except to provide some further details. I may very well be doing
> something wrong.
>
> As it stands, the exponentiated log posterior probabilities are far from
> summing up to 1 (sum = 0.0622138449).
> If it can be of any use, I have uploaded a table here:
>
> https://dl.dropboxusercontent.com/u/58155846/abssom_36s_6cond_Baycvsvm_res.samples.xls
>
> The table reports the log likelihoods and post.probs for the 203 partitions of
> my dataset, as given by BayesConfusionHypothesis, and the calculated posterior
> probabilities.
>
> All the best,
> Marco
>
> ------------------------------------------------------------------------
> # Code as previously posted:
>
> from mvpa2.suite import *   # LinearCSVMC, CrossValidation, NFoldPartitioner,
>                             # ChainNode, Confusion, BayesConfusionHypothesis
>
> clfsvm = LinearCSVMC()
>
> Baycvsvm = CrossValidation(clfsvm, NFoldPartitioner(), errorfx=None,
>                            postproc=ChainNode((Confusion(labels=fds.UT),
>                                                BayesConfusionHypothesis())))
>
> Baycvsvm_res = Baycvsvm(fds)
> ------------------------------------------------------------------------
>
>> Date: Thu, 11 Jul 2013 11:06:03 +0200
>> From: Emanuele Olivetti <emanuele at relativita.com>
>> To: pkg-exppsy-pymvpa at lists.alioth.debian.org
>> Subject: Re: [pymvpa] Pkg-ExpPsy-PyMVPA Digest, Vol 64, Issue 3
>>
>> Hi Marco,
>>
>> Sorry, I missed your reply because of the change in the subject.
>>
>> The posterior probabilities have to sum up to 1. If that is not the case,
>> then we should dig into the details.
>>
>> There might be numerical instabilities in the computation of likelihoods and
>> posteriors because of the extremely low values involved, but I believe this
>> is unlikely because I did my best to avoid this problem. So my current best
>> guess is that the problem may derive from the use of cross-validation. In my
>> original formulation [0] I did not consider cross-validation (it is now
>> work in progress). Perhaps Michael (Hanke), who implemented the glue between
>> the algorithms [1] and PyMVPA, can comment on cross-validation.
>>
>> With respect to the snippet you sent, and according to
>> https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/clfs/transerror.py, I
>> confirm that you are getting the log likelihood and the log of the
>> posteriors, as you said.
>>
>> About the posterior probability of the most likely hypothesis being just
>> 0.014, consider that you have many hypotheses, i.e. 203 (so a problem with 6
>> classes). If you adopted the uniform prior probability over all hypotheses,
>> i.e. p(H_i) = 1/203 = 0.00493, then the posterior probability of the most
>> likely one has increased almost 3-fold: 0.014 / 0.00493 = 2.84. This means
>> that the data support that hypothesis more than you believed in your prior.
>> I don't have the full results of your analysis, but you should check whether
>> you have a similar increase with the other hypotheses or not.
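>>
>> Just as an illustration - a rough sketch, assuming numpy and that the second
>> column of Baycvsvm_res.samples holds the natural-log posteriors - you can
>> compute that increase for every hypothesis at once:
>>
>> import numpy as np
>>
>> post = np.exp(Baycvsvm_res.samples[:, 1])   # p(H_i | CM)
>> prior = 1.0 / len(post)                     # uniform prior, 1/203 in your case
>> ratio = post / prior                        # > 1: the data favor H_i over
>>                                             # your prior belief in it
>> print np.sort(ratio)[::-1][:5]              # the five largest increases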
>>
>> About the Bayes factor > 1, consider that different values of the Bayes
>> factor have different interpretations. In Kass and Raftery (JASA 1995), or at
>> http://en.wikipedia.org/wiki/Bayes_factor#Interpretation, you can find
>> commonly accepted guidelines for interpreting that value, so you should judge
>> your Bayes factors accordingly. If, for example, you have values not much
>> greater than 1, then the evidence supporting your most likely hypothesis is
>> weak.
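>>
>> As a sketch under the same assumptions (with the first column of
>> Baycvsvm_res.samples holding the natural-log likelihoods), the Bayes factors
>> of the most likely hypothesis against all the others could be computed as:
>>
>> import numpy as np
>>
>> loglik = Baycvsvm_res.samples[:, 0]    # log p(CM | H_i)
>> best = np.argmax(loglik)
>> # exponentiate the difference of the logs to avoid underflow of the
>> # extremely small likelihoods themselves
>> bf = np.exp(loglik[best] - loglik)
>> print np.sort(bf)[1:6]                 # the five closest competitors
>>                                        # (index 0 is the trivial self-comparison, 1)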
>>
>> Best,
>>
>> Emanuele
>>
>> [0]: http://dx.doi.org/10.1109/prni.2012.14
>> [1]: https://github.com/emanuele/inference_with_classifiers
>>
>> PS: yes, the docstring may be improved. Consider submitting a pull request ;)
>>
>> On 07/05/2013 12:30 PM, marco tettamanti wrote:
>>> Dear Emanuele, sorry for the late reply; it took me a while until I could
>>> get back to the data.
>>>
>>> Thank you very much for the very helpful clarifications! Shouldn't the
>>> BayesConfusionHypothesis documentation be updated to mention that the
>>> log posterior probabilities are also calculated?
>>>
>>> Can you just please confirm that given:
>>>
>>> clfsvm = LinearCSVMC()
>>> Baycvsvm = CrossValidation(clfsvm, NFoldPartitioner(), errorfx=None,
>>>                            postproc=ChainNode((Confusion(labels=fds.UT),
>>>                                                BayesConfusionHypothesis())))
>>> Baycvsvm_res = Baycvsvm(fds)
>>>
>>> the 2 columns of values in 'Baycvsvm_res.samples' indeed correspond,
>>> respectively, to the log likelihoods (1st column) and to the log posterior
>>> probabilities (2nd column), as in:
>>>
>>> print Baycvsvm_res.fa.stat
>>> ['log(p(C|H))' 'log(p(H|C))']
>>>
>>>
>>> I have a couple of further questions: I thought from your reply that the
>>> sum of all p(H_i | CM) should give 1, but this does not seem to be the case
>>> for the exponentiated values of the 2nd column. Or is it rather the sum
>>> of all p(H_i) that should give 1?
>>>
>>> Also, if the above is correct, and regarding my data specifically: over the
>>> 203 possible partitions, the most likely hypothesis has a Bayes factor > 1
>>> over all competing hypotheses, which I guess should constitute sufficient
>>> evidence to support it. However, the posterior probability of the most
>>> likely hypothesis seems quite small (0.014). Is this to be expected?
>>>
>>> Thank you a lot again and best wishes, Marco
>>>
>>>
>>>> Date: Tue, 25 Jun 2013 08:48:34 -0400
>>>> From: Emanuele Olivetti <emanuele at relativita.com>
>>>> To: pkg-exppsy-pymvpa at lists.alioth.debian.org
>>>> Subject: Re: [pymvpa] BayesConfusionHypothesis
>>>>
>>>> Dear Marco,
>>>>
>>>> Sorry for the late reply; I'm traveling these days.
>>>>
>>>> BayesConfusionHypothesis, by default, computes the posterior
>>>> probabilities of each hypothesis tested on the confusion matrix. As you
>>>> correctly report, there is one hypothesis for each possible partition of
>>>> the set of class labels. For example, for three class labels (A,B,C),
>>>> there are 5 possible partitions: H_1=((A),(B),(C)), H_2=((A,B),(C)),
>>>> H_3=((A,C),(B)), H_4=((A),(B,C)), H_5=((A,B,C)).
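>>>>
>>>> Just to illustrate where these hypotheses come from, here is a small sketch
>>>> (plain Python, independent of PyMVPA) that enumerates all partitions of a
>>>> set of labels; the counts are the Bell numbers, e.g. 5 for three classes and
>>>> 203 for six:
>>>>
>>>> def partitions(labels):
>>>>     # yield every partition of 'labels' as a list of blocks
>>>>     if not labels:
>>>>         yield []
>>>>         return
>>>>     first, rest = labels[0], labels[1:]
>>>>     for smaller in partitions(rest):
>>>>         # put 'first' into each existing block in turn...
>>>>         for i, block in enumerate(smaller):
>>>>             yield smaller[:i] + [[first] + block] + smaller[i + 1:]
>>>>         # ...or give 'first' a block of its own
>>>>         yield smaller + [[first]]
>>>>
>>>> print list(partitions(['A', 'B', 'C']))       # the same 5 partitions as above
>>>> print len(list(partitions(['A', 'B', 'C', 'D', 'E', 'F'])))  # 203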
>>>>
>>>> The posterior probability of each hypothesis is computed in the usual way
>>>> (let CM be the confusion matrix):
>>>>
>>>> p(H_i | CM) = p(CM | H_i) * p(H_i) / (sum_j p(CM | H_j) * p(H_j))
>>>>
>>>> where p(H_i) is the prior probability of each hypothesis and p(CM | H_i)
>>>> is the (integrated) likelihood of each hypothesis. The default prior is
>>>> p(H_i) = 1/(number of hypotheses), i.e. no hypothesis is
>>>> preferred. You can specify a different one through the "prior_Hs" parameter
>>>> of BayesConfusionHypothesis.
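>>>>
>>>> In code the normalization is just the following (a sketch with made-up toy
>>>> likelihood values, only to show the arithmetic):
>>>>
>>>> import numpy as np
>>>>
>>>> lik = np.array([2e-8, 5e-9, 1e-9, 1e-9, 3e-10])  # toy p(CM | H_i) values
>>>> prior = np.ones(len(lik)) / len(lik)             # default: uniform p(H_i)
>>>> post = lik * prior / np.sum(lik * prior)         # p(H_i | CM)
>>>> print post, post.sum()                           # the posteriors sum to 1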
>>>>
>>>> The measures returned by BayesConfusionHypothesis, i.e. the
>>>> posterior probabilities of each hypothesis, quantify how likely each
>>>> hypothesis is in the light of the data and of the priors that you assumed.
>>>> So those values should be what you are looking for.
>>>>
>>>> If you set "postprob=False" in BayesConfusionHypothesis, you will get
>>>> the likelihoods of each model/hypothesis, i.e. p(CM | H_i), instead of
>>>> posterior probabilities. This is a different quantity. Note that,
>>>> differently from p(H_i | CM), if you sum all the p(CM | H_i) you will not
>>>> get one. The likelihoods (each being an "integrated", or Bayesian,
>>>> likelihood) are useful to compare hypotheses in pairs. For example, if you
>>>> want to know how much evidence is in the data in favor of discriminating all
>>>> classes, i.e. H_1=((A),(B),(C)), compared to not discriminating any class,
>>>> i.e. H_5=((A,B,C)), then you can look at the ratio
>>>> B_15 = p(CM|H_1) / p(CM|H_5), which is called the Bayes factor (similar to
>>>> the likelihood ratio of the frequentist approach, but note that these
>>>> likelihoods are not frequentist likelihoods). If that number is > 1, then
>>>> the evidence of the data supports H_1 more than H_5. More detailed
>>>> guidelines to interpret the value of the Bayes factor can be found for
>>>> example in Kass and Raftery (JASA 1995).
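>>>>
>>>> In code that comparison is just the following (a sketch with made-up toy
>>>> numbers for the two natural-log likelihoods):
>>>>
>>>> import numpy as np
>>>>
>>>> loglik_H1 = -120.3   # toy value of log p(CM | H_1), all classes separate
>>>> loglik_H5 = -135.7   # toy value of log p(CM | H_5), all classes together
>>>>
>>>> # B_15 = p(CM | H_1) / p(CM | H_5); work with the difference of the logs
>>>> # to avoid underflow of the tiny likelihoods themselves
>>>> B_15 = np.exp(loglik_H1 - loglik_H5)
>>>> print B_15   # > 1: the data favor H_1 (full discrimination) over H_5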
>>>>
>>>> In the paper Olivetti et al (PRNI 2012) I presented the Bayes factor way,
>>>> but I believe that looking at the posterior probabilities - which is the
>>>> PyMVPA default I proposed - is simpler and clearer, especially in the
>>>> case of many hypotheses/partitions. I am describing these things in an
>>>> article in preparation.
>>>>
>>>> The parameters "space" and "hypotheses" of BayesConfusionHypothesis have
>>>> the following meaning:
>>>>
>>>> - "space" stores the string of the dataset's field where the posterior
>>>> probabilities are stored. That dataset is the output of
>>>> BayesConfusionHypothesis. You might want to change the default name
>>>> "hypothesis". Or not :).
>>>
>>> Oops, sorry! I should have read a bit further in the documentation and seen
>>> that this is just a name string...
>>>
>>>> - "hypotheses" may be useful if you want to define your own set of
>>>> hypotheses/partitions instead of relying on all possible partitions of
>>>> the set of classes. The default value "None" triggers the internal
>>>> computation of all possible partitions. If you do not have strong reasons
>>>> to change this default behavior, I guess you should stick with the
>>>> default value.
>>>>
>>>> Best,
>>>>
>>>> Emanuele Olivetti
>>>>
>>>> On 06/21/2013 08:47 AM, marco tettamanti wrote:
>>>>> Dear all, first of all let me take this chance to thank the authors for
>>>>> making such great software as PyMVPA available!
>>>>>
>>>>> I have some (beginner) questions regarding the
>>>>> BayesConfusionHypothesis algorithm for multiclass pattern
>>>>> discrimination.
>>>>>
>>>>> If I understand it correctly, the algorithm compares
>>>>> all possible partitions of the classes and then reports the most likely
>>>>> partitioning hypothesis to explain the confusion matrix (i.e. the highest
>>>>> log likelihood among those of all possible hypotheses, as stored in the
>>>>> .samples attribute).
>>>>>
>>>>> Apart from being happy to see my hypothesis of all classes being
>>>>> discriminable from each other confirmed, is there any way to obtain or
>>>>> calculate some measure of how strongly or weakly the most likely
>>>>> hypothesis is superior to some or all of the
>>>>> alternative hypotheses? For instance, Olivetti et al (PRNI 2012) state
>>>>> that a BF > 1 is sufficient to support H1 over H0, and report Bayes factors
>>>>> and binomial tests in tables.
>>>>>
>>>>> I assume I should know the answer, so forgive me for my poor
>>>>> statistics.
>>>>>
>>>>> On a related matter: I see from the BayesConfusionHypothesis
>>>>> documentation that there should be parameters to define a hypothesis
>>>>> space (space=) or some specific hypotheses (hypotheses=). Could anybody
>>>>> please provide some examples of how to fill in these parameters?
>>>>>
>>>>> Thank you and all the best, Marco
>>>>>
>>>
>
>
--
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: tettamanti.marco at hsr.it
Skype: mtettamanti