[pymvpa] Pkg-ExpPsy-PyMVPA Digest, Vol 64, Issue 3

Fri Jul 5 10:30:37 UTC 2013

Dear Emanuele,
sorry for the late reply; it took me a while to get back to the data.

Thank you very much for the very helpful clarifications!
Shouldn't the BayesConfusionHypothesis documentation be updated to mention that
the log posterior probabilities are also calculated?

Could you please confirm that, given:

from mvpa2.suite import *  # fds is my dataset, loaded beforehand

clfsvm = LinearCSVMC()
Baycvsvm = CrossValidation(clfsvm, NFoldPartitioner(), errorfx=None,
    postproc=ChainNode((Confusion(labels=fds.UT), BayesConfusionHypothesis())))
Baycvsvm_res = Baycvsvm(fds)

the 2 columns of values in 'Baycvsvm_res.samples' indeed correspond,
respectively, to the log likelihoods (1st column) and the log posterior
probabilities (2nd column), as in:

print Baycvsvm_res.fa.stat
	['log(p(C|H))' 'log(p(H|C))']

I have a couple of further questions:
From your reply I understood that the sum of all p(H_i | CM) should be 1, but
this does not seem to be the case when I exponentiate the log values in the
2nd column. Or is it rather the sum of all p(H_i) that should be 1?

Also, if the above is correct, and regarding my data specifically: across the
203 possible partitions, the most likely hypothesis has a Bayes factor > 1
against every competing hypothesis, which I guess should constitute sufficient
evidence to support it.
However, the posterior probability of the most likely hypothesis seems quite
small (0.014). Is this to be expected?
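
For scale, here is a small sketch of the arithmetic, assuming the default
uniform prior p(H_i) = 1/(number of hypotheses) described in the quoted reply
below, and taking 203 and 0.014 from my results above. The variable names are
only illustrative.

```python
# Compare the reported posterior with a uniform prior over all hypotheses.
# Assumption: the default prior 1/(number of hypotheses) was used.
n_hypotheses = 203          # number of partitions reported above
reported_posterior = 0.014  # posterior of the most likely hypothesis

uniform_prior = 1.0 / n_hypotheses
ratio = reported_posterior / uniform_prior

print(round(uniform_prior, 4), round(ratio, 2))  # prints 0.0049 2.84
```

A ratio above 1 only says that the posterior exceeds the uniform prior; by
itself it is not a judgment on how strong the evidence is.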

Thanks a lot again and best wishes,
Marco

> Date: Tue, 25 Jun 2013 08:48:34 -0400
> From: Emanuele Olivetti<emanuele at relativita.com>
> To: pkg-exppsy-pymvpa at lists.alioth.debian.org
> Subject: Re: [pymvpa] BayesConfusionHypothesis
> Message-ID:<51C991A2.9060208 at relativita.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dear Marco,
>
> Sorry for the late reply, I'm traveling these days.
>
> BayesConfusionHypothesis, by default, computes the posterior probability of
> each hypothesis tested on the confusion matrix. As you correctly report,
> there is one hypothesis for each possible partition of the set of class
> labels. For example, for three class labels (A,B,C) there are 5 possible
> partitions: H_1=((A),(B),(C)), H_2=((A,B),(C)), H_3=((A,C),(B)),
> H_4=((A),(B,C)), H_5=((A,B,C)).
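
The count of hypotheses can be checked with a short sketch in plain Python 3.
It does not use PyMVPA, and set_partitions is a hypothetical helper written
for illustration, not the library's internal routine.

```python
def set_partitions(items):
    """Yield every partition of `items` as a list of blocks (lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition


three = list(set_partitions("ABC"))
six = list(set_partitions("ABCDEF"))
print(len(three), len(six))  # prints 5 203
```

Three labels give the 5 partitions listed above, and six labels give 203,
which matches the number of partitions I get for my data.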
>
> The posterior probability of each hypothesis is computed in the usual way
> (let CM be the confusion matrix):
>
>     p(H_i | CM) = p(CM | H_i) * p(H_i) / (sum_j p(CM | H_j) * p(H_j))
>
> where p(H_i) is the prior probability of each hypothesis and p(CM | H_i) is
> the (integrated) likelihood of each hypothesis.
> The default prior is p(H_i) = 1/(number of hypotheses), i.e. no hypothesis
> is preferred. You can specify a different one via the "prior_Hs" parameter
> of BayesConfusionHypothesis.
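
A minimal sketch of this formula in log space, in plain Python 3 rather than
PyMVPA code. The log likelihood values are made up for illustration,
log_posteriors is a hypothetical helper, and natural logarithms are assumed
throughout.

```python
import math


def log_posteriors(log_likelihoods, priors=None):
    """Return log p(H_i | CM) given log p(CM | H_i) and priors p(H_i)."""
    n = len(log_likelihoods)
    if priors is None:
        # Uniform prior: no hypothesis is preferred.
        priors = [1.0 / n] * n
    log_joint = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    # Log of the denominator sum_j p(CM | H_j) * p(H_j), computed with the
    # maximum subtracted first so that exp() does not underflow.
    peak = max(log_joint)
    log_evidence = peak + math.log(sum(math.exp(v - peak) for v in log_joint))
    return [v - log_evidence for v in log_joint]


example_log_lik = [-120.0, -123.5, -125.0, -130.0, -140.0]  # made-up values
log_post = log_posteriors(example_log_lik)
total = sum(math.exp(v) for v in log_post)
print(round(total, 12))  # prints 1.0
```

Under these assumptions the exponentiated log posteriors sum to 1, while the
exponentiated log likelihoods in general do not.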
>
> The measures returned by BayesConfusionHypothesis, i.e. the posterior
> probabilities of each hypothesis, quantify how likely each hypothesis is in
> the light of the data and of the priors that you assumed. So those values
> should be what you are looking for.
>
> If you set "postprob=False" in BayesConfusionHypothesis, you will get the
> likelihood of each model/hypothesis, i.e. p(CM | H_i), instead of the
> posterior probabilities. This is a different quantity. Note that, unlike
> p(H_i | CM), the p(CM | H_i) do not sum to one. The likelihoods (each an
> "integrated likelihood", or Bayesian likelihood) are useful to compare
> hypotheses in pairs. For example, if you want to know how much evidence
> there is in the data in favor of discriminating all classes, i.e.
> H_1=((A),(B),(C)), compared to not discriminating any class, i.e.
> H_5=((A,B,C)), then you can look at the ratio
> B_15 = p(CM|H_1) / p(CM|H_5), which is called the Bayes factor (similar to
> the likelihood ratio of the frequentist approach, but note that the
> likelihoods are not frequentist likelihoods). If that number is > 1, then
> the data support H_1 more than H_5. More detailed guidelines for
> interpreting the value of the Bayes factor can be found, for example, in
> Kass and Raftery (JASA 1995).
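
Since the values I get are log likelihoods, a Bayes factor between two
hypotheses is the exponential of their difference. A minimal sketch in plain
Python 3, assuming natural logarithms. The numbers are made up, and the names
bayes_factor, H_a and H_b are hypothetical.

```python
import math


def bayes_factor(log_lik_a, log_lik_b):
    """Return B_ab = p(CM | H_a) / p(CM | H_b) from two log likelihoods."""
    return math.exp(log_lik_a - log_lik_b)


# Made-up log likelihoods for two competing hypotheses.
log_lik = {"H_a": -120.0, "H_b": -123.5}
b_ab = bayes_factor(log_lik["H_a"], log_lik["H_b"])
print(round(b_ab, 1))  # prints 33.1
```

Working with the difference of the logs avoids exponentiating very negative
numbers, which would underflow to zero before the ratio is taken.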
>
> In the paper Olivetti et al (PRNI 2012) I presented the Bayes factor way, but I
> believe that looking at the posterior probabilities - which is the PyMVPA
> default I proposed - is simpler and clearer, especially in the case of many
> hypotheses/partitions.
> I am describing these things in an article in preparation.
>
> The parameters "space" and "hypotheses" of BayesConfusionHypothesis have
> the following meaning:
>
> - "space" is the name of the field, in the dataset output by
> BayesConfusionHypothesis, where the posterior probabilities are stored. You
> might want to change the default name "hypothesis". Or not :).

oops, sorry! I should have read a bit further in the documentation and seen
that this is just a name string....

> - "hypotheses" may be useful if you want to define your own set of
> hypotheses/partitions instead of relying on all possible partitions of the
> set of classes. The default value "None" triggers the internal computation of
> all possible partitions. If you do not have strong reasons to change this
> default behavior, I guess you should stick with the default value.
>
> Best,
>
> Emanuele
> Olivetti
>
> On 06/21/2013 08:47 AM, marco tettamanti wrote:
>> Dear all,
>> first of all I take my first chance to thank the authors for making such a
>> great software as pymvpa available!
>>
>> I have some (beginner) questions regarding the BayesConfusionHypothesis
>> algorithm for multiclass pattern discrimination.
>>
>> If I understand it correctly, what the algorithm does is to compare all
>> possible partitions of classes and it then reports the most likely
>> partitioning hypothesis to explain the confusion matrix (i.e. highest log
>> likelihood among those of all possible hypotheses, as stored in the .samples
>> attribute).
>>
>> Apart from being happy to see my hypothesis confirmed that all classes are
>> discriminable from each other, is there any way to obtain or calculate some
>> measure of how strongly or weakly the most likely hypothesis is superior to
>> some or all of the alternative hypotheses?
>> For instance, Olivetti et al (PRNI 2012) state that a BF>1 is sufficient to
>> support H1 over H0 and report Bayes Factor and binomial tests in tables.
>>
>> I assume I should know the answer, so forgive me for my poor statistics.
>>
>> On a related matter: I see from the BayesConfusionHypothesis documentation
>> that there should be parameters to define a hypothesis space (space=) or some
>> specific hypotheses (hypotheses=).
>> Could anybody please provide some examples on how to fill in these parameters?
>>
>> Thank you and all the best,
>> Marco
>>

-- 
Marco Tettamanti, Ph.D.
Nuclear Medicine Department & Division of Neuroscience
San Raffaele Scientific Institute
Via Olgettina 58
I-20132 Milano, Italy
Phone ++39-02-26434888
Fax ++39-02-26434892
Email: tettamanti.marco at hsr.it
Skype: mtettamanti