[pymvpa] permutation test for unbalanced data

Anna Manelis anna.manelis at gmail.com
Tue Oct 17 15:10:01 UTC 2017


Dear Experts,

Could anyone respond to my earlier request in this thread about Monte Carlo
testing when the dataset is unbalanced and a partitioning scheme like

par = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                 Sifter([('partitions', 2),
                         ('targets', dict(uvalues=['cond1', 'cond2'],
                                          balanced=True))])
                 ], space='partitions')

is used?
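The fold selection performed by this partitioning can be reproduced without PyMVPA to see which folds survive. The following is a minimal stdlib Python sketch. It assumes one sample (cope image) per subject, one chunk per subject, and the 16 'better' / 12 'worse' groups described in the original message quoted below. `balanced_folds` is a hypothetical helper that mimics NFoldPartitioner(cvtype=2) followed by a balanced Sifter; it is not PyMVPA code.

```python
from itertools import combinations

# Hypothetical layout (assumption): one sample per subject and one chunk per
# subject, with 16 'better' and 12 'worse' subjects.
targets = ['better'] * 16 + ['worse'] * 12
chunks = list(range(len(targets)))


def balanced_folds(targets, chunks, labels=('better', 'worse')):
    """Hypothetical helper mimicking NFoldPartitioner(cvtype=2) + balanced Sifter.

    Every pair of chunks is a candidate test set; a pair is kept only if
    each label in `labels` occurs in it, and all occur equally often.
    """
    kept = []
    for test_chunks in combinations(sorted(set(chunks)), 2):
        test_targets = [t for t, c in zip(targets, chunks) if c in test_chunks]
        counts = [test_targets.count(label) for label in labels]
        if min(counts) > 0 and len(set(counts)) == 1:
            kept.append(test_chunks)
    return kept


all_pairs = list(combinations(sorted(set(chunks)), 2))
folds = balanced_folds(targets, chunks)
print(len(all_pairs), len(folds))  # 378 192
```

Under these assumptions, only the 16 x 12 = 192 pairs that contain one subject from each group are kept, out of 378 candidate pairs. Every kept fold trains on 15 'better' and 11 'worse' subjects, so the training sets remain unbalanced even though each test pair is balanced.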

I would really appreciate your help.

Thank you,
Anna.

On Wed, Oct 11, 2017 at 12:16 PM, Anna Manelis <anna.manelis at gmail.com> wrote:

>
> Dear Experts,
>
> I have an unbalanced dataset (group 1: 16 subjects, 'better'; group 2: 12
> subjects, 'worse'). I would like to perform Monte Carlo testing of an SMLR
> between-subject classification analysis (performed on cope images). I used
> the permutation_test.py script as an example and tried to adapt it to my
> unbalanced dataset.
>
> Any feedback on whether the script below makes sense would be greatly
> appreciated.
>
>
> import numpy as np
> from mvpa2.suite import *  # PyMVPA classes used below
>
> lm = 0.2  # The value was determined through a nested cross-validation procedure
>
> clf = SMLR(lm=lm, fit_all_weights=False, descr='SMLR(lm=%g)' % lm)
>
> ### how often do we want to shuffle the data
> repeater = Repeater(count=200)
>
> ### permute the training part of a dataset exactly ONCE
> permutator = AttributePermutator('targets', limit={'partitions': 1},
>                                  count=1)
>
> ### Keep only folds whose test set (partition 2) contains the same number
> ### of samples from each group ("better" and "worse")
> par = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
>                  Sifter([('partitions', 2),
>                          ('targets', dict(uvalues=['better', 'worse'],
>                                           balanced=True))])
>                  ], space='partitions')
>
> ### I believe this will apply the permutator to each fold created by par.
> par_perm = ChainNode([par, permutator], space=par.get_space())
>
> # CV with null-distribution estimation that permutes the training data for
> # each fold independently
> null_cv = CrossValidation(
>             clf,
>             par_perm,
>             errorfx=mean_mismatch_error)
>
> # Monte Carlo distribution estimator
> distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
>                        enable_ca=['dist_samples'])
>
> # actual CV with H0 distribution estimation
> cv = CrossValidation(clf, par, errorfx=mean_mismatch_error,
>                      null_dist=distr_est, enable_ca=['stats'])
>
>
> err = cv(ds_mni_bw)
>
> p = err.ca.null_prob
> print(np.asarray(p).item())  # extract the scalar p-value
>
>
> Thank you very much,
> Anna.
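The Monte Carlo test in the script above permutes only the training labels of each fold independently, reruns the cross-validation, and compares the observed error with the left tail of the resulting null distribution. That logic can be sketched in plain Python. This is an illustration under stated assumptions, not PyMVPA's implementation. The data are random toy values. A nearest-class-mean rule stands in for SMLR. Each cross-validation run is summarized by its mean error across folds. The (k + 1)/(n + 1) p-value is one common estimator and may differ from what MCNullDist computes internally.

```python
import random
from itertools import combinations

rng = random.Random(0)  # fixed seed so the toy example is reproducible

# Hypothetical toy data (assumption): one feature value per subject as a
# stand-in for a cope image; 16 'better' and 12 'worse' subjects.
targets = ['better'] * 16 + ['worse'] * 12
features = [rng.gauss(1.0 if t == 'better' else 0.0, 1.0) for t in targets]

# Balanced folds: each test set holds one 'better' and one 'worse' subject.
folds = [(a, b) for a, b in combinations(range(len(targets)), 2)
         if targets[a] != targets[b]]


def fold_error(train_idx, train_labels, test_idx):
    """Fit a nearest-class-mean rule (stand-in for SMLR) on the training part
    and return the mismatch rate on the test subjects' true labels."""
    centers = {}
    for label in set(train_labels):
        vals = [features[i] for i, lbl in zip(train_idx, train_labels)
                if lbl == label]
        centers[label] = sum(vals) / len(vals)
    wrong = 0
    for i in test_idx:
        pred = min(centers, key=lambda lab: abs(features[i] - centers[lab]))
        wrong += pred != targets[i]
    return wrong / len(test_idx)


def cv_error(permute_training=False):
    """Mean error across folds; optionally shuffle each fold's training labels."""
    errors = []
    for test_idx in folds:
        train_idx = [i for i in range(len(targets)) if i not in test_idx]
        train_labels = [targets[i] for i in train_idx]
        if permute_training:
            # Shuffle the training labels of this fold only; test labels stay intact.
            rng.shuffle(train_labels)
        errors.append(fold_error(train_idx, train_labels, test_idx))
    return sum(errors) / len(errors)


observed = cv_error()
null = [cv_error(permute_training=True) for _ in range(200)]

# Left tail: a null error as low as (or lower than) the observed one counts
# against H0; adding 1 to both counts avoids a p-value of exactly 0.
k = sum(e <= observed for e in null)
p = (k + 1) / (len(null) + 1)
print(f"observed error = {observed:.3f}, p = {p:.4f}")
```

The shuffling happens within each training set. As a result, the 15/11 class proportions of the training data are preserved in every permutation, and the test labels are never touched. This mirrors what AttributePermutator with limit={'partitions': 1} is meant to do per fold in the script.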


More information about the Pkg-ExpPsy-PyMVPA mailing list