[pymvpa] permutation test for unbalanced data
Anna Manelis
anna.manelis at gmail.com
Wed Oct 11 16:16:08 UTC 2017
Dear Experts,
I have an unbalanced dataset (group1=16 subjects ('better'), group2=12
subjects ('worse')). I would like to perform Monte-Carlo testing of the SMLR
between-subject classification analysis (performed on cope images). I used
the permutation_test.py script as an example and tried to adjust it for my
unbalanced dataset.
Any feedback on whether the script below makes sense is greatly appreciated.
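As a quick sanity check on the fold structure that the script below sets up, here is a minimal pure-Python sketch (no PyMVPA calls). It assumes that each subject forms its own chunk, which is not stated above, and the names labels, all_pairs, balanced_pairs and training_counts are made up for illustration. It counts the leave-two-subjects-out test sets and keeps only those holding one subject from each group, which is my understanding of what the balanced Sifter on partitions == 2 selects.

```python
from itertools import combinations

# Assumption: one chunk per subject, 16 'better' and 12 'worse' subjects.
labels = ['better'] * 16 + ['worse'] * 12
subjects = range(len(labels))

# Every possible pair of held-out subjects (leave-two-out).
all_pairs = list(combinations(subjects, 2))

# Keep only test pairs that hold one subject from each group.
balanced_pairs = [(a, b) for a, b in all_pairs if labels[a] != labels[b]]


def training_counts(test_pair):
    """Count training subjects per group once the test pair is held out."""
    train = [labels[i] for i in subjects if i not in test_pair]
    return train.count('better'), train.count('worse')


print(len(all_pairs), len(balanced_pairs), training_counts(balanced_pairs[0]))
```

Under these assumptions 192 of the 378 possible test pairs are balanced, and every training set still holds 15 'better' and 11 'worse' subjects. In other words, this kind of sifting balances the testing portion only; the training portion stays unbalanced.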
from mvpa2.suite import *  # assumed, as in the permutation_test.py example
import numpy as np
lm = 0.2  # The value was determined through the nested cross-validation procedure
clf = SMLR(lm=lm, fit_all_weights=False, descr='SMLR(lm=%g)' % lm)
### how often do we want to shuffle the data
repeater = Repeater(count=200)
### permute the training part of a dataset exactly ONCE
permutator = AttributePermutator('targets', limit={'partitions': 1},
                                 count=1)
### Make sure that each group ("better" and "worse") has the same number of
### samples
par = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                 Sifter([('partitions', 2),
                         ('targets', dict(uvalues=['better', 'worse'],
                                          balanced=True))])
                 ], space='partitions')
### I believe this will apply permutator on each fold created by par.
par_perm = ChainNode([par, permutator], space=par.get_space())
# CV with null-distribution estimation that permutes the training data for
# each fold independently
null_cv = CrossValidation(
    clf,
    par_perm,
    errorfx=mean_mismatch_error)
# Monte Carlo distribution estimator
distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
                       enable_ca=['dist_samples'])
# actual CV with H0 distribution estimation
cv = CrossValidation(clf, par, errorfx=mean_mismatch_error,
                     null_dist=distr_est, enable_ca=['stats'])
err = cv(ds_mni_bw)
p = err.ca.null_prob
print(np.asscalar(p))  # p-value of the observed error under the permutation null
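For reference, the left-tail p-value that such a Monte-Carlo test reports can be sketched in plain Python. This is a generic illustration, not PyMVPA's own implementation: the function left_tail_p, its add_one option and the toy numbers are hypothetical, and PyMVPA may handle ties or corrections differently. The tail is 'left' because the measure is an error, so smaller values are the extreme ones.

```python
def left_tail_p(observed_error, null_errors, add_one=True):
    """Fraction of null errors that are at most the observed error.

    With add_one=True the observed value is counted as one more draw
    from the null, so the estimate can never be exactly zero.
    """
    hits = sum(1 for e in null_errors if e <= observed_error)
    n = len(null_errors)
    if add_one:
        return float(hits + 1) / (n + 1)
    return float(hits) / n


# Toy null distribution of CV errors from 10 hypothetical permutations.
toy_null = [0.50, 0.45, 0.55, 0.30, 0.60, 0.50, 0.40, 0.52, 0.48, 0.58]
p_corrected = left_tail_p(0.35, toy_null)
p_raw = left_tail_p(0.35, toy_null, add_one=False)
print(p_corrected, p_raw)
```

With 200 permutations, as set by the Repeater above, the smallest p-value this add-one variant can return is 1/201, so the number of permutations bounds the resolution of the test.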
Thank you very much,
Anna.