[pymvpa] permutation test for unbalanced data

Anna Manelis anna.manelis at gmail.com
Wed Oct 18 03:21:09 UTC 2017


Thanks for your feedback!

1. The nested cross-validation is done across subjects. For the set of 28
subjects (16 in group 1 + 12 in group 2), 27 subjects are selected and SMLR
accuracies are computed for several lm values (e.g., 0.2, 0.5, 1.0) using a
leave-one-pair-of-subjects-from-different-groups cross-validation (train
on 25 subjects, test on 2). This procedure is repeated 27 times (a
different subject is left out in each round of the nested CV). The mean
accuracy is computed across all splits within each round and then across
all 27 rounds. The lm corresponding to the highest mean accuracy is
selected as the best parameter and used in the full-sample (n=28) SMLR.
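
In case it clarifies the setup, here is a simplified sketch of the lm
search (illustrative only -- the variable names and details are not from my
actual script):

    import numpy as np
    from mvpa2.suite import (SMLR, CrossValidation, ChainNode,
                             NFoldPartitioner, Sifter, mean_match_accuracy)

    # 'ds' is the full 28-subject dataset, one chunk per subject
    lms = [0.2, 0.5, 1.0]
    mean_acc = {lm: [] for lm in lms}

    for held_out in ds.sa['chunks'].unique:       # one outer round per held-out subject
        ds_inner = ds[ds.sa.chunks != held_out]   # the remaining subjects
        # leave-one-pair-of-subjects-from-different-groups CV, balanced test sets
        par = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                         Sifter([('partitions', 2),
                                 ('targets', dict(uvalues=['better', 'worse'],
                                                  balanced=True))])],
                        space='partitions')
        for lm in lms:
            cv = CrossValidation(SMLR(lm=lm, fit_all_weights=False), par,
                                 errorfx=mean_match_accuracy)
            res = cv(ds_inner)
            mean_acc[lm].append(res.samples.mean())  # mean accuracy across splits

    best_lm = max(lms, key=lambda lm: np.mean(mean_acc[lm]))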

2. It would be great to balance the training sets as well (to have an equal
number of subjects per group), but I was not sure whether a single set of
subjects randomly selected from the larger group (equal in size to the
smaller group) would do, or whether the selection should be repeated
multiple times (to make sure that all subjects get included across training
sets). I am also not sure how to incorporate the generation of such random
balanced sets into the scripts; a rough guess is sketched below.
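
Would something along these lines make sense? This is only a guess at using
Balancer to subsample the training partition on every split -- I am not
certain the parameters are right:

    from mvpa2.suite import ChainNode, NFoldPartitioner, Sifter, Balancer

    # Guess (untested): after partitioning, draw several random
    # target-balanced selections, so that across repetitions every subject
    # from the larger group gets a chance to appear in a training set.
    par_bal = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
                         Sifter([('partitions', 2),
                                 ('targets', dict(uvalues=['better', 'worse'],
                                                  balanced=True))]),
                         Balancer(attr='targets', amount='equal', count=10,
                                  limit='partitions', apply_selection=True)],
                        space='partitions')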

Processing time may become an issue. It already takes slightly more than 2
minutes to run for 28 subjects (given the 192 splits produced by the
ChainNode). With more training sets (and potentially more subjects, or even
more unbalanced datasets) the processing time could become prohibitively
long...

The script with Monte Carlo testing (1000 repetitions) described in my
first e-mail never finished running (I stopped it after a day or so). There
should be a more efficient way of creating a null distribution for
unbalanced datasets...

3. Below is the output of print cv.ca.stats.matrix (note there are 192
splits):

[[141  68]
 [ 51 124]]

ACC% = 69.01%
TPR "better" = 0.73
TPR "worse" = 0.65

The confidence interval (computed using BinomialProportionCI(width=.95,
meth='jeffreys')) is 64.3% < ACC% < 73.5%.
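
Just to double-check my own arithmetic, these numbers follow directly from
the matrix above (assuming rows are predictions and columns are targets):

    import numpy as np

    cm = np.array([[141,  68],      # rows: predictions
                   [ 51, 124]])     # columns: targets ("better", "worse")
    acc = np.trace(cm) / float(cm.sum())               # 0.6901
    tprs = np.diag(cm) / cm.sum(axis=0).astype(float)  # [0.734, 0.646]
    print acc, tprs, tprs.mean()

Since the Sifter keeps every test set balanced (192 "better" and 192
"worse" test samples in total), the mean of the two TPRs comes out equal to
the overall ACC here.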


4. So the new error function in
https://github.com/PyMVPA/PyMVPA/pull/541 computes
the mean of the per-class TPRs?
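
For my own understanding, I would have expected it to be equivalent to the
following hand-rolled version (just my guess, not necessarily how the pull
request implements it):

    import numpy as np

    def mean_tpr_guess(predicted, targets):
        """Guess: mean of the per-class recalls (TPRs)."""
        predicted, targets = np.asarray(predicted), np.asarray(targets)
        return np.mean([np.mean(predicted[targets == t] == t)
                        for t in np.unique(targets)])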

Thank you,
Anna.




On Tue, Oct 17, 2017 at 4:59 PM, Yaroslav Halchenko <debian at onerussian.com>
wrote:

>
> On Wed, 11 Oct 2017, Anna Manelis wrote:
>
> > Dear Experts,
>
> > I have an unbalanced dataset (group1=16 subjects ('better'), group2=12
> > subjects ('worse').
>
> is there any grouping of subjects, or does each have a dedicated "chunk"?
>
> > I would like to perform Monte-Carlo testing of the SMLR
> > between-subject classification analysis (performed on cope images).  I
> used
> > the permutation_test.py script as an example and tried to adjust it for
> my
> > unbalanced dataset.
>
> > Any feedback on whether the script below makes sense is greatly
> appreciated.
>
> well -- all the imbalanced cases are "tricky", to say the least ;)
>
> I think that, for peace of mind, you should also try your pipeline on
> completely random data (ds.samples = np.random.normal(size=ds.shape)) to
> make sure that "garbage in, garbage out" holds and nothing comes out
> "significant" (well, it should, but only in a "p" fraction of the cases ;))
>
> > lm = 0.2 # The value was determined through the nested cross-validation
> > procedure
>
> if it was truly nested, you would most likely end up with a different lm
> for each split.  Or, in other words: how was it done, and among which lm
> values did you search?  It feels like some double dipping might be going on
>
> > clf = SMLR(lm=lm, fit_all_weights=False, descr='SMLR(lm=%g)' % lm)
>
> > ### how often do we want to shuffle the data
> > repeater = Repeater(count=200)
>
> > ### permute the training part of a dataset exactly ONCE
> > permutator = AttributePermutator('targets', limit={'partitions': 1},
> >                                  count=1)
>
> > ### Make sure that each group ("better" and "worse") has the same
> > ### number of samples
> > par = ChainNode([NFoldPartitioner(cvtype=2, attr='chunks'),
> >                  Sifter([('partitions', 2),
> >                          ('targets', dict(uvalues=['better', 'worse'],
> >                                           balanced=True))])])
>
> so, here you are saying to select only the partitionings where you have a
> balanced # of samples in the testing set... so the training set will remain
> imbalanced.  That would lead SMLR to tend to classify samples as "better".
> What does your cv.ca.stats look like -- is my fear confirmed?
>
> here is an example:
>
> In [51]: import mvpa2.suite as mv
>
> In [52]: cv = mv.CrossValidation(mv.SMLR(), mv.NFoldPartitioner(),
> errorfx=mv.mean_mismatch_error, enable_ca=['stats'])
>
> In [53]: ds = mv.normal_feature_dataset(perlabel=10, nlabels=2,
> nfeatures=4, nonbogus_features=[0,1],snr=0)
>
> In [54]: res = cv(ds); print cv.ca.stats  # on random data -- near chance result
> ----------.
> predictions\targets  L0   L1
>             `------  ---  ---  P'  N'  FP  FN  PPV  NPV  TPR  SPC  FDR  MCC   F1  AUC
>          L0           5    6   11   9   6   5  0.45 0.44  0.5  0.4 0.55 -0.1 0.48  0.3
>          L1           5    4    9  11   5   6  0.44 0.45  0.4  0.5 0.56 -0.1 0.42  0.3
> Per target:          ---  ---
>          P           10   10
>          N           10   10
>          TP           5    4
>          TN           4    5
> Summary \ Means:     ---  ---  10  10 5.5 5.5 0.45 0.45 0.45 0.45 0.55 -0.1 0.45  0.3
>        CHI^2         0.4 p=0.94
>         ACC         0.45
>         ACC%         45
>      # of sets        5   ACC(i) = 0.6-0.075*i p=0.32 r=-0.57 r^2=0.32
>
>
> In [55]: ds_ = ds[[0, 2, 4, 6, 8, 10] + range(11, 20)]
>
> In [56]: print ds_.summary()
> Dataset: 15x4@float64, <sa: chunks,targets>, <fa: nonbogus_targets>, <a: bogus_features,nonbogus_features>
> stats: mean=0.00169964 std=0.315902 var=0.0997943 min=-0.600959 max=0.737558
>
> Counts of targets in each chunk:
>   chunks\targets  L0  L1
>                  --- ---
>         0         1   2
>         1         1   2
>         2         1   2
>         3         1   2
>         4         1   2
>
> Summary for targets across chunks
>   targets mean std min max #chunks
>     L0      1   0   1   1     5
>     L1      2   0   2   2     5
>
> Summary for chunks across targets
>   chunks mean std min max #targets
>     0     1.5 0.5  1   2      2
>     1     1.5 0.5  1   2      2
>     2     1.5 0.5  1   2      2
>     3     1.5 0.5  1   2      2
>     4     1.5 0.5  1   2      2
> Sequence statistics for 15 entries from set ['L0', 'L1']
> Counter-balance table for orders up to 2:
> Targets/Order O1    |  O2    |
>      L0:       4 1  |   3 2  |
>      L1:       0 9  |   0 8  |
> Correlations: min=-0.5 max=0.7 mean=-0.071 sum(abs)=5.8
>
> In [57]: res = cv(ds_); print cv.ca.stats  # on random and disbalanced data
> ----------.
> predictions\targets   L0   L1
>             `------   ---  ---  P'  N'  FP  FN  PPV  NPV  TPR  SPC  FDR   MCC   F1  AUC
>          L0            1    3    4  11   3   4  0.25 0.64  0.2  0.7 0.75 -0.11 0.22  0.6
>          L1            4    7   11   4   4   3  0.64 0.25  0.7  0.2 0.36 -0.11 0.67  0.6
> Per target:           ---  ---
>          P             5   10
>          N            10    5
>          TP            1    7
>          TN            7    1
> Summary \ Means:      ---  --- 7.5 7.5 3.5 3.5 0.44 0.44 0.45 0.45 0.56 -0.11 0.44  0.6
>        CHI^2         3.4  p=0.33
>         ACC          0.53
>         ACC%        53.33
>      # of sets        5    ACC(i) = 0.67-0.067*i p=0.65 r=-0.28 r^2=0.08
>
>
> # although the overall accuracy might be low, you can see that the
> # classifier tends to misclassify into L1
>
>
> and overall mean accuracy is a "suboptimal" measure in this case.
>
> What we could have estimated instead is the mean of the per-class
> accuracies -- that measure behaves better in the case of imbalanced datasets
>
> So in this case, with mean_mismatch_error (mean_match_accuracy is what gets
> reported in ca.stats), we are getting
>
> In [63]: (1+7)/(5.+10)
> Out[63]: 0.5333333333333333
>
> whereas what we should use for such cases (and maybe in general, while also
> taking care of chunks, since splits could be imbalanced differently; in this
> example I just assume a single "measurement") is:
>
> In [62]: np.mean([1/5., 7/10.])
> Out[62]: 0.44999999999999996
>
> although the numbers here are not really demonstrative, let's consider a
> more extreme case, where all the samples get classified into the "majority"
> class:
>
> In [70]: cm = mv.ConfusionMatrix()
>
> In [71]: cm.add(['L1']*5 + ['L2']*10, ['L2'] * 15)
>
> In [72]: print cm
> ----------.
> predictions\targets   L1   L2
>             `------   ---  ---  P'  N'  FP  FN  PPV  NPV  TPR  SPC  FDR  MCC   F1
>          L1            0    0    0  15   0   5   nan 0.67   0    1   nan   0    0
>          L2            5   10   15   0   5   0  0.67  nan   1    0  0.33   0   0.8
> Per target:           ---  ---
>          P             5   10
>          N            10    5
>          TP            0   10
>          TN           10    0
> Summary \ Means:      ---  --- 7.5 7.5 2.5 2.5  nan  nan  0.5  0.5  nan   0   0.4
>        CHI^2          15  p=0.0018
>         ACC          0.67
>         ACC%        66.67
>      # of sets        1
>
>
> woohoo -- 66.67% correct!
> but if we compute the mean of the per-class accuracies:
>
> In [73]: np.mean([0/5., 10/10.])
> Out[73]: 0.5
>
> 1.  Unfortunately "lazy" us didn't implement such a dedicated error
>     function yet, BUT  you have that  value as a part of the
>
>     In [84]: print cm.stats['mean(TPR)']
>     0.5
>
>     which is the mean of TPRs per each label, which in turn is what
>     we are interested here
>
> 2.  I am not convinced that your approach above (though see the comment
> below about postproc) is unsound.  Since you ensure that the test set always
> has one sample of each class, the training set is always consistently
> imbalanced, and your null distribution should account for that "tend to
> classify as the majority class" bias.  So I expect the p value for the true
> classification performance to be "legit".
>
> what does your chance distribution look like? (distr_est.ca.dist_samples)
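>
> e.g. something like (a sketch, converting the stored errors back into
> accuracies):
>
>     null_errs = np.asarray(distr_est.ca.dist_samples).ravel()
>     print np.percentile(1 - null_errs, [5, 50, 95])  # where does chance-level accuracy sit?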
>
> meanwhile I will prep a "mean_tpr" function (which should match
> mean_match_accuracy in the balanced case) and a "mean_fnr" function (which
> should match mean_mismatch_error in the balanced case) -- if I didn't mix
> anything up -- to use in such cases.
>
>
> > ### I believe this will apply the permutator to each fold created by par.
> > par_perm = ChainNode([par, permutator], space=par.get_space())
>
> > # CV with null-distribution estimation that permutes the training data
> > # for each fold independently
> > null_cv = CrossValidation(
> >             clf,
> >             par_perm,
> >             errorfx=mean_mismatch_error)
>
> such null_cv and the cv below will provide an err per each split (do
> "print err" to see that).  You should add postproc=mean_sample() to both
> of them.
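>
> e.g. just the two constructor calls from your script with postproc added
> (mean_sample comes from mvpa2.suite):
>
>     null_cv = CrossValidation(clf, par_perm,
>                               errorfx=mean_mismatch_error,
>                               postproc=mean_sample())
>     cv = CrossValidation(clf, par, errorfx=mean_mismatch_error,
>                          postproc=mean_sample(),
>                          null_dist=distr_est, enable_ca=['stats'])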
>
> > # Monte Carlo distribution estimator
> > distr_est = MCNullDist(repeater, tail='left', measure=null_cv,
> >                        enable_ca=['dist_samples'])
>
> > # actual CV with H0 distribution estimation
> > cv = CrossValidation(clf, par, errorfx=mean_mismatch_error,
> >                      null_dist=distr_est, enable_ca=['stats'])
>
>
> > err = cv(ds_mni_bw)
>
> > p = err.ca.null_prob
> > np.asscalar(p)
>
> --
> Yaroslav O. Halchenko
> Center for Open Neuroscience     http://centerforopenneuroscience.org
> Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
> WWW:   http://www.linkedin.com/in/yarik
>

