> In my experience I have accuracies on permuted-label data sets (true
> data, only the labels permuted, maintaining run, subject, etc.
> structure) ranging from the upper 40s to low 50s; I don't think I've
> ever seen an (averaged across subjects) average below 45% or above
> 55% (or even near). It's been awhile, but I've looked at the
> permutation test results in individual subjects as well, and still
> found highly significant below-chance accuracy.

Permutation testing is indeed useful, but it might not let you spot
issues like the ones you mentioned (inherent error of the trials) or a
temporal imbalance within stimulation sequences: although trials follow
in random order, counterbalancing with respect to preceding trials could
still be imperfect, leading to inherent autocorrelation which might
(anti)correlate with the hemodynamics/motion/whatever.
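
To make the counterbalancing concern concrete, here is a minimal sketch
(plain NumPy, not an existing PyMVPA function; the names
`transition_counts` and `lag1_autocorrelation` are made up for
illustration) that tabulates how often each condition follows each
other condition and computes the lag-1 autocorrelation of a
two-condition label sequence. It assumes the labels are given in
presentation order for a single run.

```python
import numpy as np

def transition_counts(labels):
    """Count how often each condition follows each condition.

    Rows index the preceding trial's condition, columns the current one.
    """
    labels = np.asarray(labels)
    conds, codes = np.unique(labels, return_inverse=True)
    counts = np.zeros((len(conds), len(conds)), dtype=int)
    # accumulate one count per consecutive (previous, current) pair
    np.add.at(counts, (codes[:-1], codes[1:]), 1)
    return conds, counts

def lag1_autocorrelation(labels):
    """Correlation of a binary label sequence with itself shifted by one trial."""
    labels = np.asarray(labels)
    conds = np.unique(labels)
    if len(conds) != 2:
        raise ValueError("expected exactly two conditions")
    x = (labels == conds[1]).astype(float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

# A strictly alternating sequence: never the same condition twice in a row.
seq = list("ABABABAB")
conds, counts = transition_counts(seq)
r = lag1_autocorrelation(seq)
```

For a well-counterbalanced sequence the rows of `counts` should be
roughly equal and `r` should be close to zero; a strongly negative or
positive `r` would hint at the kind of sequence structure that could
(anti)correlate with slow signal components.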

So, once again, it would depend on the design, experimental task,

I guess the first step to help out would be to add to PyMVPA something like
which might come in handy to spot such effects

> If permuted-label data produce a near-50% accuracy (quite near 50%
> mean; distribution nicely overlapping 50% and assuming a decent
> number of permutations were performed), it does not strike me as a
> situation where a chance performance estimation of 50% was
> optimistic.

To me, the question here is not about the mean of the chance
performances -- sure, in a simple balanced case where both conditions
are equally probable overall, mean chance performance should be 50%.
The question is what resulting performance should be considered
significant enough to reject the null hypothesis as originally stated.
In other words, is a 10% (mis)generalization really an effect or just a
side effect; or, conversely, how good should generalization be to state
the finding with high confidence?
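
As a rough reference point only, the sketch below computes the smallest
accuracy that an exact one-sided binomial test would call significant
for a given number of test trials. It assumes independent trials with a
50% chance level, which is precisely the assumption in question here,
so treat the resulting threshold as a baseline rather than an answer.
The function names are hypothetical.

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_significant_accuracy(n, alpha=0.05, p=0.5):
    """Smallest accuracy k/n whose upper-tail probability is <= alpha."""
    for k in range(n + 1):
        if binom_tail(k, n, p) <= alpha:
            return k / n
    return None  # no count reaches significance for this n and alpha

# With only 10 independent test trials, 9 correct are needed (P = 11/1024).
threshold_10 = min_significant_accuracy(10)  # 0.9
```

With p = 0.5 the distribution is symmetric, so the same distance below
50% gives the corresponding lower-tail threshold for below-chance
accuracies. If trials are autocorrelated, the effective number of
independent trials is smaller and the real threshold lies further from
50% than this calculation suggests.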

