[pymvpa] biased accuracy with nperlabel='equal'?
David V. Smith
david.v.smith at duke.edu
Fri Oct 28 18:16:02 UTC 2011
Hi,
Just following up on this. I've done some more testing, and even after permuting the labels I still get 55.71% accuracy.
Is this a bug? Why would the accuracy be systematically biased when I permute the labels, given that the number of samples per label is forced to be equal?
Please let me know if I can provide any additional information that will help solve this problem.
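For what it's worth, here is a minimal arithmetic sketch in plain Python (no PyMVPA calls; the function names are hypothetical and only for illustration). It uses the class counts from the summary quoted below and assumes that nperlabel='equal' balances only the training portion of each split, as the printed split suggests, while each test set is the single held-out scan. Under that assumption the pooled test labels are still 62 vs. 78, so a classifier that always answered label 2 would score 78/140 = 55.71%, the same number I am seeing. That coincidence may or may not be the explanation.

```python
from fractions import Fraction

# Class counts taken from ds.summary() in the quoted message.
N_LABEL_1 = 62
N_LABEL_2 = 78
N_TOTAL = N_LABEL_1 + N_LABEL_2


def balanced_training_size(held_out_label):
    """Training-set size for one leave-one-scan-out fold after equalizing labels.

    Assumption: equalizing keeps min(count) samples of each label.
    """
    remaining = {1: N_LABEL_1, 2: N_LABEL_2}
    remaining[held_out_label] -= 1
    return 2 * min(remaining.values())


def pooled_accuracy_of_constant_guess(guess):
    """Accuracy over all 140 single-scan test sets if every fold predicts `guess`."""
    hits = N_LABEL_1 if guess == 1 else N_LABEL_2
    return Fraction(hits, N_TOTAL)


def mean_per_label_accuracy_of_constant_guess(guess):
    """Average of the two per-label accuracies for the same constant guess."""
    acc_label_1 = Fraction(1 if guess == 1 else 0)
    acc_label_2 = Fraction(1 if guess == 2 else 0)
    return (acc_label_1 + acc_label_2) / 2


print(balanced_training_size(1))  # 122, matches the printed split (61 + 61)
print(balanced_training_size(2))  # 124
print("%.2f%%" % (100 * float(pooled_accuracy_of_constant_guess(2))))  # 55.71%
print("%.2f%%" % (100 * float(pooled_accuracy_of_constant_guess(1))))  # 44.29%
print(mean_per_label_accuracy_of_constant_guess(2))  # 1/2
```

If this is what is happening, averaging the two per-label accuracies instead of pooling all 140 test scans would report 50% for such a constant guess.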
Thanks,
David
On Oct 23, 2011, at 2:33 PM, David V. Smith wrote:
>
> Hi,
>
> I have 140 structural images: 78 are in class A and 62 are in class B. To ensure that the training algorithm (LinearNuSVMC) doesn't build a biased model, I am using the nperlabel='equal' option in my splitter. I know this part of my code is working (see below), so I'm confused about why my cross-validation results (leave-one-scan-out) are biased with random data (e.g., 55.71% accuracy). Can someone please clarify why I'm not getting 50% with random data? I suspect I'm just not understanding something simple...
>
> Thanks!
> David
>
>
> In [11]: print ds.summary()
> Dataset / float64 140 x 20068
> uniq: 140 chunks 2 labels
> stats: mean=0.114425 std=0.318326 var=0.101332 min=0 max=1
> No details due to large number of labels or chunks. Increase maxc and maxl if desired
> Summary per label across chunks
> label mean std min max #chunks
> 1 0.443 0.497 0 1 62
> 2 0.557 0.497 0 1 78
>
>
> In [10]: print '\n'.join([d.summary() for d in list(NFoldSplitter(nperlabel='equal')(ds))[0]])
>
> Dataset / float64 122 x 20068
> uniq: 122 chunks 2 labels
> stats: mean=0.107628 std=0.30991 var=0.0960441 min=0 max=1
> No details due to large number of labels or chunks. Increase maxc and maxl if desired
> Summary per label across chunks
> label mean std min max #chunks
> 1 0.5 0.5 0 1 61
> 2 0.5 0.5 0 1 61
>
> Dataset / float64 1 x 20068
> uniq: 1 chunks 1 labels
> stats: mean=0.077935 std=0.268069 var=0.0718612 min=0 max=1
>
> Counts of labels in each chunk:
> chunks\labels 1.0
> ---
> 1.0 1
>
> Summary per label across chunks
> label mean std min max #chunks
> 1 1 0 1 1 1
>
> Summary per chunk across labels
> chunk mean std min max #labels
> 1 1 0 1 1 1
>
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa