[pymvpa] RFE and dataset splits

Thu Jun 23 20:29:40 UTC 2011

That is the odd thing: cross-validation on the full dataset using a linear SVM

clf = LinearCSVMC()
cvte = CrossValidation(clf, NFoldPartitioner(), enable_ca=['stats'])
cv_results = cvte(avgds)

gives 64.6% accuracy. Shouldn't RFE then start at something like 36% error and
improve from there? (SMLR is slightly lower, at 58.3% accuracy, but still
better than chance?)
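
One rough way to check the "better than chance" question is a one-sided
binomial test against 50%. The sketch below is only an illustration and is not
PyMVPA code: binom_tail is a hypothetical helper, the counts of 31 and 28
correct out of 48 are inferred from the 64.6% and 58.3% figures above, and the
test treats the 48 predictions as independent, which cross-validated
predictions are not strictly.

```python
from math import comb


def binom_tail(k, n, p=0.5):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_samples = 48  # 24 chunks x 2 targets, from the dataset summary

# Assumption: the reported accuracies correspond to whole numbers of
# correctly classified samples out of 48.
correct_svm = round(0.646 * n_samples)   # linear SVM
correct_smlr = round(0.583 * n_samples)  # SMLR

p_svm = binom_tail(correct_svm, n_samples)
p_smlr = binom_tail(correct_smlr, n_samples)

print("linear SVM: %d/%d correct, one-sided p = %.4f" % (correct_svm, n_samples, p_svm))
print("SMLR:       %d/%d correct, one-sided p = %.4f" % (correct_smlr, n_samples, p_smlr))
```

With only 48 samples the chance distribution is wide, so an accuracy a few
points above 50% need not be reliably above chance.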

Here are the dataset details:

Dataset: 48x135168 at float64, <sa: chunks,targets,time_coords,time_indices>,
<fa: voxel_indices>, <a: imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.00193459 std=1.16806 var=1.36437 min=-139.459 max=116.782

Counts of targets in each chunk:
  chunks\targets nwrd word
                  ---  ---
       0.0         1    1
       1.0         1    1
       2.0         1    1
       3.0         1    1
       4.0         1    1
       5.0         1    1
       6.0         1    1
       7.0         1    1
       8.0         1    1
       9.0         1    1
      10.0         1    1
      11.0         1    1
      12.0         1    1
      13.0         1    1
      14.0         1    1
      15.0         1    1
      16.0         1    1
      17.0         1    1
      18.0         1    1
      19.0         1    1
      20.0         1    1
      21.0         1    1
      22.0         1    1
      23.0         1    1

Summary for targets across chunks
  targets mean std min max #chunks
   nwrd     1   0   1   1     24
   word     1   0   1   1     24

Summary for chunks across targets
  chunks mean std min max #targets
    0      1   0   1   1      2
    1      1   0   1   1      2
    2      1   0   1   1      2
    3      1   0   1   1      2
    4      1   0   1   1      2
    5      1   0   1   1      2
    6      1   0   1   1      2
    7      1   0   1   1      2
    8      1   0   1   1      2
    9      1   0   1   1      2
   10      1   0   1   1      2
   11      1   0   1   1      2
   12      1   0   1   1      2
   13      1   0   1   1      2
   14      1   0   1   1      2
   15      1   0   1   1      2
   16      1   0   1   1      2
   17      1   0   1   1      2
   18      1   0   1   1      2
   19      1   0   1   1      2
   20      1   0   1   1      2
   21      1   0   1   1      2
   22      1   0   1   1      2
   23      1   0   1   1      2
Sequence statistics for 48 entries from set ['nwrd', 'word']
Counter-balance table for orders up to 2:
Targets/Order O1     |  O2     |
    nwrd:      0 24  |  23  0  |
    word:     23  0  |   0 23  |
Correlations: min=-1 max=1 mean=-0.021 sum(abs)=47
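
For reference, with its default settings NFoldPartitioner leaves one chunk out
per fold, so this dataset yields 24 folds, each trained on 46 samples and
tested on 2. The plain-Python sketch below mimics that splitting without using
PyMVPA; leave_one_chunk_out is a hypothetical helper, and the assumption that
each chunk holds two consecutive samples (nwrd, then word) is taken from the
alternating sequence in the counter-balance table, not from the actual data.

```python
# Assumed layout: 24 chunks, each with one 'nwrd' and one 'word' sample.
chunks = [c for c in range(24) for _ in range(2)]
targets = ['nwrd', 'word'] * 24


def leave_one_chunk_out(chunks):
    """Yield (train_indices, test_indices), holding out one chunk at a time."""
    for held_out in sorted(set(chunks)):
        train = [i for i, c in enumerate(chunks) if c != held_out]
        test = [i for i, c in enumerate(chunks) if c == held_out]
        yield train, test


folds = list(leave_one_chunk_out(chunks))
print("number of folds:", len(folds))
for train, test in folds[:3]:
    print("train size:", len(train), "test targets:", [targets[i] for i in test])
```

Each test fold contains only two samples, so per-fold accuracy can only be 0%,
50%, or 100%; the 64.6% figure is the average across the 24 folds.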

If the cross-validation performance is better than chance, I am guessing it
must be RFE that is not working, rather than a problem with the dataset. Would
that be a correct assumption?

And of course, thank you very much for your time!

Kimberly Zhou

Message: 2
> Date: Wed, 22 Jun 2011 20:32:57 -0400
> From: Yaroslav Halchenko <debian at onerussian.com>
> To: Development and support of PyMVPA
>        <pkg-exppsy-pymvpa at lists.alioth.debian.org>
> Subject: Re: [pymvpa] RFE and dataset splits
> Message-ID: <20110623003257.GM17261 at onerussian.com>
> Content-Type: text/plain; charset=us-ascii
>
> well, it is indeed weird that it is always 0.5 ... code looks ok
>
> what if you share the details of the dataset:
>
> print avgds.summary()
>
> what is the cross-validation performance on full featureset using e.g.
> SMLR?  could it be indeed that there is no categorical information among
> the data or it is too hard to dig it out with RFE
>
>