[pymvpa] RFE and dataset splits
Kimberly Zhou
kyqzhou at gmail.com
Thu Jun 23 20:29:40 UTC 2011
That is the odd thing. Cross-validation on the full feature set with a
linear SVM

clf = LinearCSVMC()
cvte = CrossValidation(clf, NFoldPartitioner(), enable_ca=['stats'])
cv_results = cvte(avgds)

gives 64.6% accuracy. Shouldn't RFE then start at roughly 35% error and
improve from there? (SMLR is slightly lower at 58.3% accuracy, but that is
still better than chance, isn't it?)
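As a rough sanity check on those numbers (a sketch in plain Python, not PyMVPA code): with the 48 samples listed in the summary below, 64.6% and 58.3% correspond to 31 and 28 correct predictions. The helper `binom_tail` is a name made up here, and the binomial model assumes independent predictions, which cross-validation folds only approximate, so the tail probabilities are indicative at best.

```python
from math import comb


def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): odds of k or more hits by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_samples = 48  # taken from the dataset summary

for name, acc in [("LinearCSVMC", 0.646), ("SMLR", 0.583)]:
    n_correct = round(acc * n_samples)  # accuracy back to a count of hits
    error = 1.0 - acc                   # error is the complement of accuracy
    p_chance = binom_tail(n_correct, n_samples)
    print(name, n_correct, round(error, 3), p_chance)
```

The printed tail probability is the chance of doing at least that well by guessing between two balanced classes.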
Here are the dataset details:
Dataset: 48x135168@float64, <sa: chunks,targets,time_coords,time_indices>,
<fa: voxel_indices>, <a: imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.00193459 std=1.16806 var=1.36437 min=-139.459 max=116.782
Counts of targets in each chunk:
chunks\targets  nwrd  word
                 ---   ---
           0.0     1     1
           1.0     1     1
           2.0     1     1
           3.0     1     1
           4.0     1     1
           5.0     1     1
           6.0     1     1
           7.0     1     1
           8.0     1     1
           9.0     1     1
          10.0     1     1
          11.0     1     1
          12.0     1     1
          13.0     1     1
          14.0     1     1
          15.0     1     1
          16.0     1     1
          17.0     1     1
          18.0     1     1
          19.0     1     1
          20.0     1     1
          21.0     1     1
          22.0     1     1
          23.0     1     1
Summary for targets across chunks
targets  mean  std  min  max  #chunks
   nwrd     1    0    1    1       24
   word     1    0    1    1       24
Summary for chunks across targets
chunks  mean  std  min  max  #targets
     0     1    0    1    1         2
     1     1    0    1    1         2
     2     1    0    1    1         2
     3     1    0    1    1         2
     4     1    0    1    1         2
     5     1    0    1    1         2
     6     1    0    1    1         2
     7     1    0    1    1         2
     8     1    0    1    1         2
     9     1    0    1    1         2
    10     1    0    1    1         2
    11     1    0    1    1         2
    12     1    0    1    1         2
    13     1    0    1    1         2
    14     1    0    1    1         2
    15     1    0    1    1         2
    16     1    0    1    1         2
    17     1    0    1    1         2
    18     1    0    1    1         2
    19     1    0    1    1         2
    20     1    0    1    1         2
    21     1    0    1    1         2
    22     1    0    1    1         2
    23     1    0    1    1         2
Sequence statistics for 48 entries from set ['nwrd', 'word']
Counter-balance table for orders up to 2:
Targets/Order  O1      |  O2      |
        nwrd:   0  24  |  23   0  |
        word:  23   0  |   0  23  |
Correlations: min=-1 max=1 mean=-0.021 sum(abs)=47
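Given this layout, the partitioning can be sketched in plain Python. This is an illustration only, assuming NFoldPartitioner() with default arguments holds out one chunk per fold; `leave_one_chunk_out` is a hypothetical helper, not part of PyMVPA, and the alternating sample order is inferred from the counter-balance table.

```python
# Rebuild the sample layout from the summary:
# 24 chunks, each holding one 'nwrd' and one 'word' sample (48 in total).
chunks = [c for c in range(24) for _ in range(2)]
targets = ['nwrd', 'word'] * 24  # assumed alternating order


def leave_one_chunk_out(chunks):
    """Yield (train_idx, test_idx) index lists, holding out one chunk per fold."""
    for held_out in sorted(set(chunks)):
        train = [i for i, c in enumerate(chunks) if c != held_out]
        test = [i for i, c in enumerate(chunks) if c == held_out]
        yield train, test


folds = list(leave_one_chunk_out(chunks))
print(len(folds), len(folds[0][0]), len(folds[0][1]))

# With only two test samples per fold, 0, 1 or 2 of them can be wrong,
# so a single fold's error is restricted to three values.
possible_fold_errors = sorted({wrong / 2 for wrong in range(3)})
print(possible_fold_errors)
```

So each of the 24 folds trains on 46 samples and tests on one nwrd/word pair, and any single fold can only report an error of 0, 0.5 or 1.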
If the cross-validation performance is better than chance, I am guessing it
must be RFE that is not working, and not the dataset. Would that be a
correct assumption?
And of course, thank you very much for your time!
Kimberly Zhou
> Date: Wed, 22 Jun 2011 20:32:57 -0400
> From: Yaroslav Halchenko <debian at onerussian.com>
> To: Development and support of PyMVPA
> <pkg-exppsy-pymvpa at lists.alioth.debian.org>
> Subject: Re: [pymvpa] RFE and dataset splits
> Message-ID: <20110623003257.GM17261 at onerussian.com>
> Content-Type: text/plain; charset=us-ascii
>
> well, it is indeed weird that it is always 0.5 ... code looks ok
>
> what if you share the details of the dataset:
>
> print avgds.summary()
>
> what is the cross-validation performance on full featureset using e.g.
> SMLR? could it be indeed that there is no categorical information among
> the data or it is too hard to dig it out with RFE
>
>