[pymvpa] RFE and dataset splits
Yaroslav Halchenko
debian at onerussian.com
Wed Jul 6 20:32:16 UTC 2011
Hi Kimberly,
sorry for the delay -- we are finally back from HBM and getting back
into the routine pace... I am about to start the RFE on your dataset
but I immediately spotted that the data wasn't normalized to be
SVM-friendly:
In [8]: print ds.summary()
Dataset: 48x135168 at float64, <sa: chunks,targets,time_coords,time_indices>, <fa: voxel_indices>, <a: imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=0.00193459 std=1.16806 var=1.36437 min=-139.459 max=116.782
So, please do the following (maybe in your own script, to check for
yourself that everything is ok):
1.
zscore(ds, chunks_attr=None)
which would do standardization across all samples (it is meaningless to
do it per chunk in your case since you have only 2 samples in each
chunk), which would lead you to
stats: mean=9.39588e-16 std=0.574634 var=0.330204 min=-6.70418 max=6.77941
2. I guess you didn't really mask the volume (i.e. exclude non-brain
voxels), and thus have lots of invariant features (e.g. having 0s
through all samples), so you could get rid of them:
In [25]: ds = remove_invariant_features(ds)
In [29]: print ds.summary()
Dataset: 48x44633 at float64, <sa: chunks,targets,time_coords,time_indices>, <fa: voxel_indices>, <a: imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
stats: mean=2.84548e-15 std=1 var=1 min=-6.70418 max=6.77941
which should just help it to get through faster ;-)
let's hope that it was that ;)
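
In case it helps to see what the two steps above amount to, here is a
rough plain-NumPy sketch. It is not the PyMVPA implementation: the
function names zscore_all_samples and drop_invariant_features are made
up for illustration, the toy array stands in for ds.samples, and I am
assuming the population standard deviation (ddof=0) with invariant
features left at 0 instead of being divided by a zero std.

```python
import numpy as np

def zscore_all_samples(samples):
    """Z-score every feature (column) using all samples (rows) at once."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)  # assumption: population std (ddof=0)
    # Invariant features would give std == 0; divide by 1 so they stay at 0
    invariant = samples.max(axis=0) == samples.min(axis=0)
    safe_std = np.where(invariant, 1.0, std)
    return (samples - mean) / safe_std

def drop_invariant_features(samples):
    """Keep only the features that take more than one value across samples."""
    samples = np.asarray(samples, dtype=float)
    keep = samples.max(axis=0) != samples.min(axis=0)
    return samples[:, keep]

# Toy stand-in for ds.samples: 48 samples, 9 features, 6 of them all zeros
rng = np.random.RandomState(0)
data = np.zeros((48, 9))
data[:, :3] = rng.normal(loc=5.0, scale=20.0, size=(48, 3))

z = zscore_all_samples(data)           # step 1
z_clean = drop_invariant_features(z)   # step 2

print(z.shape, z.var())
print(z_clean.shape, z_clean.var())
```

In this toy case the overall variance after step 1 is the fraction of
features that actually vary (3/9), and it becomes 1 once the invariant
ones are dropped. The summaries above fit the same pattern: 44633/135168
is about 0.3302, which matches var=0.330204 before removal and var=1
after.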
On Thu, 23 Jun 2011, Yaroslav Halchenko wrote:
> On Thu, 23 Jun 2011, Kimberly Zhou wrote:
> > cv_results=cvte(avgds)
> > gives a 64.6% accuracy, shouldn't RFE start at something like 36% error
> > and improve from there? (SMLR, slightly lower, 58.3% acc, but still
> > better than chance?)
> > >...<
> > If the cross-validation performance is better than chance, I am
> > guessing it must be RFE that is not working, and not the dataset. Would
> > that be a correct assumption?
> I would come to the same conclusion ;)
> Is there a chance you could share that dataset? would make it easier to
> figure out WTF
> just do
> h5save('/tmp/dataset.hdf5', avgds)
> and make that file available online or in dropbox or whatever other
> means... not sure if email would tolerate the size though, but you could
> try emailing it to debian at onerussian.com
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic