[pymvpa] RFE & Permutation
Matthias Ekman
Matthias.Ekman at nf.mpg.de
Thu Jan 28 07:24:56 UTC 2010
Hi,
> could you share more details about your data? number of labels/samples?
> what error are you getting from cv (may be confusion matrix).
It's a binary classification problem: 144 samples (72 per class), 12 chunks
with a counterbalanced number of samples per class. Accuracy is between
60-85%, depending on the subject. A previous permutation test with feature
selection via FixedNElementTailSelector showed that there might be some
true signal in the data, but for some subjects it was significant only
at p < .05. With RFE + Monte Carlo I am getting p-values below
0.00000001 for the same subjects. (Accuracies are a little lower for
RFE than for fixed-elements feature selection.)
>
> If your classifier manages actually to learn data then you can expect
> null_prob be actually very very small, so everything depends on your
> data and results.
>
> we also might actually "fix" null_prob by accounting for correct
> labeling as well which would then assure non-0 values for null_prob in
> all cases...
>
>> Btw, how should this line look like for upcoming version 0.5.0?:
>> sensitivity_analyzer=rfesvm_split.getSensitivityAnalyzer(
>>     combiner=FirstAxisMean, transformer=N.abs)
>
>> I guess "transformer=N.abs" becomes "mapper=absolute_features()", but
>> what about the combiner? I am getting a TypeError here:
>
> any modifications of sensitivities now could be absorbed into the provided
> mapper which could be arbitrarily complex. If it is a simple function to
> perform per each feature, you just need to wrap it into FxMapper (just
> see source of absolute_features).
>
> absolute_features (as well as maxofabs_sample and sumofabs_sample) are just
> convenience factories, so for instance sumofabs_sample
>
> def sumofabs_sample():
>     """Returns a mapper that returns the sum of absolute values of all samples.
>     """
>     return FxMapper('samples', lambda x: N.abs(x).sum())
>
> is effectively what you need to replace combiner=FirstAxisMean,
> transformer=N.abs -- it would just lack the division by the number of
> items (sensitivities), but that is irrelevant for the ranking done by
> FeatureSelector.
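To illustrate the ranking point with a plain-NumPy sketch (not PyMVPA API; the sensitivity matrix below is made up): summing absolute sensitivities and averaging them differ only by a constant factor, so any selector that ranks features by magnitude gets the identical ordering from either.

```python
import numpy as np

# Hypothetical sensitivity matrix: 4 splits x 6 features
sens = np.array([[0.2, -1.3, 0.7, -0.1, 2.0, -0.4],
                 [0.1, -1.1, 0.9, -0.2, 1.8, -0.6],
                 [0.3, -1.4, 0.6, -0.1, 2.1, -0.3],
                 [0.2, -1.2, 0.8, -0.3, 1.9, -0.5]])

# What the sumofabs-style lambda computes: sum of |sensitivity| per feature
sum_of_abs = np.abs(sens).sum(axis=0)
# The old combiner=FirstAxisMean + transformer=N.abs equivalent
mean_of_abs = np.abs(sens).mean(axis=0)

# The two differ only by the number of splits (here 4),
# so the feature ranking is identical.
assert np.allclose(sum_of_abs, 4 * mean_of_abs)
assert np.array_equal(np.argsort(sum_of_abs), np.argsort(mean_of_abs))
```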
thank you for the detailed answer!
> how does your cv.states.null_dist.dist_samples would look?
looks pretty 'okay', with a mean around .5 -- so I guess everything is
fine. Hmm... but I am still wondering why the p-values are so low for RFE
(compared to feature selection with a fixed number of elements).
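For reference, the empirical p-value from a permutation null distribution is just the fraction of null accuracies at or above the observed one. A minimal sketch (variable names and numbers are illustrative, not PyMVPA's); note that with N permutations the smallest attainable empirical p is 1/(N+1), which is one reason a "+1" correction is often used to avoid reporting exactly 0:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative null distribution: chance-level accuracies centered at .5
dist_samples = rng.normal(loc=0.5, scale=0.05, size=1000)
observed_acc = 0.75  # illustrative observed cross-validation accuracy

# One-tailed empirical p-value; the +1 terms keep it strictly above 0
p = (np.sum(dist_samples >= observed_acc) + 1) / (len(dist_samples) + 1)
print(p)
```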
Thanks!
Matthias
More information about the Pkg-ExpPsy-PyMVPA mailing list