[pymvpa] Regression estimates?

Per B. Sederberg psederberg at gmail.com
Thu Apr 7 16:19:13 UTC 2011


Hi Everyone:

Thanks for your responses!  I figured I should follow up and let you
know my progress.

First, a bit more description of the data:

The 20 samples will definitely grow as we get more participants.
Essentially, I'm trying to predict individual differences in various
psychological measures (e.g., working memory capacity, personality
traits, etc.) based on functional activity during a task.  This is
similar to classifying whether or not people have some neurological
disease by looking at their structural MRIs.

The reason I have so many features is that I have three conditions in
the task and fit an FIR model with 8 time points for each condition,
leaving me with 3 * 8 * nvoxels features per sample (in my case, where
the data have been upsampled, that comes to about 3 million features).

To tame the data and help prevent overfitting, I wanted to reduce the
number of features.  Running an SVD on all of those data, with loads
of features and only 20 samples, didn't converge, so I ended up
splitting the features into smaller chunks and concatenating the SVDs
of those chunks (sketched below).  After this step I had about 4K
features, down from around 3 million.
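
For anyone curious, the chunked SVD was along these lines (plain
numpy; the chunk count, per-chunk rank, and function name here are
just illustrative placeholders, not my exact settings):

    import numpy as np

    def chunked_svd_reduce(X, n_chunks=200, rank=20, seed=0):
        """Split the columns of X into chunks, run an economy SVD on
        each chunk, and concatenate the per-chunk sample projections."""
        rng = np.random.RandomState(seed)
        order = rng.permutation(X.shape[1])
        pieces = []
        for cols in np.array_split(order, n_chunks):
            # full_matrices=False keeps U at (n_samples, min(n_samples, len(cols)))
            U, s, Vt = np.linalg.svd(X[:, cols], full_matrices=False)
            keep = min(rank, len(s))
            # project the samples onto the chunk's top components (U * s)
            pieces.append(U[:, :keep] * s[:keep])
        return np.hstack(pieces)

    # With 20 samples, 200 chunks x up to 20 components each comes
    # out to roughly the 4K features mentioned above.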

I then used a GLMNET_R sparse regression classifier to predict one
psychological measure at a time.  This replicated previously
published results: for example, I can predict someone's neuroticism
level with R = .6 (it turns out to be pretty much all amygdala
activity for that one, which is already in the literature).  On all
my other measures, however, I'm doing quite poorly.
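
I ran this through PyMVPA's GLMNET_R wrapper, but for anyone without
the R bindings, here is a rough stand-in using scikit-learn's elastic
net (not the code I actually ran, just a sketch of the same idea,
with made-up parameter settings):

    import numpy as np
    from sklearn.linear_model import ElasticNetCV
    from sklearn.model_selection import LeaveOneOut

    def loo_sparse_predict(X, y):
        """Leave-one-subject-out sparse regression; returns the
        correlation between predicted and observed scores."""
        preds = np.empty(len(y))
        for train, test in LeaveOneOut().split(X):
            # l1_ratio near 1 gives a mostly-L1, lasso-like penalty,
            # similar in spirit to what glmnet fits
            model = ElasticNetCV(l1_ratio=0.9, cv=5)
            model.fit(X[train], y[train])
            preds[test] = model.predict(X[test])
        return np.corrcoef(preds, y)[0, 1]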

What I really wanted to do was run either a partial least squares
regression or a canonical correlation analysis on these data,
predicting all psychological measures at once to look for
interactions between them, but all my attempts at that (in R and
Matlab, because I don't have Python code for either) have not worked
at all.  I did get the PLS regression to run (via RPy2), but it
didn't even replicate my other analysis; it failed to fit anything.
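
In case someone wants to try the multi-measure route in Python, a
scikit-learn PLS sketch (again, not what I ran, just an illustration
of the setup; the component count is arbitrary) would be something
like:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import LeaveOneOut

    def loo_pls_predict(X, Y, n_components=2):
        """Leave-one-out PLS predicting all measures at once; returns
        a predicted-vs-observed correlation for each measure."""
        preds = np.empty_like(Y, dtype=float)
        for train, test in LeaveOneOut().split(X):
            pls = PLSRegression(n_components=n_components)
            pls.fit(X[train], Y[train])
            preds[test] = pls.predict(X[test])
        return np.array([np.corrcoef(preds[:, j], Y[:, j])[0, 1]
                         for j in range(Y.shape[1])])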

So, I'm still plugging away, but if anyone has any thoughts or ideas,
I'd love to hear them.

Best,
Per


On Thu, Apr 7, 2011 at 12:03 PM, Yaroslav Halchenko
<debian at onerussian.com> wrote:
> yeap! and that is why I was also skeptical about results based on
> 10-20 samples ;)
>
> On Thu, 07 Apr 2011, Emanuele Olivetti wrote:
>
>> Cute indeed :-). Figure 1 is pretty scary, especially if we replace
>> "number of classifiers" with "number of features". Of course the
>> assumption is that classifiers (or features) are independent. But still...
>
>> Best,
>
>> E.
> --
> =------------------------------------------------------------------=
> Keep in touch                                     www.onerussian.com
> Yaroslav Halchenko                 www.ohloh.net/accounts/yarikoptic
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
>


