[pymvpa] Re: pymvpa help

Yaroslav Halchenko debian at onerussian.com
Tue Feb 16 02:41:04 UTC 2016


On Tue, 16 Feb 2016, ibinliu at sina.com wrote:
>    ''how many trials for each of those 2 conditions do you have?
>    how many subjects do you have?''
>    There are 30 trials for each condition and I have 48 subjects.
>    ''what was the design (block or event-related, ISI, randomized how)?''
>    It was an event-related design with a variable ISI ranging from
>    4-6 sec.

ok, so no block design, and if your designs were randomized across subjects
then maybe it could all still be kosher ;-)  If they weren't randomized -- I
wouldn't trust the results one bit...

>    ''is that bold.nii.gz already those beta maps or original data?''

>    Yes, the bold.nii.gz is already the beta maps. 

>    How could I set the partitioner for 5-fold or leave-one-out cross-validation?

leave one out -- simple: just assign each sample to its own individual
'chunk':

ds.sa['chunks'] = np.arange(len(ds))

if ds is your dataset.  Note that you then shouldn't zscore within
chunks (each chunk now holds just a single sample), so do

zscore(ds, chunks_attr=None)

when normalizing your data.  And then use the regular NFoldPartitioner(1).
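
Putting those leave-one-out pieces together, a minimal sketch (assuming
'from mvpa2.suite import *' and a linear SVM, LinearCSVMC, as the
classifier -- substitute your own):

import numpy as np
from mvpa2.suite import *

ds.sa['chunks'] = np.arange(len(ds))  # each sample is its own chunk
zscore(ds, chunks_attr=None)          # z-score across the whole dataset
# one fold per left-out sample
cv = CrossValidation(LinearCSVMC(), NFoldPartitioner(1))
res = cv(ds)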

One caveat I would expect (unlikely since you have 30 trials per
category, but still possible in some cases): the classifier might
still learn the 'majority' class (since each training set would have
one class with 29 samples and the other with 30), leading to strong
'anti-classification'.  Thus you might prefer to do as shown below for
5-fold, but with '30-fold' instead.
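
For reference, a minimal sketch of that '30-fold' variant, putting one
trial of each condition into every chunk (pairing trials by their order
within each condition is just my assumption here -- pick whatever
pairing makes sense for your design):

for t in ds.UT:  # for each of the 2 conditions
    ds.chunks[ds.T == t] = np.arange(30)  # trial i -> chunk i

which yields 30 chunks with one sample of each condition, again to be
used with NFoldPartitioner(1).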


5 fold:  since 30 is divisible by 5, you can indeed get balanced
groups of trials.  The most logical approach would be to group the 30
trials of each category into 5 chunks, 6 samples each, i.e. something
like below (I also generated a dummy dataset which would mimic yours):

In [10]: ds = mv.Dataset(np.arange(60), sa=dict(targets=np.repeat([1,2], 30), chunks=np.arange(60)))  # mv is mvpa2.suite; 2 conditions x 30 trials

In [11]: for t in ds.UT: ds.chunks[ds.T == t] = np.repeat(np.arange(5), 6)  # per condition: 5 chunks of 6 trials

In [12]: print ds.summary()
Dataset: 60x1@int64, <sa: chunks,targets>
stats: mean=29.5 std=17.3181 var=299.917 min=0 max=59

Counts of targets in each chunk:
  chunks\targets  1   2
                 --- ---
        0         6   6
        1         6   6
        2         6   6
        3         6   6
        4         6   6

Summary for targets across chunks
  targets mean std min max #chunks
    1       6   0   6   6     5
    2       6   0   6   6     5

Summary for chunks across targets
  chunks mean std min max #targets
    0      6   0   6   6      2
    1      6   0   6   6      2
    2      6   0   6   6      2
    3      6   0   6   6      2
    4      6   0   6   6      2
Sequence statistics for 60 entries from set [1, 2]
Counter-balance table for orders up to 2:
Targets/Order O1     |  O2     |
      1:      29  1  |  28  2  |
      2:       0 29  |   0 28  |
Correlations: min=-1 max=0.93 mean=-0.017 sum(abs)=29


so just check your ds.summary() to see that everything is balanced
alright.  And then, again, just use NFoldPartitioner(1), so it will
first take out the samples of one chunk, then another, etc.

You could also have done 5-fold using NFoldPartitioner(6*2) + Sifter to
cross-validate across all possible combinations leaving out 6 samples
of each of the 2 categories, but I don't think that you need such a
hassle ;)
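
For completeness, that combination could look roughly like the sketch
below (following the Sifter pattern used in PyMVPA's test suite; note
that with trial-wise chunks it enumerates an astronomical number of
candidate partitions, hence the hassle):

ds.sa['chunks'] = np.arange(len(ds))  # every trial its own chunk
par = ChainNode([NFoldPartitioner(cvtype=6*2, attr='chunks'),
                 Sifter([('partitions', 2),
                         ('targets', dict(uvalues=ds.sa['targets'].unique,
                                          balanced=True))])],
                space='partitions')  # keep only balanced 6+6 test sets
cv = CrossValidation(LinearCSVMC(), par)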

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience     http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik        


