[pymvpa] searchlight for data with different runs with different masks
Yaroslav Halchenko
debian at onerussian.com
Tue Jan 5 20:56:19 UTC 2016
On Tue, 05 Jan 2016, Kaustubh Patil wrote:
> Hi Yaroslav,
> I hope you had some good downtime during holidays.
> I am wondering if there is any straightforward solution for getting
> balanced accuracy using PyMVPA?
ah -- thanks for the buzz
looking at the dataset summary I would recommend
1. assign coarser chunks (e.g. group every two chunks), something like

    ds.sa['runs'] = ds.sa.chunks  # so you still have a "record"
    ds.sa.chunks = ds.sa.chunks // 2

this is so you don't have some runs where you have only 2 samples of a
category, which would widen the distribution of mean_error for that CV fold
2. use Balancer to balance those training/testing splits
so if you want to do n-fold, just use the following partitioner

    partitioner = ChainNode([NFoldPartitioner(cvtype=1),
                             Balancer(attr='targets',
                                      count=5,  # 5 is arbitrarily "high"
                                      limit='partitions',
                                      apply_selection=True
                                      )],
                            space='partitions')

instead of a plain NFoldPartitioner
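
The two steps above can be sketched without PyMVPA, using the per-chunk counts from the dataset summary quoted in this message. This is only an illustration of the idea, not PyMVPA's implementation: `coarsen` and `balanced_indices` are hypothetical helper names. The `(chunk - 1) // 2` variant rests on the assumption that the intent is to pair the 1-based chunks as (1,2), (3,4), and so on; with chunks numbered 1 to 10, plain `chunks // 2` yields six uneven groups instead of five pairs.

```python
import random
from collections import Counter, defaultdict

# Per-chunk counts (target 0, target 1) copied from the dataset summary.
COUNTS = {1: (7, 11), 2: (16, 2), 3: (16, 2), 4: (13, 5), 5: (9, 9),
          6: (9, 9), 7: (13, 5), 8: (15, 3), 9: (13, 5), 10: (9, 9)}


def coarsen(counts, group_of):
    """Sum per-target counts over the coarse chunk each original chunk maps to."""
    merged = defaultdict(lambda: [0, 0])
    for chunk, (n0, n1) in counts.items():
        merged[group_of(chunk)][0] += n0
        merged[group_of(chunk)][1] += n1
    return {g: tuple(v) for g, v in sorted(merged.items())}


def balanced_indices(targets, rng):
    """Pick an equal number of sample indices per target (hypothetical helper).

    Every target is randomly subsampled down to the size of the rarest one.
    """
    by_target = defaultdict(list)
    for i, t in enumerate(targets):
        by_target[t].append(i)
    n = min(len(ix) for ix in by_target.values())
    picked = []
    for t in sorted(by_target):
        picked.extend(rng.sample(by_target[t], n))
    return sorted(picked)


as_posted = coarsen(COUNTS, lambda c: c // 2)     # 1-based chunks: uneven groups
paired = coarsen(COUNTS, lambda c: (c - 1) // 2)  # assumption: pairs (1,2), (3,4), ...

# One coarse chunk (original chunks 3 and 4 merged): 29 vs 7 samples.
targets = [0] * paired[1][0] + [1] * paired[1][1]
subset = balanced_indices(targets, random.Random(0))
balanced_counts = Counter(targets[i] for i in subset)
```

With pairing, the rarest target has at least 7 samples in every coarse chunk, compared with 2 in some of the original runs. A single call to `balanced_indices` gives one random balanced draw; repeating it with different seeds is presumably what `count=5` asks of the Balancer.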
Let me know how it goes
> Best regards
> Kaustubh
> PS: Happy new year!!!
> On Sat, Dec 19, 2015 at 11:50 PM, Kaustubh Patil
> <kaustubh.patil at gmail.com> wrote:
> Hi Yaroslav, thanks for help.
> Here is summary for the dataset of one subject:
> Dataset: 180x71039 at float64, <sa: chunks,regressors,targets,volumes>,
> <fa: voxel_indices>, <a:
> add_regs,imgaffine,imghdr,imgtype,mapper,voxel_dim,voxel_eldim>
> stats: mean=3.55452e-15 std=1 var=1 min=-4.07657 max=3.95942
> Counts of targets in each chunk:
>   chunks\targets    0    1
>                    ---  ---
>         1            7   11
>         2           16    2
>         3           16    2
>         4           13    5
>         5            9    9
>         6            9    9
>         7           13    5
>         8           15    3
>         9           13    5
>        10            9    9
> Summary for targets across chunks
>   targets  mean  std  min  max  #chunks
>      0      12   3.1    7   16     10
>      1       6   3.1    2   11     10
> Summary for chunks across targets
>   chunks  mean  std  min  max  #targets
>      1      9     2    7   11      2
>      2      9     7    2   16      2
>      3      9     7    2   16      2
>      4      9     4    5   13      2
>      5      9     0    9    9      2
>      6      9     0    9    9      2
>      7      9     4    5   13      2
>      8      9     6    3   15      2
>      9      9     4    5   13      2
>     10      9     0    9    9      2
> Sequence statistics for 180 entries from set [0, 1]
> Counter-balance table for orders up to 2:
> Targets/Order   O1         |   O2         |
>       0:        119     1  |  118     2  |
>       1:          0    59  |    0    58  |
> Correlations: min=-0.5 max=0.98 mean=-0.0056 sum(abs)=79
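
Given the imbalance in the summary above (120 samples of target 0 versus 60 of target 1), the difference between plain and balanced accuracy can be illustrated with a short sketch. The function names and the confusion counts are hypothetical; balanced accuracy is taken here as the mean of per-target recall, which is the usual definition and not a specific PyMVPA call.

```python
def balanced_accuracy(confusion):
    """Mean per-target recall; confusion[true][predicted] holds counts."""
    recalls = []
    for true_label, row in confusion.items():
        total = sum(row.values())
        recalls.append(row.get(true_label, 0) / total)
    return sum(recalls) / len(recalls)


def accuracy(confusion):
    """Fraction of all samples predicted correctly."""
    correct = sum(row.get(label, 0) for label, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


# Hypothetical classifier that always predicts target 0 on this dataset.
always_zero = {0: {0: 120, 1: 0}, 1: {0: 60, 1: 0}}
plain = accuracy(always_zero)               # 120 / 180
balanced = balanced_accuracy(always_zero)   # (1.0 + 0.0) / 2
```

A classifier that ignores the data entirely would score about 67% plain accuracy here but only 50% balanced accuracy, which is why the unbalanced splits matter.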
> On Sat, Dec 19, 2015 at 11:00 PM, Yaroslav Halchenko
> <debian at onerussian.com> wrote:
> On Sat, 19 Dec 2015, Kaustubh Patil wrote:
> > Thanks a lot Yaroslav. I am following the procedure described below;
> > please let me know if it has any clear or potential problems. I am also
> > throwing in another question here, but can start another thread if it's
> > worth it.
> > 1) Alignment procedure: Align all the runs to the middle volume of run1
> > (example_func from fsl). Use the mask that was generated by fsl from run1.
> ok
> > 2) MVPA: do the classifiers give balanced accuracy as my datasets are
> > not balanced?
> might need rebalancing. post output of your dataset.summary() here
> > Also, is it recommended to run searchlight on betamap (after fitting
> > hrf) or zscored raw data?
> whatever fits your bill. usually betamaps, and possibly z-scored (per
> run or across all)
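
The choice between z-scoring per run and across all runs can be sketched in plain Python for a single voxel. `zscore_by_chunk` is a hypothetical helper, not a PyMVPA function; the signal values are made up, and using the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore(values):
    """Z-score a list using its own mean and population std (must be non-zero)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def zscore_by_chunk(values, chunks):
    """Z-score values separately within each chunk (run)."""
    groups = defaultdict(list)
    for i, c in enumerate(chunks):
        groups[c].append(i)
    out = [0.0] * len(values)
    for idx in groups.values():
        for i, z in zip(idx, zscore([values[i] for i in idx])):
            out[i] = z
    return out


# Made-up signal for one voxel: run 2 has a higher baseline than run 1.
signal = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
chunks = [1, 1, 1, 2, 2, 2]
per_run = zscore_by_chunk(signal, chunks)   # baseline shift between runs removed
across_all = zscore(signal)                 # baseline shift between runs preserved
```

Per-run z-scoring removes run-to-run baseline differences, whereas z-scoring across all samples keeps them, so which one "fits the bill" depends on whether those differences are nuisance or signal.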
> > If betamap after fitting hrf, then using the provided function I get
> > only one parameter per target per run; is that how it's supposed to be?
> ok if that is what you want to classify... sometimes you might
> want to model each trial separately. there is no universal answer.
--
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834 Fax: +1 (603) 646-1419
WWW: http://www.linkedin.com/in/yarik