[pymvpa] Question about classifiers

Serin Atiani, Dr serin.atiani at mail.mcgill.ca
Wed Apr 1 11:53:32 UTC 2015


Thanks for your responses, 

To answer your questions, the data was z-scored. One of the classes is 'special' indeed, but it isn't the second one, where the large numbers occur in the confusion matrix I sent you, which is why it was odd. Taking your advice, I tried SMLR and the problem doesn't occur with it. So I guess I'll be using that, but I am still not sure why SVM was giving me these confusion matrices.
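
For the record, the change was essentially just swapping the classifier (a minimal sketch; 'ds' stands in for my actual dataset, and the SMLR and partitioner settings are just the defaults):

>>> from mvpa2.suite import *
>>> # cross-validate SMLR instead of the SVM, keeping the confusion stats
>>> cv = CrossValidation(SMLR(), NFoldPartitioner(), enable_ca=['stats'])
>>> res = cv(ds)
>>> print cv.ca.stats.matrix  # confusion matrix, no longer one dominant row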

Thanks again!
Serin
________________________________________
From: Pkg-ExpPsy-PyMVPA [pkg-exppsy-pymvpa-bounces+serin.atiani=mail.mcgill.ca at lists.alioth.debian.org] on behalf of pkg-exppsy-pymvpa-request at lists.alioth.debian.org [pkg-exppsy-pymvpa-request at lists.alioth.debian.org]
Sent: Sunday, March 29, 2015 9:52 PM
To: pkg-exppsy-pymvpa at lists.alioth.debian.org
Subject: Pkg-ExpPsy-PyMVPA Digest, Vol 85, Issue 5

Send Pkg-ExpPsy-PyMVPA mailing list submissions to
        pkg-exppsy-pymvpa at lists.alioth.debian.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa

or, via email, send a message with subject or body 'help' to
        pkg-exppsy-pymvpa-request at lists.alioth.debian.org

You can reach the person managing the list at
        pkg-exppsy-pymvpa-owner at lists.alioth.debian.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Pkg-ExpPsy-PyMVPA digest..."


Today's Topics:

   1. Re: Question about classifiers (Nick Oosterhof)
   2. Re: Rapid ER design (Jeffrey Bloch)
   3. Re: Question about classifiers (Yaroslav Halchenko)


----------------------------------------------------------------------

Message: 1
Date: Sun, 29 Mar 2015 18:01:26 +0200
From: Nick Oosterhof <n.n.oosterhof at googlemail.com>
To: Development and support of PyMVPA
        <pkg-exppsy-pymvpa at lists.alioth.debian.org>
Subject: Re: [pymvpa] Question about classifiers
Message-ID: <6C0975E3-5C28-4842-BF57-017F3FE06C7C at googlemail.com>
Content-Type: text/plain; charset=us-ascii


On 29 Mar 2015, at 09:50, Serin Atiani, Dr <serin.atiani at mail.mcgill.ca> wrote:

> When I use an SVM classifier, which I think theoretically makes more
> sense to use with my data, I get a strange cross-validation confusion
> matrix with one row that has high numbers, while the rest is mostly
> zeros or ones.
>
> It doesn't look right; does anybody have any thoughts about this?

Indeed, that looks weird. Did you apply any normalisation to the data
(de-meaning or z-scoring)? SVM classifiers in particular may not behave
well when the data is not normalised.
Also, apart from nearest neighbour (which usually does not perform well
with fMRI data), have you tried another classifier such as LDA?
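
Something along these lines would cover both suggestions (a minimal
sketch; 'ds' is a placeholder for your dataset):

>>> from mvpa2.suite import *
>>> from mvpa2.clfs.lda import LDA
>>> # z-score each run (chunk) separately before classification
>>> zscore(ds, chunks_attr='chunks')
>>> cv = CrossValidation(LDA(), NFoldPartitioner(), enable_ca=['stats'])
>>> res = cv(ds)
>>> print cv.ca.stats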




------------------------------

Message: 2
Date: Sun, 29 Mar 2015 15:53:53 -0400
From: Jeffrey Bloch <jeffreybloch1 at gmail.com>
To: Development and support of PyMVPA
        <pkg-exppsy-pymvpa at lists.alioth.debian.org>
Subject: Re: [pymvpa] Rapid ER design
Message-ID:
        <CAL0nPzhCH4QMOmh+2u=LOoHk3QCMqCgGMWikphaKbTJCdsG9ng at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Dear Michael,

Thank you kindly for your responses.


> 1.  PyMVPA knows that my nifti file has 179 volumes, so it keeps getting
> mad when I try to create an attribute that has fewer items (say, a
> condition for each of the 110 events).  How can I make the software forget
> about the volumes and just focus on the timepoints of my events?  The
> original attribute file (with just chunks and condition) has to be 179
> items long, but 110 doesn't fit into 179 very nicely!  :)
>

You cannot make it forget. But you do not have to work with the original
time series and model your events to get parameter estimates.

        *** How can I do that in PyMVPA?  I realize that beta values for
each event can be used, but typically they would be derived from some other
software (AFNI, SPM, FSL).  I was hoping to avoid that initially, since I
am new to PyMVPA (obviously) and wanted to try to do the entire analysis
within the software.  On the other hand, if it is not worth it and it would
be much easier simply to start with betas from elsewhere, please let me
know so I can stop spinning my wheels.  :)


> 2. I am trying to get GLM estimates (per event/volume) using
> 'eventrelated_dataset' with 'hrf'.  I understand that it is possible to
> bring in betas from outside PyMVPA, but I was hoping to keep it all "in
> house."  Even if I make my event list by multiplying by TR (as in the
> tutorial), the original dataset still needs to be 179 volumes, so the
> events and attributes don't line up.
>

HRF modeling doesn't require timing to be expressed in multiples of TR. It
would be best if you shared the relevant parts of your code so we can get
a better picture of what you are doing.

          *** Right, but if I do make my dataset using the original time
series, I don't see how it is possible to use the 'eventrelated_dataset'
function given my timing issues.  Because of the mismatch between event
duration and TR, my attributes list with targets/chunks is simply not
accurate.  Since my dataset was made using these attributes and the
original time series, the number of events and time coordinates passed to
the function will not match.  Therefore, the events that I derive from
find_events (to feed into eventrelated_dataset) do not have the correct
onset times.


***I apologize if I've repeated myself, but I'm just struggling with the
concept.  Thanks again for taking the time to answer these basic
questions.  :)    I'm not sure what code would help you at this point, but
here ya go:

MAKING THE DATASET
>>> path = os.path.join('/home/bloch/data_for_Jeff')
>>> bold_fname = os.path.join(path, 'sub001', 'BOLD', 'task001_run001',
...                           'bold.nii.gz')
>>> attr_fname = os.path.join(path, 'sub001', 'BOLD', 'task001_run001',
...                           'Attributes_by_volume.txt')
>>> attr = SampleAttributes(attr_fname)
>>> ds1 = fmri_dataset(samples=bold_fname, targets=attr.targets,
...                    chunks=attr.chunks)

[Attribute list above has 179 items.  If I try to give it only the 110
events with associated time stamps, it fails]

>>> evds = eventrelated_dataset(ds1,
...                             events,
...                             model='hrf',
...                             time_attr='time_coords',
...                             condition_attr=('targets', 'chunks'))

[I'm not able to give appropriate time coords here -- for 110 events.
Also, the information about the included events is not accurate due to
timing]
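
[Aside: Michael mentions below that time_coords can simply be added to a
dataset at any time.  A rough sketch of what I think that means, given my
2 s TR:

>>> import numpy as np
>>> TR = 2.0  # repetition time in seconds
>>> # one time stamp per volume, in seconds from run onset
>>> ds1.sa['time_coords'] = np.arange(ds1.nsamples) * TR
]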


EXAMPLE OF EVENTS:
{'chunks': 0.0, 'duration': 5.8, 'onset': 20.0, 'targets': 'Monkey'}
{'chunks': 0.0, 'duration': 8.7, 'onset': 24.0, 'targets': 'Wrench'}
{'chunks': 0.0, 'duration': 2.9, 'onset': 30.0, 'targets': 'Monkey'}
{'chunks': 0.0, 'duration': 5.8, 'onset': 32.0, 'targets': 'Screwdriver'}
{'chunks': 0.0, 'duration': 8.7, 'onset': 36.0, 'targets': 'Elephant'}
{'chunks': 0.0, 'duration': 5.8, 'onset': 42.0, 'targets': 'Hammer'}

[The onsets/durations above aren't correct.  I had to guess how long they
would be by extrapolating my 110 events into the 179 volumes.  E.g., there
was only one monkey stimulus, but in my attribute file I put it twice to
"cover" the TRs during which it occurred.  find_events interprets this as
two events, and thus doubles the duration I give it.  The onsets themselves
take on multiples of the TR value, which in itself is incorrect.]
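
[Since I do know the true onsets from my stimulus log, I suppose one way
around find_events entirely would be to build the event list by hand and
pass it straight to eventrelated_dataset -- a rough sketch, with
hypothetical stimulus-log lists standing in for my real ones:

>>> # hypothetical per-event lists read from the stimulus log
>>> onsets = [20.0, 24.5, 30.1]         # true onsets in seconds
>>> durations = [3.0, 3.0, 3.0]         # every event lasted 3 s
>>> conditions = ['Monkey', 'Wrench', 'Monkey']
>>> events = [dict(onset=o, duration=d, targets=t, chunks=0)
...           for o, d, t in zip(onsets, durations, conditions)]
>>> evds = eventrelated_dataset(ds1, events, model='hrf',
...                             time_attr='time_coords',
...                             condition_attr=('targets', 'chunks'))
]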

On Sun, Mar 29, 2015 at 4:47 AM, Michael Hanke <michael.hanke at gmail.com>
wrote:

> Hi,
>
> On Fri, Mar 27, 2015 at 11:55 PM, Jeffrey Bloch <jeffreybloch1 at gmail.com>
> wrote:
>
>> Dear All,
>>
>> Hello!  I am in the middle of setting up a rapid event-related design
>> analysis, and was hoping I could ask a few questions.  I've been working
>> through this for a while, but to no avail.  Thanks in advance!
>>
>> My main confusion(s) stem from the fact that my design has events that
>> are not multiples of the TR.  Namely, there are 110 events (of 3s
>> duration), but my TR is 2s.  So, technically, any volume could have
>> multiple conditions, and/or conditions can spread over more than one TR.
>> What is the best way of dealing with this when setting up the analysis?
>>
>
> The HRF-modeling approach seems like a good choice.
>
>> 1.  PyMVPA knows that my nifti file has 179 volumes, so it keeps getting
>> mad when I try to create an attribute that has fewer items (say, a
>> condition for each of the 110 events).  How can I make the software forget
>> about the volumes and just focus on the timepoints of my events?  The
>> original attribute file (with just chunks and condition) has to be 179
>> items long, but 110 doesn't fit into 179 very nicely!  :)
>>
>
> You cannot make it forget. But you do not have to work with the original
> time series and model your events to get parameter estimates.
>
>> 2. I am trying to get GLM estimates (per event/volume) using
>> 'eventrelated_dataset' with 'hrf'.  I understand that it is possible to
>> bring in betas from outside PyMVPA, but I was hoping to keep it all "in
>> house."  Even if I make my event list by multiplying by TR (as in the
>> tutorial), the original dataset still needs to be 179 volumes, so the
>> events and attributes don't line up.
>>
>
> HRF modeling doesn't require timing to be expressed in multiples of TR. It
> would be best if you shared the relevant parts of your code so we can get
> a better picture of what you are doing.
>
>> 3.  I find that my "time_coords" are always blank (all zeros) when I
>> create my datasets, but several functions seem to require this
>> information.  Similarly, the time_indices are always 179 due to the nifti
>> volumes, but that's not very useful for me (again, 110 events, sorry to be
>> repetitive).
>>
>
> Please also share the parts of your code that create your datasets. Also,
> please be aware that you can simply add the time_coords attribute to a
> dataset at any time.
>
>
>> Lastly (I hope!), is it possible using eventrelated_dataset to get
>> parameters for each event (instead of each target)?  I was hoping to be
>> able to tease apart relationships at an event by event basis.
>>
>
> Yes, you can do that. But of course per-event modeling tends to be
> noisier than a coarser one.
>
> Michael
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
>

------------------------------

Message: 3
Date: Sun, 29 Mar 2015 21:09:30 -0400
From: Yaroslav Halchenko <debian at onerussian.com>
To: Development and support of PyMVPA
        <pkg-exppsy-pymvpa at lists.alioth.debian.org>
Subject: Re: [pymvpa] Question about classifiers
Message-ID: <20150330010930.GR29626 at onerussian.com>
Content-Type: text/plain; charset=utf-8


On Sun, 29 Mar 2015, Serin Atiani, Dr wrote:

> Hello,

> I am doing a first-brush analysis of my data using PyMVPA. When I use an SVM classifier, which I think theoretically makes more sense to use with my data, I get a strange cross-validation confusion matrix with one row that has high numbers, while the rest is mostly zeros or ones. I have 17 different classes that I train the classifier on, and this is an example of the cross-validation confusion matrix I get:

> [[ 2  0  0  0  0  0  0  0  1  0  1  0  1  0  0  0  0]
>  [16 17 17 16 16 16 17 17 17 17 17 15 16 16 16 16 16]

Multiclass SVM does pair-wise classifications and then, for a new sample,
votes among all those pair-wise classifiers.  Ties (two classes having an
equal number of "votes") are not broken randomly but rather all fall into
a single class.  And if there is no clear signal for the classes, it is
quite common to see such ties happen.  It might be that your 2nd class is
somewhat special here ... is it?  Maybe it is the only class which differs
in any way from the others?  It seems that your classes are balanced to
all have 20 samples, but is that the case across runs?  What is the output
of print dataset.summary() for your dataset?

As Nick suggested, you might have "better" luck with a classifier which
doesn't exhibit similar behavior, e.g. SMLR or GNB -- what does your
confusion matrix look like with those?
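
A minimal sketch of such a comparison ('dataset' stands in for your
actual dataset; defaults everywhere):

>>> from mvpa2.suite import *
>>> print dataset.summary()   # check class balance within each chunk
>>> for clf in (SMLR(), GNB()):
...     cv = CrossValidation(clf, NFoldPartitioner(), enable_ca=['stats'])
...     cv(dataset)
...     print cv.ca.stats.matrix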


> Reducing the number of features makes things a bit better, but I still
> get one row that has large numbers. I also tried to group my classes and
> train the SVM classifier on the two most distinguishable ones: nearest
> neighbour gives 80% accuracy, while with SVM it is only slightly above
> chance, with a confusion matrix that again looks like this:

> [[ 5  0]
>  [15 20]]

> It doesn't look right; does anybody have any thoughts about this?

Again, knowing more about your analysis/paradigm and preprocessing (as
Nick followed up) could help us to help you.

--
Yaroslav O. Halchenko, Ph.D.
http://neuro.debian.net http://www.pymvpa.org http://www.fail2ban.org
Research Scientist,            Psychological and Brain Sciences Dept.
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834                       Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik



------------------------------

Subject: Digest Footer

_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa

------------------------------

End of Pkg-ExpPsy-PyMVPA Digest, Vol 85, Issue 5
************************************************

