[pymvpa] bug reporting

Tue Sep 1 17:25:27 UTC 2009

> if you are using Debian, then you are welcome to use
> reportbug python-mvpa
> but in general -- just complain to the mailing list -- we haven't had
> much need for a full-featured bug tracker yet (but things might change
> ;))

Ok - I'm running Mint, and don't seem to have reportbug installed, so I'll
just stick with posting to the list.
A problem I ran into was that using kNN with 'majority' voting caused a
crash. Copying the dataset example in the docs, the following code
reproduces that:

import numpy as N
from mvpa.datasets import Dataset
from mvpa.clfs.knn import kNN

data = Dataset(samples=N.random.normal(size=(10,5)), labels=1)
test = N.random.normal(size=(1,5))

k = kNN(k=1, voting='majority')  # other values of k give the same error
k.train(data)
k.predict(test)

Which gives me the stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/lib/python-support/python2.5/mvpa/clfs/base.py", line 436, in predict
    result = self._predict(data)
  File "/var/lib/python-support/python2.5/mvpa/clfs/knn.py", line 148, in _predict
    results = [vfx(knn) for knn in knns]
  File "/var/lib/python-support/python2.5/mvpa/clfs/knn.py", line 173, in getMajorityVote
    votes[self.__labels[nn]] += 1
  File "/var/lib/python-support/python2.5/mvpa/misc/state.py", line 1088, in __getattribute__
    return _object_getattribute(self, index)
AttributeError: 'kNN' object has no attribute '_kNN__labels'
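
The `_kNN__labels` in that message looks like Python's name mangling at
work: inside a class body, an attribute written as `self.__labels` is looked
up as `_ClassName__labels`, so the error would just mean nothing was ever
stored under that name. Here is a toy sketch of the same failure (the class
`Toy` and its attributes are made up for illustration, this is not the
pymvpa code):

```python
# Hypothetical toy class, NOT pymvpa code: it only illustrates how Python
# mangles double-underscore attribute names inside a class body.
class Toy(object):
    def __init__(self, labels):
        # Written as self.__data, stored under the mangled name _Toy__data.
        self.__data = labels

    def vote(self):
        # Written as self.__labels, looked up as _Toy__labels. Nothing was
        # ever assigned to that name, so this raises AttributeError.
        return self.__labels


toy = Toy([1, 1, 2])
try:
    toy.vote()
    message = None
except AttributeError as err:
    message = str(err)

# The message names the mangled attribute, just like the kNN traceback.
print(message)
```

If that is what is going on, the fix is to read the labels from wherever
train() actually stored them, which is what I have guessed at below.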

I opened that file and, at a guess, made the following change, after which
it runs for me:

160c160
<             votes[self.__labels[nn]] += 1
---
>             votes[self.__data.labels[nn]] += 1

The thing is, the classification rate I get from this on my test set is
only ~1%, whereas the kNN weighted voting gets over 70%. The discrepancy
seems odd (especially as I have the same number of samples for each
class), which makes me wonder whether I've 'corrected' this to the wrong
thing!
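
One way to check the patch would be to compare its predictions against a
hand-rolled majority vote on the same data. This is a minimal sketch in
plain Python, assuming squared Euclidean distance and ties broken by
whichever label was counted first; the function name `majority_vote_knn` is
mine, and none of this is how pymvpa actually implements it:

```python
from collections import Counter


def majority_vote_knn(train_samples, train_labels, test_sample, k=1):
    """Predict a label by majority vote among the k nearest training samples.

    Illustrative reference only: squared Euclidean distance, and ties go to
    the label that was counted first.
    """
    # Distance from the test sample to every training sample, paired with
    # that training sample's index.
    dists = []
    for idx, sample in enumerate(train_samples):
        d = sum((a - b) ** 2 for a, b in zip(sample, test_sample))
        dists.append((d, idx))
    dists.sort()

    # Each of the k nearest neighbours votes with ITS OWN label, i.e. the
    # label is looked up by the neighbour's index in the training set.
    votes = Counter(train_labels[idx] for _, idx in dists[:k])
    return votes.most_common(1)[0][0]


train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
labels = [1, 1, 2, 2]

near_first_class = majority_vote_knn(train, labels, [0.1, 0.2], k=1)
near_second_class = majority_vote_knn(train, labels, [5.0, 5.4], k=3)
print(near_first_class, near_second_class)
```

If the patched kNN disagrees with something like this on a small dataset,
that would suggest the labels are being indexed wrongly after my change.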

I'm running the hardy-backported version, 0.4.2 (rather than the 0.4.0
that I originally emailed about).

> heh -- sorry about that. The reason is simple -- our server (and us) has
> moved to a new location,
...
>'Show source' which would reveal original ReST text) -- thanks in
> advance!

Np :) I don't get on too well with git, so I'll probably send you source
diffs instead if that's ok.
I tend to start with examples rather than APIs when getting into something
new, and my mistake was that, from the examples under the Datasets section
and the first part of the Dataset class description, I got the impression
that labels could be strings.
(In particular, from the dataset.labels[1] += "_bad" text at
http://www.pymvpa.org/modref/mvpa.datasets.base.html#mvpa.datasets.base.Dataset.)

This caused several different ValueErrors from the classifiers I tried
(GPR, BLR, RidgeReg); eventually a scipy error from trying to convert a
class label to a float made things click for me. Incidentally, why *do* the
labels get converted to floats? It seems counter-intuitive to return
things of a different type (floats rather than ints) when they are just
label markers.
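
For anyone else who hits this, the workaround is presumably to map string
labels to integer codes before building the dataset, and to map the
predictions back afterwards. A small sketch in plain Python (the helper
`encode_labels` is hypothetical, not something PyMVPA provides as far as I
know):

```python
def encode_labels(labels):
    """Map arbitrary hashable labels to integer codes 0..n-1.

    Hypothetical helper for illustration. Codes follow the sorted order of
    the distinct labels, so the mapping is reproducible.
    """
    classes = sorted(set(labels))
    code_of = dict((label, code) for code, label in enumerate(classes))
    codes = [code_of[label] for label in labels]
    return codes, classes


names = ["face", "house", "face", "cat"]
codes, classes = encode_labels(names)
print(codes)    # [1, 2, 1, 0]
print(classes)  # ['cat', 'face', 'house']

# Decoding: index back into the class list (cast first if the classifier
# hands back floats).
decoded = [classes[int(code)] for code in codes]
print(decoded == names)  # True
```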

Anyway, I just wanted to add a note to the docs to mention this in case
anyone else makes the same mistake! I'll sort this out later and post the
results to look over.

Thanks again,
Tara