[pymvpa] problem with high dimensional dataset
Alok Deshpande
alokdesh at gmail.com
Fri Mar 26 23:58:31 UTC 2010
Hi Yaroslav,
Thanks for your suggestions. Sorry for the late reply; I was busy with some
other work, so I didn't get time to try your suggestions until now.
Here is some additional info about my dataset (named mdata):
>>> mdata.samples.shape
(41, 8673536)
>>> mdata.samples.dtype
dtype('float32')
>>> mdata.samples.max()
23559.484
>>> mdata.samples.min()
-1214.7351
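For reference, the samples array alone at that shape and dtype occupies roughly 1.3 GB, which can be estimated directly from the reported shape and dtype (a minimal sketch; no actual data involved):

```python
import numpy as np

# Shape and dtype as reported for mdata.samples above
n_samples, n_features = 41, 8673536
itemsize = np.dtype('float32').itemsize  # 4 bytes per element

footprint_gb = n_samples * n_features * itemsize / 1024.0 ** 3
print(round(footprint_gb, 2))  # ~1.32 GB for the samples array alone
```

Any temporary copy or upcast to float64 doubles that, which matters on a machine with 8 GB of RAM.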
print mdata.summary() goes into an infinite loop!
The following is the output of print mvpa.wtf() (captured before it goes
into the infinite loop):
Current date: 2010-03-26 18:52
PyMVPA:
Version: 0.4.4
Path: /var/lib/python-support/python2.5/mvpa/__init__.pyc
Version control (GIT):
GIT information could not be obtained due
"/var/lib/python-support/python2.5/mvpa/.. is not under GIT"
SYSTEM:
OS: posix Linux 2.6.26-1-amd64 #1 SMP Mon Dec 15 17:25:36 UTC
2008
Distribution: debian/5.0
EXTERNALS:
Present: cPickle, ctypes, gzip, libsvm, libsvm verbosity control,
matplotlib, mdp, mdp ge 2.4, nifti, nifti ge 0.20090205.1, numpy, pylab,
pylab plottable, pywt, reportlab, scipy, sg_fixedcachesize, shogun,
shogun.krr, shogun.mpd
Absent: atlas_fsl, atlas_pymvpa, elasticnet, glmnet, good
scipy.stats.rdist, good scipy.stats.rv_discrete.ppf, griddata, hcluster,
lars, lxml, nose, openopt, pywt wp reconstruct, pywt wp reconstruct fixed,
rpy, running ipython env, sg ge 0.6.4, shogun.lightsvm, shogun.svrlight,
weave
Versions of critical externals:
ctypes : 1.0.3
matplotlib : 0.98.1
nifti : 0.20090303.1
numpy : 1.1.0
pywt : 0.1.7
scipy : 0.6.0
shogun : v0.6.3_r3165_2008-06-13_11:03_
Matplotlib backend: GTKAgg
RUNTIME:
PyMVPA Environment Variables:
PyMVPA Runtime Configuration:
[externals]
have griddata = no
have atlas_pymvpa = no
have good scipy.stats.rdist = no
have pylab plottable = yes
have pywt wp reconstruct = no
have mdp = yes
have lxml = no
have running ipython env = no
have sg_fixedcachesize = yes
have elasticnet = no
have shogun.mpd = yes
have matplotlib = yes
have pywt wp reconstruct fixed = no
have scipy = yes
have reportlab = yes
have openopt = no
have libsvm = yes
have nifti ge 0.20090205.1 = yes
have nose = no
have weave = no
have atlas_fsl = no
have ctypes = yes
have hcluster = no
have sg ge 0.6.4 = no
have good scipy.stats.rv_discrete.ppf = no
have libsvm verbosity control = yes
have mdp ge 2.4 = yes
have shogun.svrlight = no
have rpy = no
have shogun = yes
have glmnet = no
have lars = no
have nifti = yes
have shogun.krr = yes
have cpickle = yes
have numpy = yes
have pylab = yes
have shogun.lightsvm = no
have pywt = yes
have gzip = yes
[general]
verbose = 1
Process Information:
Name: python
State: R (running)
Tgid: 6316
Pid: 6316
PPid: 6152
TracerPid: 0
Uid: 1001 1001 1001 1001
Gid: 1001 1001 1001 1001
FDSize: 256
Groups: 1001
VmPeak: 692040 kB
VmSize: 630276 kB
VmLck: 0 kB
VmHWM: 73408 kB
VmRSS: 73408 kB
VmData: 125616 kB
VmStk: 228 kB
VmExe: 1172 kB
VmLib: 71500 kB
VmPTE: 1080 kB
Threads: 2
SigQ: 0/71680
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001001000
SigCgt: 0000000180000002
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
Cpus_allowed: 000000ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 1482
nonvoluntary_ctxt_switches: 101
To check for improper scaling, I rescaled the whole dataset so that it
lies in [0,1]^n, but the infinite-loop problem still persists.
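For the record, the rescaling described above amounts to a global min-max normalization, roughly as follows (a sketch using small random stand-in data rather than the actual mdata, done in-place to avoid a second ~1.3 GB copy):

```python
import numpy as np

# Stand-in for mdata.samples: small random float32 array, not real fMRI data
rng = np.random.RandomState(0)
samples = rng.normal(size=(4, 10)).astype('float32')

# Global min-max rescaling into [0, 1], in-place
smin, smax = samples.min(), samples.max()
samples -= smin
samples /= (smax - smin)

print(samples.min(), samples.max())  # 0.0 1.0
```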
Also, you asked about C=1 (I assume you meant the cvtype in NFoldSplitter).
When I try any other value > 1, it throws back the following error:
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/var/lib/python-support/python2.5/mvpa/datasets/splitters.py", line 199, in __call__
    cfgs = self.splitcfg(dataset)
  File "/var/lib/python-support/python2.5/mvpa/datasets/splitters.py", line 388, in splitcfg
    return self._getSplitConfig(eval('dataset.unique' + self.__splitattr))
  File "/var/lib/python-support/python2.5/mvpa/datasets/splitters.py", line 626, in _getSplitConfig
    self.__cvtype)]
  File "/var/lib/python-support/python2.5/mvpa/misc/support.py", line 265, in getUniqueLengthNCombinations
    return _getUniqueLengthNCombinations_binary(L, n, sort=True)
  File "/var/lib/python-support/python2.5/mvpa/misc/support.py", line 236, in _getUniqueLengthNCombinations_binary
    for X in range(2**N):
OverflowError: range() result has too many items
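The overflow happens because _getUniqueLengthNCombinations_binary enumerates all 2**N subsets (with N presumably the number of unique chunks, e.g. 41 here) just to keep the subsets of length n. For illustration only, a lazy itertools-based equivalent avoids that enumeration entirely; the chunk count and cvtype below are assumptions matching this dataset, not values taken from PyMVPA itself:

```python
from itertools import combinations

# With 41 chunks, range(2**41) blows up, but only the C(41, cvtype)
# subsets of the requested size are ever needed.
chunks = list(range(41))
cvtype = 2  # e.g. leave-2-chunks-out splitting

# combinations() yields each length-cvtype subset lazily, in constant memory
n_splits = sum(1 for _ in combinations(chunks, cvtype))
print(n_splits)  # 820 == 41 * 40 / 2
```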
Has anybody successfully worked on a dataset of this size before? My
question is not about whether the SVM classifies successfully; it is about
whether it returns output within a reasonable time on such a large-scale
dataset.
Thanks in advance,
Alok
On Mon, Mar 22, 2010 at 6:58 PM, Yaroslav Halchenko
<debian at onerussian.com>wrote:
> Hi Alok,
>
> My guess is that you are using C=1 (or some other rigid positive
> value) and/or improperly scaled data... then libsvm's optimizer might
> get into such a loop.
>
> For best troubleshooting, provide
>
> print mvpa.wtf()
>
> print dataset.summary()
> for the dataset you are trying to process,
>
> and the actual classifier you are using.
>
> On Mon, 22 Mar 2010, Alok Deshpande wrote:
>
> > Hi all,
> > I am currently working on basic classification problems on
> > resting-state fMRI datasets (gender differences, for example). The
> > dimensionality of the feature vector is pretty large (128 time points
> > X 33 X 64 X 64). I am working on a Linux server with 8 cores and
> > sufficient RAM. Following is the output of the command 'free -m':
> > free -m
> >              total       used       free     shared    buffers     cached
> > Mem:          8006       1254       6752          0        100        854
> > -/+ buffers/cache:        299       7706
> > Swap:        23454        209      23244
> > When I try to train a simple linear SVM classifier on the dataset (I
> --
> .-.
> =------------------------------ /v\ ----------------------------=
> Keep in touch // \\ (yoh@|www.)onerussian.com
> Yaroslav Halchenko /( )\ ICQ#: 60653192
> Linux User ^^-^^ [175555]
>
>
>
> _______________________________________________
> Pkg-ExpPsy-PyMVPA mailing list
> Pkg-ExpPsy-PyMVPA at lists.alioth.debian.org
> http://lists.alioth.debian.org/mailman/listinfo/pkg-exppsy-pymvpa
>
--
Alok S. Deshpande
Graduate Student
Electrical & Computer Engineering Department
University of Wisconsin Madison