[Pkg-gridengine-devel] Bug#543649: gridengine-exec: sge_execd very CPU hungry for no apparent reason
Mario Lang
mlang at tugraz.at
Wed Aug 26 10:50:40 UTC 2009
Package: gridengine-exec
Version: 6.2-4
Severity: minor
sge_execd appears to be very CPU hungry for no apparent reason.
The following are the topmost lines of "top" on my execution host,
which is currently running an MPICH2 job with 16 processes. The execution
host has 32 cores:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
25089 lang      20   0 3029m 2.9g 6772 R  100  2.3   4:56.87 hpcc_acml_mpich
25093 lang      20   0 3030m 2.9g 6892 R  100  2.3   4:56.82 hpcc_acml_mpich
25079 lang      20   0 3030m 2.9g 6984 R  100  2.3   4:56.90 hpcc_acml_mpich
25080 lang      20   0 3030m 2.9g 6460 R  100  2.3   4:56.96 hpcc_acml_mpich
25081 lang      20   0 3030m 2.9g 6468 R  100  2.3   4:57.00 hpcc_acml_mpich
25082 lang      20   0 3029m 2.9g 7108 R  100  2.3   4:56.99 hpcc_acml_mpich
25083 lang      20   0 3028m 2.9g 5780 R  100  2.3   4:56.93 hpcc_acml_mpich
25084 lang      20   0 3029m 2.9g 6856 R  100  2.3   4:56.91 hpcc_acml_mpich
25085 lang      20   0 3030m 2.9g 6528 R  100  2.3   4:56.90 hpcc_acml_mpich
25086 lang      20   0 3030m 2.9g 6520 R  100  2.3   4:56.83 hpcc_acml_mpich
25087 lang      20   0 3030m 2.9g 7696 R  100  2.3   4:56.96 hpcc_acml_mpich
25088 lang      20   0 3030m 2.9g 6404 R  100  2.3   4:56.23 hpcc_acml_mpich
25090 lang      20   0 3030m 2.9g 6884 R  100  2.3   4:56.78 hpcc_acml_mpich
25091 lang      20   0 3030m 2.9g 6848 R  100  2.3   4:56.83 hpcc_acml_mpich
25092 lang      20   0 3030m 2.9g 7928 R  100  2.3   4:56.91 hpcc_acml_mpich
25094 lang      20   0 3029m 2.9g 6028 R  100  2.3   4:56.81 hpcc_acml_mpich
30263 sgeadmin  20   0  118m 2232 1728 S   38  0.0  11:54.75 sge_execd
$ ps auxw|grep 30263
sgeadmin 30263 0.4 0.0 121440 2244 ? Sl Aug24 12:44 /usr/lib/gridengine/sge_execd
$ date
Wed Aug 26 12:27:47 CEST 2009
sge_execd has been running for two days now and has already consumed 12
minutes of CPU time (on a 2.3 GHz Opteron). Given that I have only run
a few test jobs, this looks very wasteful. While this parallel job is
executing, sge_execd's CPU usage fluctuates between 5% and sometimes
as much as 30%.
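For context, the 38% shown by top and the 0.4% shown by ps are not
contradictory: top reports usage over its last refresh interval, while the
%CPU column of ps is accumulated CPU time divided by the time the process
has existed. The following is a minimal sketch of that arithmetic. It assumes
an uptime of exactly two days (the actual start time on Aug 24 is not shown
above), and the helper names are hypothetical.

```python
def parse_cputime(text: str) -> float:
    """Convert a CPU time such as '12:44' or '11:54.75' to seconds.

    Handles colon-separated fields only ([[HH:]MM:]SS); a day prefix
    such as '1-02:03:04' is deliberately not supported in this sketch.
    """
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def average_cpu_percent(cputime: str, elapsed_seconds: float) -> float:
    """Lifetime average CPU usage: CPU time divided by wall-clock time."""
    return 100.0 * parse_cputime(cputime) / elapsed_seconds


# Assumption: the daemon has been up for exactly two days.
TWO_DAYS = 2 * 24 * 3600

if __name__ == "__main__":
    # TIME column of the ps output above is 12:44.
    print(f"{average_cpu_percent('12:44', TWO_DAYS):.2f}%")
```

Under that assumption the lifetime average comes out at a little over 0.4%,
in line with the ps output, so the high readings in top reflect bursts while
the parallel job is running rather than a constant load.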
Attached is the output of the following command:
# time strace -r -osge_execd.strace -p30263
Process 30263 attached - interrupt to quit
^CProcess 30263 detached
real 2m53.132s
user 0m2.104s
sys 0m14.473s
# wc -l sge_execd.strace
508030 sge_execd.strace
That is roughly 500k syscalls in just under 3 minutes (31 MB uncompressed).
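To see which calls dominate a trace of this size, the file can be tallied by
syscall name. The sketch below assumes the usual `strace -r` line layout (a
relative timestamp followed by the call name and an opening parenthesis).
The function name `summarize` and the sample lines are made up for
illustration and are not taken from the attached trace.

```python
import re
import sys
from collections import Counter

# Assumed `strace -r` layout: "     0.000150 close(3) = 0"
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\w+)\(")


def summarize(lines):
    """Return (calls per syscall name, sum of the relative timestamps)."""
    counts = Counter()
    elapsed = 0.0
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # signal lines, "<... resumed>" fragments, exit notices
        elapsed += float(match.group(1))
        counts[match.group(2)] += 1
    return counts, elapsed


# Hypothetical sample input, NOT real sge_execd output.
SAMPLE = """\
     0.000000 read(3, "", 4096) = 0
     0.000150 close(3) = 0
     0.000100 read(4, "x", 4096) = 1
"""

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python summarize.py sge_execd.strace
        with open(sys.argv[1], errors="replace") as handle:
            counts, elapsed = summarize(handle)
    else:
        counts, elapsed = summarize(SAMPLE.splitlines())
    for name, number in counts.most_common(10):
        print(f"{number:8d} {name}")
    # Figures from the report above: 508030 lines in 2m53.132s of wall time,
    # which is about 2,900 lines per second.
    print(f"{508030 / 173.132:.0f} lines/s in the reported trace")
```

Each line of the trace is treated as one syscall here, which slightly
miscounts if the file contains unfinished/resumed pairs.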
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sge_execd.strace.bz2
Type: application/octet-stream
Size: 709250 bytes
Desc: strace of sge_execd
URL: <http://lists.alioth.debian.org/pipermail/pkg-gridengine-devel/attachments/20090826/574606e8/attachment-0001.obj>
-------------- next part --------------
--
Regards,
Mario Lang
Graz University of Technology mailto:mlang at TUGraz.at
Department Computing http://www.ZID.TUGraz.at/lang/
Phone: +43 (0) 316 / 873 - 6897
/____________________________________________________________/
/_Apparently a teacher has been arrested in the UK in possession_/
/of a compass, protractor, and straight edge. It is claimed he is a/
/member of the Al Gebra movement, bearing weapons of math instruction/