[Pkg-gridengine-devel] Bug#597750: gridengine-client: new queue mishandled by qstat -g c -q <name>

Ferenc Wagner wferi at niif.hu
Wed Sep 22 18:06:29 UTC 2010


Package: gridengine-client
Version: 6.2u5-1
Severity: normal

Hi,

I've got an SGE installation with two queues:

wferi@n0:~$ qstat -g c
CLUSTER Q  CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
alma         -NA-      0      0      0      4      0      4 
szie-aotk    0.00      0      0      5      7      0      2 

The above report can be broken up by queue name as expected:

wferi@n0:~$ qstat -g c -q alma
CLUSTER Q  CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
alma         -NA-      0      0      0      4      0      4 
wferi@n0:~$ qstat -g c -q szie-aotk
CLUSTER Q  CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
szie-aotk    0.00      0      0      5      7      0      2 

Now I add a third queue:

wferi@n0:~$ qconf -Aq /tmp/teszt.q
wferi@n0.niif.grid added "teszt" to cluster queue list
wferi@n0:~$ qstat -g c
CLUSTER Q  CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
alma         -NA-      0      0      0      4      0      4 
szie-aotk    0.00      0      0      5      7      0      2 
teszt        -NA-      0      0      0      0      0      0 

So far so good.  But filtering by queue name no longer works properly:
-q teszt prints nothing, while -q alma and -q szie-aotk each list teszt
as well:

wferi@n0:~$ qstat -g c -q teszt
wferi@n0:~$ qstat -g c -q alma
CLUSTER Q  CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
alma         -NA-      0      0      0      4      0      4 
teszt        -NA-      0      0      0      0      0      0 
wferi@n0:~$ qstat -g c -q szie-aotk
CLUSTER Q  CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
szie-aotk    0.00      0      0      5      7      0      2 
teszt        -NA-      0      0      0      0      0      0 

You don't need two working queues to exhibit the problem; one is enough,
but I found the demonstration more convincing with two.  Here's my queue
configuration (the three queues differ only in their first two lines):

wferi@n0:~$ qconf -shgrpl
@alma
@szie-aotk
@teszt
wferi@n0:~$ cat /tmp/teszt.q
qname                 teszt
hostlist              @teszt
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               make
rerun                 FALSE
slots                 1
tmpdir                /tmp
shell                 /bin/bash
prolog                NONE
epilog                NONE
shell_start_mode      unix_behavior
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

This problem disappears as soon as I add a (resolvable) host to the @teszt
hostgroup: the qstat -g c -q <name> outputs become correct again, and stay
so until I remove that host, which doesn't belong there (these queues are
dynamic, so I typically can't choose a proper host for the new queue at
creation time).
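
Since the unfiltered qstat -g c report shown earlier stays correct even
with the empty hostgroup, one possible stopgap is to filter that report on
the client side instead of passing -q.  This is only a minimal sketch, not
a fix: the helper name filter_cqueue is hypothetical, and it assumes the
layout shown above (two header lines, then one row per cluster queue with
the queue name in the first column).

```shell
# Hypothetical helper: keep the two header lines of `qstat -g c` output
# plus the row whose first column equals the requested cluster queue name.
filter_cqueue() {
    awk -v q="$1" 'NR <= 2 || $1 == q'
}

# Intended use (needs a working SGE client, so it is left as a comment):
#   qstat -g c | filter_cqueue teszt
```

Matching on the whole first column rather than on a substring keeps a
queue whose name merely contains the requested one out of the result.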

If this problem is too hard to fix, I'll work around it, but first I hope
for a real solution, either in qstat or in my configuration.

Can you offer one?
-- 
Thanks,
Feri.




