[Pkg-nagios-devel] Bug#631447: nsca processes don't timeout if the nagios named pipe is not being read

Michael Stroucken stroucki at andrew.cmu.edu
Thu Jun 23 23:04:06 UTC 2011


Package: nsca
Version: 2.7.2+nmu2
Severity: minor


We run passive service checks quite frequently in our compute cluster, 
and have noticed that a number of nsca processes fail to terminate if 
they were started around the time logrotate stops nagios to rotate its 
logs. Eventually the system runs out of resources (network, processes).

Excerpt from ps:
nagios   15772  0.0  0.0  10364   464 ?        SL   Jun22   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   15773  0.0  0.0  10364   464 ?        SL   Jun22   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   15774  0.0  0.0  10364   464 ?        SL   Jun22   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   15775  0.0  0.0  10364   464 ?        SL   Jun22   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   15776  0.0  0.0  10364   464 ?        SL   Jun22   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   15777  0.0  0.0  10364   464 ?        SL   Jun22   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   15778  0.0  0.0  10364   464 ?        SL   Jun22   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   20907  0.2  0.0  10364   664 ?        Ss   Jun21   6:54 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   30069  0.0  0.0  10364   464 ?        SL   06:25   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   30070  0.0  0.0  10364   464 ?        SL   06:25   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   30071  0.0  0.0  10364   464 ?        SL   06:25   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios   30072  0.0  0.0  10364   464 ?        SL   06:25   0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg

Attaching strace to one of the stale processes causes it to terminate:
monitor:~# strace -f -p 30934
Process 30934 attached - interrupt to quit
open("/var/lib/nagios3/rw/nagios.cmd", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
fstat(5, {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9a0e716000
write(5, "[1308824845] PROCESS_SERVICE_CHE"..., 103) = 103
close(5)                                = 0
munmap(0x7f9a0e716000, 4096)            = 0
gettimeofday({1308868403, 8804}, NULL)  = 0
recvfrom(6, "", 720, 0, NULL, NULL)     = 0
munlock(0x23f6c10, 56)                  = 0
munlock(0x23f6d10, 4168)                = 0
munlock(0x23f6c50, 24)                  = 0
close(6)                                = 0
exit_group(0)                           = ?
Process 30934 detached

Excerpt from tracing the parent process:
9165  gettimeofday({1308868771, 721281}, NULL) = 0
9165  gettimeofday({1308868771, 721371}, NULL) = 0
9165  stat("/var/lib/nagios3/rw/nagios.cmd", {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
9165  open("/var/lib/nagios3/rw/nagios.cmd", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
9165  fstat(5, {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
9165  mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9a0e716000
9165  write(5, "[1308868771] PROCESS_SERVICE_CHECK_RESULT;cloud60;MDRAID;1;NO RAID Sets specified to check\n", 91) = 91

I can't tell whether the stat has completed in the stuck processes or 
not. When I shut down nagios, it removes the 
/var/lib/nagios3/rw/nagios.cmd file, so I assume the hang occurs in the 
window between nagios ceasing to read the named pipe and the pipe 
being deleted.

I am going to work around this by stopping the nsca service, killing all 
remaining nsca processes, and restarting the service at 06:30. I feel a 
better fix would be to have the nsca child process terminate if it is 
unable to complete its job within a certain amount of time.
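
As a rough illustration of that idea (a sketch only, not a patch against 
the nsca source): a write-only open() on a FIFO normally blocks until some 
process opens it for reading, whereas with O_NONBLOCK it fails with ENXIO 
instead. A child could therefore retry until a deadline and then give up. 
The function names open_fifo_with_timeout and submit_command, the 100 ms 
retry interval, and the use of ETIMEDOUT are my own assumptions and do not 
exist in nsca. Note that this only bounds the open(); a write() to a pipe 
that is full and not being drained could still block.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of nsca): open a FIFO for writing, but give
 * up after roughly timeout_secs if no reader shows up.
 * Returns a file descriptor, or -1 with errno set (ETIMEDOUT on timeout). */
int open_fifo_with_timeout(const char *path, int timeout_secs)
{
    time_t deadline = time(NULL) + timeout_secs;
    struct timespec pause = { 0, 100 * 1000 * 1000 }; /* 100 ms between tries */

    for (;;) {
        /* With O_NONBLOCK, a write-only open of a FIFO that has no reader
         * fails with ENXIO instead of blocking. */
        int fd = open(path, O_WRONLY | O_NONBLOCK);
        if (fd >= 0) {
            /* A reader exists; switch back to blocking writes. */
            int flags = fcntl(fd, F_GETFL);
            if (flags != -1)
                fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
            return fd;
        }
        if (errno != ENXIO)
            return -1;              /* real error, e.g. ENOENT or EACCES */
        if (time(NULL) >= deadline) {
            errno = ETIMEDOUT;      /* nobody is reading the pipe */
            return -1;
        }
        nanosleep(&pause, NULL);
    }
}

/* Hypothetical caller: write one command line to the pipe or fail fast.
 * Returns 0 on success, -1 on failure with errno preserved. */
int submit_command(const char *path, const char *line, int timeout_secs)
{
    int fd = open_fifo_with_timeout(path, timeout_secs);
    if (fd < 0)
        return -1;

    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);
    int saved = errno;
    close(fd);
    errno = saved;
    return (n == (ssize_t)len) ? 0 : -1;
}
```

With something along these lines, a child that finds nobody reading 
nagios.cmd would return an error and exit after the timeout rather than 
linger indefinitely.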

Greetings,
Michael.


-- System Information:
Debian Release: 6.0.1
   APT prefers stable
   APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages nsca depends on:
ii  debconf [debconf-2.0]         1.5.36.1   Debian configuration management sy
ii  libc6                         2.11.2-10  Embedded GNU C Library: Shared lib
ii  libmcrypt4                    2.5.8-3.1  De-/Encryption Library

nsca recommends no packages.

Versions of packages nsca suggests:
ii  nagios-plugins                1.4.15-3   Plugins for the nagios network mon
ii  nagios-plugins-basic          1.4.15-3   Plugins for the nagios network mon
ii  nagios3                       3.2.1-2    A host/service/network monitoring

-- Configuration Files:
/etc/nsca.cfg changed [not included]
/etc/send_nsca.cfg changed [not included]

-- debconf information excluded




