[Pkg-nagios-devel] Bug#631447: nsca processes don't timeout if the nagios named pipe is not being read
Michael Stroucken
stroucki at andrew.cmu.edu
Thu Jun 23 23:04:06 UTC 2011
Package: nsca
Version: 2.7.2+nmu2
Severity: minor
We have passive service checks that occur rather frequently in our
compute cluster, and have noticed that a number of nsca processes don't
terminate properly if they were started around the time logrotate
stops nagios to rotate its logs. Eventually the system runs out of
resources (network, processes).
Excerpt from ps output:
nagios 15772 0.0 0.0 10364 464 ? SL Jun22 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 15773 0.0 0.0 10364 464 ? SL Jun22 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 15774 0.0 0.0 10364 464 ? SL Jun22 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 15775 0.0 0.0 10364 464 ? SL Jun22 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 15776 0.0 0.0 10364 464 ? SL Jun22 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 15777 0.0 0.0 10364 464 ? SL Jun22 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 15778 0.0 0.0 10364 464 ? SL Jun22 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 20907 0.2 0.0 10364 664 ? Ss Jun21 6:54 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 30069 0.0 0.0 10364 464 ? SL 06:25 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 30070 0.0 0.0 10364 464 ? SL 06:25 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 30071 0.0 0.0 10364 464 ? SL 06:25 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
nagios 30072 0.0 0.0 10364 464 ? SL 06:25 0:00 /usr/sbin/nsca --daemon -c /etc/nsca.cfg
Attaching strace to one of the stale processes causes it to terminate:
monitor:~# strace -f -p 30934
Process 30934 attached - interrupt to quit
open("/var/lib/nagios3/rw/nagios.cmd", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
fstat(5, {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9a0e716000
write(5, "[1308824845] PROCESS_SERVICE_CHE"..., 103) = 103
close(5) = 0
munmap(0x7f9a0e716000, 4096) = 0
gettimeofday({1308868403, 8804}, NULL) = 0
recvfrom(6, "", 720, 0, NULL, NULL) = 0
munlock(0x23f6c10, 56) = 0
munlock(0x23f6d10, 4168) = 0
munlock(0x23f6c50, 24) = 0
close(6) = 0
exit_group(0) = ?
Process 30934 detached
Excerpt from tracing the parent process:
9165 gettimeofday({1308868771, 721281}, NULL) = 0
9165 gettimeofday({1308868771, 721371}, NULL) = 0
9165 stat("/var/lib/nagios3/rw/nagios.cmd", {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
9165 open("/var/lib/nagios3/rw/nagios.cmd", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
9165 fstat(5, {st_mode=S_IFIFO|0660, st_size=0, ...}) = 0
9165 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9a0e716000
9165 write(5, "[1308868771] PROCESS_SERVICE_CHECK_RESULT;cloud60;MDRAID;1;NO RAID Sets specified to check\n", 91) = 91
I can't tell whether the stat() call has completed in the stuck
processes. If I shut down nagios, it removes the
/var/lib/nagios3/rw/nagios.cmd file, so I am assuming the hang occurs
in the window between the point where nagios stops reading from the
named pipe and the point where the pipe is deleted.
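For reference, POSIX specifies that a blocking open() of a FIFO for
writing does not return until some process opens it for reading, while
the same open() with O_NONBLOCK fails immediately with ENXIO. So a child
that passes the stat() and then opens nagios.cmd after nagios has
stopped reading, but before the pipe is removed, could plausibly sleep
in open() indefinitely. The sketch below is illustrative only;
open_fifo_writer_nonblock is a hypothetical helper, not code from nsca,
showing how a writer could detect the "no reader" case instead of
blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not nsca code): open a FIFO for writing without
 * blocking when nobody is reading it.
 * Returns a file descriptor on success, or -1 with errno set:
 *   ENXIO  - the path is a FIFO but no process has it open for reading
 *   ENOENT - the pipe does not exist (e.g. nagios already removed it)
 *   EINVAL - the path exists but is not a FIFO */
int open_fifo_writer_nonblock(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;                   /* errno from stat(), e.g. ENOENT */
    if (!S_ISFIFO(st.st_mode)) {
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;                   /* ENXIO: no reader right now */

    /* A reader exists; switch back to blocking mode so later write()
     * calls behave like ordinary blocking writes to the pipe. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A caller that gets -1 with errno == ENXIO could log the lost check
result and exit rather than wait. The stat() and open() pair is not
atomic, so the pipe can still disappear in between; open() then fails
with ENOENT as well.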
I am going to work around this by stopping the nsca service, killing
all remaining nsca processes, and restarting the service at 06:30. I
feel a better fix would be for the nsca child process to terminate if
it is unable to complete its job within a certain amount of time.
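The timeout suggested above could look roughly like the following
sketch. It assumes a hypothetical helper, write_command_with_timeout,
which is not part of nsca: it installs a SIGALRM handler without
SA_RESTART so that a blocked open() or write() on the FIFO returns
EINTR when the alarm fires, and then reports ETIMEDOUT so the child can
give up. It also opens with plain O_WRONLY (no O_CREAT), so a pipe that
has already been removed yields ENOENT rather than a newly created
regular file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Empty handler: its only purpose is to interrupt a blocked syscall. */
static void on_alarm(int sig)
{
    (void)sig;
}

/* Hypothetical helper (not nsca code): write one external command line
 * to the command FIFO, giving up after `seconds` seconds.
 * Returns 0 on success, or -1 with errno set; errno is ETIMEDOUT when
 * the alarm interrupted a blocked open() or write(). */
int write_command_with_timeout(const char *path, const char *line,
                               unsigned int seconds)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 /* no SA_RESTART, so EINTR is seen */
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    int rc = -1, err = 0;
    alarm(seconds);
    int fd = open(path, O_WRONLY);   /* blocks until a reader opens it */
    if (fd < 0) {
        err = errno;
    } else {
        size_t len = strlen(line);
        ssize_t n = write(fd, line, len);
        if (n == (ssize_t)len)
            rc = 0;
        else
            err = (n < 0) ? errno : EIO;  /* treat a short write as failure */
        close(fd);
    }
    alarm(0);                        /* cancel the timer if still pending */
    sigaction(SIGALRM, &old, NULL);  /* restore the previous handler */

    if (rc < 0)
        /* Assumes SIGALRM is the only signal expected to interrupt us. */
        errno = (err == EINTR) ? ETIMEDOUT : err;
    return rc;
}
```

A child would call this once per check result and _exit() on failure.
An even blunter variant is to call alarm() at the start of each child
and leave SIGALRM at its default disposition, which terminates the
process when the timer expires.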
Greetings,
Michael.
-- System Information:
Debian Release: 6.0.1
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-xen-amd64 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages nsca depends on:
ii debconf [debconf-2.0] 1.5.36.1 Debian configuration management sy
ii libc6 2.11.2-10 Embedded GNU C Library: Shared lib
ii libmcrypt4 2.5.8-3.1 De-/Encryption Library
nsca recommends no packages.
Versions of packages nsca suggests:
ii nagios-plugins 1.4.15-3 Plugins for the nagios network mon
ii nagios-plugins-basic 1.4.15-3 Plugins for the nagios network mon
ii nagios3 3.2.1-2 A host/service/network monitoring
-- Configuration Files:
/etc/nsca.cfg changed [not included]
/etc/send_nsca.cfg changed [not included]
-- debconf information excluded