[Pkg-nagios-devel] Bug#608455: Bug#608455: nagios3: return_code of passive checks sent via nsca to central server are in wrong format
leee
leee at spatial.plus.com
Sun Jan 2 15:06:18 UTC 2011
The problem is _not_ with the supplied submit_check_result_via_nsca
bash script itself, but with the data being passed to that script
on the distributed clients.
To check that I was receiving the data from the (remote/distributed)
passive checks on the central monitoring server, I ran a 'cat'
command against the input pipe on the central monitoring server: a
portion of the output is shown below...
Mountain:~# cat /var/lib/nagios3/rw/nagios.cmd
[1293975915] PROCESS_SERVICE_CHECK_RESULT;Benthic;NTP;0;NTP OK: Offset 1.148838783e-05 secs
[1293975944] PROCESS_SERVICE_CHECK_RESULT;Benthic;PING;0;PING OK - Packet loss = 0%, RTA = 0.08 ms
[1293975964] PROCESS_SERVICE_CHECK_RESULT;Benthic;Platform;0;Linux-2.6.32-i686-with-debian-5.0.7
[1293975984] PROCESS_SERVICE_CHECK_RESULT;Benthic;Postgres;0;OK - database template1 (0 sec.)
[1293976004] PROCESS_SERVICE_CHECK_RESULT;Benthic;Disk Space;0;DISK WARNING - free space: /common 2780 MB (17% inode=99%):
[1293976014] PROCESS_SERVICE_CHECK_RESULT;Benthic;Process Count;0;PROCS OK: 155 processes
^C
Mountain:~#
Note that the Disk Space entry at [1293976004] shows a return_code
of 0 even though the check result data indicates a warning.
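This kind of mismatch can be spotted mechanically. The following is only
a rough sketch, assuming each captured command occupies a single line in
the form [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output.
The function name flag_mismatches is made up for illustration and is not
part of Nagios or nsca.

```shell
#!/bin/sh
# Hypothetical helper: read captured external-command lines on stdin and
# report those whose return code is 0 although the plugin output text
# mentions WARNING, CRITICAL or UNKNOWN.
# Fields are split on ';', so only the output text up to the next ';'
# is inspected.
flag_mismatches() {
    awk -F';' '
        /PROCESS_SERVICE_CHECK_RESULT/ && $4 == "0" && $5 ~ /WARNING|CRITICAL|UNKNOWN/ {
            printf "mismatch: host=%s service=%s code=%s output=%s\n", $2, $3, $4, $5
        }
    '
}

# Example use, with the captured lines saved to a file first:
#   flag_mismatches < captured_commands.txt
```

Run against the capture above, this should report only the Disk Space
entry, since it is the only one whose output text disagrees with a
return_code of 0.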
(Be aware that running a cat command against the input pipe on the
central monitoring server will empty the pipe before Nagios can
read it, which Nagios will interpret as receiving _no_ results from
its distributed clients. If this is done on a 'live' system, the
appropriate notifications will be raised and that monitoring data
will be lost.)
I then amended the submit_check_result_via_nsca bash script to write
its input data out to a text file, in addition to transmitting it
to the central monitoring server (just duplicate the $printfcmd
command at the end of the script, but divert it to a file with write
permissions instead of piping it to the send_nsca command), and got
the following results...
Benthic NTP OK NTP OK: Offset 1.148838783e-05 secs
Benthic PING OK PING OK - Packet loss = 0%, RTA = 0.08 ms
Benthic Platform OK Linux-2.6.32-i686-with-debian-5.0.7
Benthic Postgres OK OK - database template1 (0 sec.)
Benthic Disk Space WARNING DISK WARNING - free space: /common 2780 MB (17% inode=99%):
Benthic Process Count OK PROCS OK: 155 processes
Benthic SSH OK SSH OK - OpenSSH_5.1p1 Debian-5 (protocol 2.0)
(I also repeated this experiment after deliberately stopping one of
the monitored services, to force a return_code of "CRITICAL", and
found that the return_code received on the central monitoring
server was still 0.)
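For anyone wanting to repeat the experiment, the amendment to the script
described above might look roughly like this. It is a sketch under
assumptions, not the packaged script: the central host name, the
send_nsca config path and the debug file location are placeholders, and
the function wrapper is only there to keep the example self-contained.

```shell
#!/bin/sh
# Sketch of an amended submit_check_result_via_nsca.
# Arguments: $1 = host name, $2 = service description,
#            $3 = return code, $4 = plugin output
# All values below are assumptions; adjust them for the local installation.
printfcmd="${printfcmd:-printf}"
NscaBin="${NscaBin:-/usr/sbin/send_nsca}"
NscaCfg="${NscaCfg:-/etc/send_nsca.cfg}"                 # assumed path
NagiosHost="${NagiosHost:-central.example.org}"          # placeholder host
DebugLog="${DebugLog:-/tmp/submit_check_result.debug}"   # placeholder file

submit_check_result() {
    # Duplicate of the transmit command, diverted to a file for inspection.
    $printfcmd "%s\t%s\t%s\t%s\n" "$1" "$2" "$3" "$4" >> "$DebugLog"
    # Unchanged behaviour: pipe the same tab-separated record to send_nsca.
    $printfcmd "%s\t%s\t%s\t%s\n" "$1" "$2" "$3" "$4" |
        "$NscaBin" -H "$NagiosHost" -c "$NscaCfg"
}

# Submit only when called with the four arguments Nagios would supply.
if [ "$#" -eq 4 ]; then
    submit_check_result "$@"
fi
```

The debug file then holds exactly what the script was given, one
tab-separated record per check result, which is how the string
return_code values above were observed.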
So it appears that the return_code supplied to the
submit_check_result_via_nsca bash script on the remote/distributed
clients is a string, with possible values of
OK/WARNING/CRITICAL/UNKNOWN, instead of the numeric values of
0/1/2/3. Furthermore, all string values are converted to 0 by the
time that they are placed into the central monitoring server's
input pipe, either by the /usr/sbin/send_nsca command running on
the remote/distributed client, or by the receiving nsca agent
running on the central monitoring server.
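One possible workaround would be to translate the string into its
numeric equivalent on the client before the data is transmitted. A
minimal sketch of the OK/WARNING/CRITICAL/UNKNOWN to 0/1/2/3 mapping
follows; the function name state_to_code is hypothetical and is not part
of the supplied script, and treating unrecognised input as UNKNOWN is a
choice made here rather than anything the script is known to do.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios state name to its numeric return code.
# Values that are already numeric (0-3) are passed through unchanged;
# anything unrecognised is reported as 3 (UNKNOWN).
state_to_code() {
    case "$1" in
        0|1|2|3)  printf '%s\n' "$1" ;;
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;
    esac
}

# Example use inside the submit script, before the record is sent:
#   return_code=$(state_to_code "$3")
```

Alternatively, if the command definition on the client passes the
$SERVICESTATE$ macro (an assumption, since the definition is not shown
here), using $SERVICESTATEID$ instead would hand the numeric value to
the script directly.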
Regards,
Lee Elliott