[Pkg-nagios-devel] Bug#608455: Bug#608455: nagios3: return_code of passive checks sent via nsca to central server are in wrong format

Sun Jan 2 15:06:18 UTC 2011

The problem is _not_ with the supplied submit_check_result_via_nsca 
bash script itself, but with the data being passed to that script 
on the distributed clients.

To check that I was receiving the data from the (remote/distributed) 
passive checks on the central monitoring server, I ran a 'cat' 
command against the input pipe on the central monitoring server: a 
portion of the output is shown below...

Mountain:~# cat /var/lib/nagios3/rw/nagios.cmd
[1293975915] PROCESS_SERVICE_CHECK_RESULT;Benthic;NTP;0;NTP OK: Offset 1.148838783e-05 secs
[1293975944] PROCESS_SERVICE_CHECK_RESULT;Benthic;PING;0;PING OK - Packet loss = 0%, RTA = 0.08 ms
[1293975964] PROCESS_SERVICE_CHECK_RESULT;Benthic;Platform;0;Linux-2.6.32-i686-with-debian-5.0.7
[1293975984] PROCESS_SERVICE_CHECK_RESULT;Benthic;Postgres;0;OK - database template1 (0 sec.)
[1293976004] PROCESS_SERVICE_CHECK_RESULT;Benthic;Disk Space;0;DISK WARNING - free space: /common 2780 MB (17% inode=99%):
[1293976014] PROCESS_SERVICE_CHECK_RESULT;Benthic;Process Count;0;PROCS OK: 155 processes
^C
Mountain:~#

Note that the Disk Space entry at [1293976004] shows a return_code 
of 0 even though the check result data indicates a warning.
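
For anyone wanting to check these lines mechanically, here is a 
minimal sketch that splits one line of the form 
"[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output" 
into its fields.  The function name parse_result_line is my own 
invention for illustration and is not part of Nagios or nsca.

```shell
#!/bin/sh
# Hypothetical helper: print host, service and return_code from one
# PROCESS_SERVICE_CHECK_RESULT line, separated by '|'.
parse_result_line() {
    line=$1
    # Strip the leading "[timestamp] " prefix.
    cmd=${line#*\] }
    # Split on ';'. The last variable keeps the rest of the line,
    # so plugin output that itself contains ';' is not truncated.
    IFS=';' read -r name host service code output <<EOF
$cmd
EOF
    printf '%s|%s|%s\n' "$host" "$service" "$code"
}

parse_result_line '[1293976004] PROCESS_SERVICE_CHECK_RESULT;Benthic;Disk Space;0;DISK WARNING - free space: /common 2780 MB (17% inode=99%):'
```

The third field printed is the return_code as it arrived on the 
central server, which can then be compared by eye with the state 
word in the plugin output.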

(Be aware that running cat against the input pipe on the central 
monitoring server consumes the data before Nagios can read it, so 
Nagios behaves as if it had received _no_ results from its 
distributed clients.  On a 'live' system this will raise the 
corresponding notifications, and the monitoring data read by cat 
will be lost.)

I then amended the submit_check_result_via_nsca bash script to write 
its input data out to a text file, in addition to transmitting it 
to the central monitoring server.  To do this, duplicate the 
$printfcmd command at the end of the script, but redirect its output 
to a writable file instead of piping it to the send_nsca command.  
This gave the following results...

Benthic	NTP	OK	NTP OK: Offset 1.148838783e-05 secs
Benthic	PING	OK	PING OK - Packet loss = 0%, RTA = 0.08 ms
Benthic	Platform	OK	Linux-2.6.32-i686-with-debian-5.0.7
Benthic	Postgres	OK	OK - database template1 (0 sec.)
Benthic	Disk Space	WARNING	DISK WARNING - free space: /common 2780 MB (17% inode=99%):
Benthic	Process Count	OK	PROCS OK: 155 processes
Benthic	SSH	OK	SSH OK - OpenSSH_5.1p1 Debian-5 (protocol 2.0)
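
A rough sketch of that kind of debugging change is shown here.  It 
is not the shipped script: the function name log_and_emit and the 
log file argument are assumptions for illustration, and tee is used 
instead of a second printf so that the same tab-separated record 
goes both to the file and to stdout.

```shell
#!/bin/sh
# Hypothetical helper: format one result as a tab-separated record,
# append a copy to a log file, and pass it through on stdout.
#   $1 = host name        $2 = service description
#   $3 = state/return code as received   $4 = plugin output
#   $5 = log file (must be writable by the user running the script)
log_and_emit() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" | tee -a "$5"
}

# In the real script the stdout of this function would be piped on
# to send_nsca; here it is simply printed.
log_and_emit "Benthic" "Disk Space" "WARNING" "DISK WARNING - free space" "${TMPDIR:-/tmp}/nsca_debug.log"
```

The third column of the log file then shows exactly what the client 
was about to hand to send_nsca.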

(I also repeated this experiment after deliberately stopping one of 
the monitored services, to force a return_code of "CRITICAL", and 
found that the return_code received on the central monitoring 
server was still 0)

So it appears that the return_code supplied to the 
submit_check_result_via_nsca bash script on the remote/distributed 
clients is a string, with possible values of 
OK/WARNING/CRITICAL/UNKNOWN, instead of the numeric values of 
0/1/2/3.  Furthermore, all string values are converted to 0 by the 
time that they are placed into the central monitoring server's 
input pipe, either by the /usr/sbin/send_nsca command running on 
the remote/distributed client, or by the receiving nsca agent 
running on the central monitoring server.
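
As an illustration only, the string-to-number mapping described 
above could be done on the client before the data reaches send_nsca, 
along these lines.  The function name state_to_code is hypothetical, 
and treating an unrecognised value as 3 (UNKNOWN) is my assumption, 
not something the packaged script is known to do.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios state string to its numeric
# return code (OK=0, WARNING=1, CRITICAL=2, UNKNOWN=3).
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        UNKNOWN)  echo 3 ;;
        [0-3])    echo "$1" ;;  # already numeric: pass through
        *)        echo 3 ;;     # assumption: anything else is UNKNOWN
    esac
}

# Example: convert the state before building the record.
printf '%s\t%s\t%s\t%s\n' "Benthic" "Disk Space" "$(state_to_code WARNING)" "DISK WARNING - free space"
```

With a conversion like this in place, the third field of the record 
would be numeric, which is the format the central server expects.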

Regards,

Lee Elliott