[Pkg-nagios-devel] Bug#608455: nagios3: return_code of passive checks sent via nsca to central server are in wrong format

Sun Jan 2 17:08:06 UTC 2011

leee wrote on Sunday, 02 January 2011:

> The problem is _not_ with the supplied submit_check_result_via_nsca 
> bash script itself, but with the data being passed to that script 
> on the distributed clients.
> 
> To check that I was receiving the data from the (remote/distributed) 
> passive checks on the central monitoring server, I ran a 'cat' 
> command against the input pipe on the central monitoring server: a 
> portion of the output is shown below...
> 
> Mountain:~# cat /var/lib/nagios3/rw/nagios.cmd
> [1293975915] PROCESS_SERVICE_CHECK_RESULT;Benthic;NTP;0;NTP OK: Offset 1.148838783e-05 secs
> [1293975944] PROCESS_SERVICE_CHECK_RESULT;Benthic;PING;0;PING OK - Packet loss = 0%, RTA = 0.08 ms
> [1293975964] PROCESS_SERVICE_CHECK_RESULT;Benthic;Platform;0;Linux-2.6.32-i686-with-debian-5.0.7
> [1293975984] PROCESS_SERVICE_CHECK_RESULT;Benthic;Postgres;0;OK - database template1 (0 sec.)
> [1293976004] PROCESS_SERVICE_CHECK_RESULT;Benthic;Disk Space;0;DISK WARNING - free space: /common 2780 MB (17% inode=99%):
> [1293976014] PROCESS_SERVICE_CHECK_RESULT;Benthic;Process Count;0;PROCS OK: 155 processes
> ^C
> Mountain:~#
> 
> Note that the Disk Space entry at [1293976004] shows a return_code 
> of 0 even though the check result data indicates a warning.
There is no such thing as a string result. Only the return code counts.
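To make that concrete: a minimal Python sketch that splits one line of the pipe dump into its fields and derives the state from the numeric return_code field alone. parse_service_result and STATE_NAMES are hypothetical names for illustration, not part of Nagios. With that rule the Disk Space line is OK, whatever its output text says.

```python
# Numeric return codes and the service states they stand for.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_service_result(line: str) -> dict:
    """Split one external command of the form
    [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
    into its fields. Raises ValueError or KeyError on malformed input."""
    stamp, _, command = line.partition(" ")
    # maxsplit=4 keeps any ';' inside the plugin output intact.
    name, host, service, code, output = command.split(";", 4)
    if name != "PROCESS_SERVICE_CHECK_RESULT":
        raise ValueError(f"unexpected command: {name}")
    return_code = int(code)
    return {
        "timestamp": int(stamp.strip("[]")),
        "host": host,
        "service": service,
        "return_code": return_code,
        # The state comes only from the numeric code, never from the output text.
        "state": STATE_NAMES[return_code],
        "output": output,
    }


line = ("[1293976004] PROCESS_SERVICE_CHECK_RESULT;Benthic;Disk Space;0;"
        "DISK WARNING - free space: /common 2780 MB (17% inode=99%):")
result = parse_service_result(line)
print(result["state"], "|", result["output"])
# prints: OK | DISK WARNING - free space: /common 2780 MB (17% inode=99%):
```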
> 
> (Be aware that running a cat command against the input pipe on the 
> central monitoring server will empty the pipe before Nagios can 
> read it, which Nagios will interpret as receiving _no_ results from 
> its distributed clients.  If this is done on a 'live' system then 
> the appropriate notifications will be raised and you will lose that
> monitoring data.)
> 
> I then amended the submit_check_result_via_nsca bash script to write 
> its input data out to a text file (in addition to transmitting it 
> to the central monitoring server - just duplicate the $printfcmd
> command at the end of the script but divert it to a file with write
> permissions instead of piping it to the send_nsca command) and got
> the following results...
> 
> Benthic	NTP	OK	NTP OK: Offset 1.148838783e-05 secs
> Benthic	PING	OK	PING OK - Packet loss = 0%, RTA = 0.08 ms
> Benthic	Platform	OK	Linux-2.6.32-i686-with-debian-5.0.7
> Benthic	Postgres	OK	OK - database template1 (0 sec.)
> Benthic	Disk Space	WARNING	DISK WARNING - free space: /common 2780 MB (17% inode=99%):
> Benthic	Process Count	OK	PROCS OK: 155 processes
> Benthic	SSH	OK	SSH OK - OpenSSH_5.1p1 Debian-5 (protocol 2.0)
What are the return codes? They are the only thing that matters. And where
exactly do they come from?
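In the debug file the third field is a word such as WARNING, while send_nsca expects tab-separated host, service, numeric return code and plugin output. (Nagios has both $SERVICESTATE$, which expands to the word, and $SERVICESTATEID$, which expands to the number; which one reaches the script is part of the question.) A minimal Python sketch of the normalisation, assuming the default tab delimiter; to_nsca_line and STATE_CODES are hypothetical helpers, not part of the shipped script:

```python
# Service state words and the numeric return codes send_nsca expects.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def to_nsca_line(host: str, service: str, state: str, output: str) -> str:
    """Build one tab-delimited send_nsca input line
    (host, service, return_code, output) with a numeric return code.

    `state` may already be numeric ("1") or a state word ("WARNING").
    """
    state = state.strip()
    if state.isdigit():
        code = int(state)
    else:
        # KeyError here means the word is not a known service state.
        code = STATE_CODES[state.upper()]
    if code not in STATE_CODES.values():
        raise ValueError(f"return code out of range: {code}")
    return "\t".join([host, service, str(code), output])


# The Disk Space record from the debug file, with WARNING turned into 1.
print(repr(to_nsca_line(
    "Benthic", "Disk Space", "WARNING",
    "DISK WARNING - free space: /common 2780 MB (17% inode=99%):")))
```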

> 
> (I also repeated this experiment after deliberately stopping one of 
> the monitored services, to force a return_code of "CRITICAL", and 
> found that the return_code received on the central monitoring 
> server was still 0)
> 
> So it appears that the return_code supplied to the 
> submit_check_result_via_nsca bash script on the remote/distributed 
> clients is a string, with possible values of 
> OK/WARNING/CRITICAL/UNKNOWN, instead of the numeric values of 
> 0/1/2/3.  Furthermore, all string values are converted to 0 by the 
> time that they are placed into the central monitoring server's 
> input pipe, either by the /usr/sbin/send_nsca command running on 
> the remote/distributed client, or by the receiving nsca agent 
> running on the central monitoring server.
Who feeds the data to the script? The obsession handler (ocsp_command)? If so,
what does its configuration look like?
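As for every word ending up as 0: one plausible mechanism, which I have not verified against the nsca sources, is that the return code field is converted with C's atoi(). That call returns 0 when the string does not start with a number. The Python sketch below uses c_atoi, a hypothetical stand-in that only approximates atoi(), to show what such a conversion would do to each form.

```python
import re


def c_atoi(text: str) -> int:
    """Approximate C's atoi(): skip leading whitespace, read an optional
    sign and the leading decimal digits, and return 0 if there are none.
    (Real atoi has undefined behaviour on overflow; that is ignored here.)"""
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else 0


# Numeric codes survive; every state word collapses to 0, i.e. OK.
for field in ["0", "1", "2", "3", "OK", "WARNING", "CRITICAL", "UNKNOWN"]:
    print(f"{field!r:>11} -> {c_atoi(field)}")
```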

Alex