[Pkg-nagios-devel] Bug#294104: Hangs using 80% CPU when database server is restarted

Roman Hodek roman@hodek.net, 294104@bugs.debian.org
Mon, 7 Feb 2005 21:35:20 +0100 (CET)

Package: nagios-pgsql
Version: 2:1.3-cvs.20050116-1
Severity: normal

Around a week ago, I switched my nagios installation from text-based
storage to a Postgres database (which is running on the same machine
anyway for other reasons).

I unintentionally set up postmaster.conf so that the db server runs
with locale de_DE.UTF-8, so its error messages are in German. When I
then restart the Postgres server, nagios goes into an endless loop,
using ~80% CPU time. With strace, I can see that its famous last words
are:

send(6, "Q\0\0\0\26BEGIN TRANSACTION\0", 23, 0) = 23
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
poll([{fd=6, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(6, "E\0\0\0mSFATAL\0C57P01\0Mbreche Verbi"..., 16384, 0) = 110
poll([{fd=6, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1
recv(6, "", 16384, 0)                   = 0
time(NULL)                              = 1107807044
poll([{fd=6, events=POLLIN|POLLERR, revents=POLLIN|POLLERR|POLLHUP}], 1, 0) = 1
recv(6, "", 16384, 0)                   = 0
close(6)                                = 0

"breche Verbi" is probably the start of "breche Verbindung ab" ==
"aborting connection". After close(6), strace shows no further system
calls, so the loop must be spinning in user space. It seems nagios
somehow checks the wording of the error message, and if it doesn't see
the expected English string it does something silly and loops
around ;) If Postgres runs in the en_US.UTF-8 locale, nagios keeps
running normally after the db restart.

This is not a big problem for me, as error messages _should_ be in
English in my case. However, it would nevertheless be better to check
the return code instead of parsing the string...
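
To illustrate what checking the code could look like: the error packet
in the trace above already carries a locale-independent SQLSTATE. In
the protocol-3 ErrorResponse, each field is a one-byte type followed by
a NUL-terminated string, and the 'C' field here is 57P01
(admin_shutdown), whatever language the 'M' text is in. The length word
0x6d (109) plus the leading 'E' also matches the 110 bytes recv()
returned. The following is only a minimal sketch under the assumption
that the raw message bytes are at hand; error_field() and
is_admin_shutdown() are hypothetical helpers, not existing nagios or
libpq functions, and I have not checked how nagios actually handles
this.

```c
#include <stddef.h>
#include <string.h>

/* SQLSTATE 57P01 is "admin_shutdown" in PostgreSQL's error code table. */
#define SQLSTATE_ADMIN_SHUTDOWN "57P01"

/*
 * Hypothetical helper (not part of nagios or libpq): scan the field list
 * of a protocol-3 ErrorResponse and return a pointer to the value of the
 * field with the given one-byte type, or NULL if the field is absent or
 * the buffer is malformed or truncated before that value ends.
 * msg points at the leading 'E'; len is the number of bytes available.
 */
static const char *error_field(const unsigned char *msg, size_t len, char type)
{
    size_t i = 5; /* skip 'E' plus the 4-byte length word */

    if (len < 6 || msg[0] != 'E')
        return NULL;

    while (i < len && msg[i] != '\0') {
        const unsigned char *val = msg + i + 1;
        const unsigned char *end = memchr(val, '\0', len - i - 1);

        if (end == NULL)
            return NULL; /* value is cut off, do not guess */
        if ((char)msg[i] == type)
            return (const char *)val;
        i = (size_t)(end - msg) + 1; /* advance to the next field type */
    }
    return NULL; /* hit the terminating zero byte or ran out of data */
}

/* Nonzero if the message reports an administrator-initiated shutdown. */
static int is_admin_shutdown(const unsigned char *msg, size_t len)
{
    const char *code = error_field(msg, len, 'C');

    return code != NULL && strcmp(code, SQLSTATE_ADMIN_SHUTDOWN) == 0;
}
```

Code that goes through libpq rather than the raw socket would not need
any of this parsing: PQresultErrorField(res, PG_DIAG_SQLSTATE) returns
the same five-character code, and PQstatus(conn) == CONNECTION_BAD
reports a dead connection. Either check gives the same answer in the
de_DE and en_US locales, and a dead connection should lead to a
reconnect attempt (or giving up) rather than a retry loop.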