[Nut-upsuser] Alert: REPLBATT active after battery replacement and requires reboot to clear
Vyasa
info at dalpha.com
Mon Jun 30 23:35:42 BST 2025
Hi Jim,
Thanks for the prompt response.
The restart I referred to was exactly as you say: I restarted the
service using `systemctl restart nut-server`. This was separate from the
reboot of the server machine I mentioned, which does resolve the issue.
The driver used was:
Network UPS Tools - UPS driver controller 2.8.0
Network UPS Tools - BCMXCP UPS driver 0.32 (2.8.0)
I simulated the fault again by putting the UPS in bypass and
disconnecting the battery, which raised the RB alert again. I then
reconnected the battery, returned the UPS to normal operation, and
used upsdrvctl to stop and start the driver.
Alert generated while simulating RB:
Alert type: REPLBATT
.....................
ups.status: ALARM OL BYPASS RB
ups.test.result: Done and error
Alarm cleared on the UPS, but RB still persisting on NUT-SERVER:
Alert type: ONLINE
.................
ups.status: OL RB
ups.test.result: Done and passed
Stopping and restarting the driver with upsdrvctl clears RB:
Alert type: COMMOK
..................
ups.status: OL
ups.test.result: Done and passed
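To see why the middle snapshot can show `OL RB` even though the UPS has cleared its alarm, it helps to recall that a NUT driver normally rebuilds `ups.status` on each poll (the driver core provides helpers such as `status_init()`, `status_set()` and `status_commit()` for this), so a token such as RB reappears only if the driver's own internal flag still says so. The C sketch below models that idea; the struct, its fields and the functions are hypothetical illustrations, not code from bcmxcp.c or the NUT driver core, and the token order is simply chosen to match the output above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-poll flags; names are illustrative only and are
 * not taken from bcmxcp.c or the NUT driver core. */
struct poll_flags {
    int alarm;        /* UPS reports a general alarm     -> "ALARM"  */
    int online;       /* running on line power           -> "OL"     */
    int bypass;       /* UPS is in bypass mode           -> "BYPASS" */
    int replace_batt; /* driver's "replace battery" flag -> "RB"     */
};

/* Append one space-separated token to buf without overflowing it. */
static void add_token(char *buf, size_t len, const char *tok)
{
    if (buf[0] != '\0')
        strncat(buf, " ", len - strlen(buf) - 1);
    strncat(buf, tok, len - strlen(buf) - 1);
}

/* Rebuild the whole status string from the current flags. The buffer is
 * cleared first, so a token appears only if its flag is set on this poll. */
static void build_status(const struct poll_flags *f, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (f->alarm)        add_token(buf, len, "ALARM");
    if (f->online)       add_token(buf, len, "OL");
    if (f->bypass)       add_token(buf, len, "BYPASS");
    if (f->replace_batt) add_token(buf, len, "RB");
}
```

Feeding it the flags {1,1,1,1}, {0,1,0,1} and {0,1,0,0} reproduces the three `ups.status` lines above in turn; the second case is what a stale, never-cleared replace-battery flag would look like.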
So it seems that both our suspicions are confirmed: bcmxcp appears
to "latch" the alarm until the driver is restarted or the server is rebooted.
I think you are correct that this can cause issues in other real-life
cases; I am thinking here of automation, scripting and so forth.
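The "latching" suspicion can be illustrated with a minimal C sketch, assuming the driver keeps a replace-battery flag in a struct and updates it from each poll of the UPS alarm report. Only the field name `alarm_replace_battery` comes from this thread; the struct, the `ups_reports_rb` input and the helper functions are hypothetical. A latched update only ever sets the flag, so it survives until the process starts again with fresh state, whereas a recomputed update mirrors whatever the UPS reports on the current poll.

```c
#include <assert.h>

/* Hypothetical stand-in for the driver's internal status struct; only the
 * field name alarm_replace_battery is taken from the thread. */
struct xcp_status_sketch {
    int alarm_replace_battery;
};

/* Latching pattern: the flag is only ever set, never cleared. */
static void update_latched(struct xcp_status_sketch *st, int ups_reports_rb)
{
    if (ups_reports_rb)
        st->alarm_replace_battery = 1;
}

/* Recomputing pattern: the flag mirrors the UPS report on every poll. */
static void update_recomputed(struct xcp_status_sketch *st, int ups_reports_rb)
{
    st->alarm_replace_battery = ups_reports_rb ? 1 : 0;
}

/* Replay a sequence of per-poll reports into a fresh struct (as after a
 * driver restart) and return the final flag value. */
static int replay(void (*update)(struct xcp_status_sketch *, int),
                  const int *reports, int n)
{
    struct xcp_status_sketch st = {0};   /* explicit zero-initialization */
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        update(&st, reports[i]);
    return st.alarm_replace_battery;
}
```

Replaying the reports 1, 0, 0 (alarm raised, then cleared after the battery is reconnected) leaves the latched flag at 1 but the recomputed one at 0; replaying only the post-repair reports into a fresh struct gives 0, which matches the observation that restarting the driver clears RB. In standard C, objects with static storage duration are zero-initialized even without an initializer, while automatic (stack) objects are not, so an explicit `= {0}` makes the intent clear in either case.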
What would you suggest at this point? Can this be submitted as a bug?
Vyasa
On 6/30/25 14:18, Jim Klimov wrote:
> Hello,
>
> You mention that you've tried restarting the "nut-server" (I
> suppose you mean literally the service unit by that name, for the NUT
> data server). Did you try restarting the unit for the NUT driver (e.g.
> `systemctl restart nut-driver@upsname` with NUT v2.8.x and newer)?
>
> You did not mention the driver used, but I wonder if that driver
> program "latches" the RB value when it goes bad and never updates
> it? This could make sense if a UPS battery replacement always means
> server downtime, but that is just a subset of real-life cases, so it
> may simply be an oversight. For example, the `bcmxcp` code seems to only
> set `bcmxcp_status.alarm_replace_battery=1` (oddly, neither the field
> nor the struct is ever initialized to 0, so it might hold garbage on some
> systems/compilers that do not zero-out aggregate types by default).
>
> Jim
>
>
> On Mon, Jun 30, 2025 at 7:53 PM Vyasa via Nut-upsuser
> <nut-upsuser at alioth-lists.debian.net> wrote:
>
> Hello,
>
> CONFIGURATION:
>
> I am using a Powerware PW9120 3000i, on a network configuration
> with a server and a couple of slaves.
>
> The nut-server OS is /Debian 12 (6.1.0-37-amd64)/. NUT was
> installed from the Debian repo, version /2.8.0-7 amd64/, and the
> client has the same version.
>
> The UPS is connected via a standard RS232 serial connection and
> works with all standard commands and functionality.
>
> The command "/upscmd -l upsname/" lists the following; I have
> successfully used /test.battery.start/ and /test.system.start/:
>
> beeper.disable - Disable the UPS beeper
> beeper.enable - Enable the UPS beeper
> beeper.mute - Temporarily mute the UPS beeper
> load.on - Turn on the load immediately
> outlet.1.load.off - Turn off the load on outlet 1 immediately
> outlet.1.load.on - Turn on the load on outlet 1 immediately
> outlet.1.shutdown.return - Turn off the outlet 1 and return when power is back
> outlet.2.load.off - Turn off the load on outlet 2 immediately
> outlet.2.load.on - Turn on the load on outlet 2 immediately
> outlet.2.shutdown.return - Turn off the outlet 2 and return when power is back
> shutdown.return - Turn off the load and return when power is back
> shutdown.stayoff - Turn off the load and remain off
> test.battery.start - Start a battery test
> test.system.start - Start a system test
>
> ISSUE:
>
> Every couple of years, when I have to replace the batteries in the
> UPS, I cannot clear the REPLBATT alert until I reboot the server
> running NUT-SERVER. This might not seem like a big deal, but it
> becomes a hassle when the batteries haven't quite failed yet and
> still pass a UPS battery test.
>
> The UPS itself reports OK after a battery replacement or battery
> test, and clears the alarm on its LCD. But when I poll the UPS data
> using "upsc upsname", I still see RB (REPLBATT), and this will not
> clear until I reboot the server. So, without a reboot, the alert keeps
> being generated at the interval set by RBWARNTIME in upsmon.conf,
> which is as per NUT design.
>
> So, without a reboot, I always get the RB flag with this status:
>
> /Alert type: REPLBATT/
> /............/
> /ups.status: OL RB/
> /ups.test.result: Done and passed/
>
> After reboot of server the alert is cleared:
>
> /Alert type: COMMOK
> ............
> ups.status: OL
> ups.test.result: Done and passed/
>
> So my question is: why is this reboot required? It doesn't seem to
> make any sense. I can't understand why the polled data from a UPS
> would change after a reboot, while the UPS LCD is reporting all OK.
> I tried restarting NUT-SERVER to see if it would make any difference.
> Also, the command test.battery.start will clear the alarm on the UPS
> if the battery test passes.
>
> The only explanation I have come up with is that the persistent
> RB/REPLBATT is latched in this state and is an artifact of the
> UPS-to-NUT handshaking.
>
> Any feedback would be greatly appreciated, as I have searched and
> searched.
>
> Thank you!
>
> Vyasa
> _______________________________________________
> Nut-upsuser mailing list
> Nut-upsuser at alioth-lists.debian.net
> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
>