[Nut-upsuser] wrong NOTIFYTYPE always ONBATT on upssched TIMER ?
Jim Klimov
jimklimov+nut at gmail.com
Fri Apr 11 15:09:22 BST 2025
So, I've done some experiments, also integrating upssched into NIT (NUT
Integration Testing) suite, to help automate the setup and debugging - that
aspect can be seen in https://github.com/jimklimov/nut/tree/issue-2890
(also investigating for https://github.com/networkupstools/nut/issues/2890
that also touches on upssched).
My current conclusion is that `setenv("NOTIFYTYPE")` only happens when
`upsmon` calls whatever it has as the `NOTIFYCMD` handler. It is a new
process every time it is called, and when the handler is a script or other
short-lived program, it indeed sees the current unique value of NOTIFYTYPE
(and UPSNAME, and "other CLI args" that convey the message for
notification).
However in case of upssched, the first call to create/cancel timers
launches a daemon (to track timer states and execute `CMDSCRIPT` when
timers elapse). Subsequent calls to `upssched` as a CLI program talk to
this daemon via pipe, to inform it of state changes for `START-TIMER` and
`CANCEL-TIMER` matches, e.g.:
UPS dummy at localhost:12345 on battery
Fri Apr 11 13:40:09 UTC 2025: /home/jim/nut/scripts/misc/notifyme-debug
ONBATT [dummy at localhost:12345]: args: UPS dummy at localhost:12345 on
battery (1 arg tokens)
Defaulting debug verbosity to NUT_DEBUG_LEVEL=3 since none was requested by
command-line options
0.000081 [D2:7688] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.000129 [D1:7688] parse_at: this AT command is for a wildcard:
matched
0.000171 [D1:7688] parse_at: SKIP: 'ONLINE' in AT command did not
match the 'ONBATT' NOTIFYTYPE we were launched to process
...
0.001165 [D1:7688] parse_at: this AT command is for a wildcard:
matched
0.001172 [D1:7688] parse_at: processing EXECUTE
0.001193 Executing command: ONBATT
Fri Apr 11 13:40:09 UTC 2025: /home/jim/nut/clients/upssched-cmd: THIS IS A
SAMPLE SCRIPT, PLEASE TAILOR IT FOR YOUR DEPLOYMENT OF NUT!
Fri Apr 11 13:40:09 UTC 2025: UPSNAME='dummy at localhost:12345'
NOTIFYTYPE='ONBATT' args=ONBATT
0.009687 [D2:7688] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.009719 [D1:7688] parse_at: this AT command is for a wildcard:
matched
0.009751 [D1:7688] parse_at: processing CANCEL-TIMER
25.037157 [D3:7373] new connection on fd 16
25.037214 [D3:7373] Ending sock_read() on a good note: try to use
command:
25.037238 [D3:7373] arg 0: CANCEL
25.037243 [D3:7373] arg 1: LINE-STATUS-1s
...
0.019160 [D2:7688] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.019190 [D1:7688] parse_at: this AT command is for a wildcard:
matched
0.019196 [D1:7688] parse_at: processing START-TIMER
25.046606 [D3:7373] closing connection on fd 16
25.046647 [D3:7373] new connection on fd 16
25.046706 [D3:7373] Ending sock_read() on a good note: try to use
command:
25.046729 [D3:7373] arg 0: START
25.046734 [D3:7373] arg 1: BATT-STATUS-60
25.046757 [D3:7373] arg 2: 3600
25.046780 New timer: BATT-STATUS-60 (3600 seconds)
0.019597 [D2:7688] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.019622 [D1:7688] parse_at: this AT command is for a wildcard:
matched
0.019627 [D1:7688] parse_at: SKIP: 'REPLBATT' in AT command did not
match the 'ONBATT' NOTIFYTYPE we were launched to process
Note the 25-sec timestamps of the daemon, and 0.x sec timestamps of the
client that reacts to a new ONBATT event.
The `upssched` code, its infinite-looping daemon included, never calls
`setenv()`. So all runs of a `CMDSCRIPT` inherit the environment exported
for the first run of `upssched` when it created the daemon, including the
original and static value of `NOTIFYTYPE`.
The `upssched` CLI program directly runs the `CMDSCRIPT` when an `AT ...
EXECUTE ...` line is matched, however -- including the `NOTIFYTYPE`
relevant at this moment, as seen in the third report from ` upssched-cmd`
script below:
25.047059 [D3:7373] closing connection on fd 16
0.019795 [D2:7688] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.019818 [D1:7688] parse_at: this AT command is for a wildcard:
matched
0.019842 [D1:7688] parse_at: SKIP: 'COMMBAD' in AT command did not
match the 'ONBATT' NOTIFYTYPE we were launched to process
0.019858 [D1:7688] Exiting upssched (CLI process)
26.048227 Event: BATT-STATUS-1s
Fri Apr 11 13:40:10 UTC 2025: /home/jim/nut/clients/upssched-cmd: THIS IS A
SAMPLE SCRIPT, PLEASE TAILOR IT FOR YOUR DEPLOYMENT OF NUT!
Fri Apr 11 13:40:10 UTC 2025: UPSNAME='dummy at localhost:12345'
NOTIFYTYPE='ONBATT' args=BATT-STATUS-1s
27.061667 Event: BATT-STATUS-2s
Fri Apr 11 13:40:11 UTC 2025: /home/jim/nut/clients/upssched-cmd: THIS IS A
SAMPLE SCRIPT, PLEASE TAILOR IT FOR YOUR DEPLOYMENT OF NUT!
Fri Apr 11 13:40:11 UTC 2025: UPSNAME='dummy at localhost:12345'
NOTIFYTYPE='ONBATT' args=BATT-STATUS-2s
UPS dummy at localhost:12345 on line power
Fri Apr 11 13:40:14 UTC 2025: /home/jim/nut/scripts/misc/notifyme-debug
ONLINE [dummy at localhost:12345]: args: UPS dummy at localhost:12345 on
line power (1 arg tokens)
Defaulting debug verbosity to NUT_DEBUG_LEVEL=3 since none was requested by
command-line options
0.000068 [D2:7720] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.000109 [D1:7720] parse_at: this AT command is for a wildcard:
matched
0.000138 [D1:7720] parse_at: processing EXECUTE
0.000143 Executing command: ONLINE
Fri Apr 11 13:40:14 UTC 2025: /home/jim/nut/clients/upssched-cmd: THIS IS A
SAMPLE SCRIPT, PLEASE TAILOR IT FOR YOUR DEPLOYMENT OF NUT!
Fri Apr 11 13:40:14 UTC 2025: UPSNAME='dummy at localhost:12345'
NOTIFYTYPE='ONLINE' args=ONLINE
0.011340 [D2:7720] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.011389 [D1:7720] parse_at: this AT command is for a wildcard:
matched
0.011398 [D1:7720] parse_at: processing CANCEL-TIMER
30.042455 [D3:7373] new connection on fd 16
30.042490 Event: BATT-STATUS-5s
Fri Apr 11 13:40:14 UTC 2025: /home/jim/nut/clients/upssched-cmd: THIS IS A
SAMPLE SCRIPT, PLEASE TAILOR IT FOR YOUR DEPLOYMENT OF NUT!
Fri Apr 11 13:40:14 UTC 2025: UPSNAME='dummy at localhost:12345'
NOTIFYTYPE='ONBATT' args=BATT-STATUS-5s
30.051445 [D3:7373] Ending sock_read() on a good note: try to use
command:
30.051489 [D3:7373] arg 0: CANCEL
30.051496 [D3:7373] arg 1: BATT-STATUS-1s
It seems that sometimes timer calls can happen before they get the cancel
signal (processing the list and calling the commands can take a while),
note the jump from 27.x to 30.x seconds and call to handle BATT-STATUS-5s
even though we are already processing cancellation of BATT-STATUS-1s and
would likely get to cancel 5s in a moment.
...
0.020701 [D2:7720] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.020727 [D1:7720] parse_at: this AT command is for a wildcard:
matched
0.020733 [D1:7720] parse_at: processing CANCEL-TIMER
30.051843 [D3:7373] closing connection on fd 16
30.051897 [D3:7373] new connection on fd 16
30.051963 [D3:7373] Ending sock_read() on a good note: try to use
command:
30.051992 [D3:7373] arg 0: CANCEL
30.052021 [D3:7373] arg 1: BATT-STATUS-2s
0.021220 [D2:7720] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.021250 [D1:7720] parse_at: this AT command is for a wildcard:
matched
0.021290 [D1:7720] parse_at: processing CANCEL-TIMER
30.052299 [D3:7373] closing connection on fd 16
30.052340 [D3:7373] new connection on fd 16
30.052392 [D3:7373] Ending sock_read() on a good note: try to use
command:
30.052415 [D3:7373] arg 0: CANCEL
30.052420 [D3:7373] arg 1: BATT-STATUS-5s
0.021601 [D2:7720] parse_at: is '*' in AT command the
'dummy at localhost:12345' we were launched to process?
0.021626 [D1:7720] parse_at: this AT command is for a wildcard:
matched
0.021631 [D1:7720] parse_at: processing CANCEL-TIMER
30.052685 [D3:7373] closing connection on fd 16
30.052727 [D3:7373] new connection on fd 16
30.052781 [D3:7373] Ending sock_read() on a good note: try to use
command:
30.052804 [D3:7373] arg 0: CANCEL
30.052810 [D3:7373] arg 1: BATT-STATUS-15s
30.052831 Cancelling timer: BATT-STATUS-15s
...
Ultimately, what should matter for the `CMDSCRIPT` are its command-line
arguments (technically, the first and only one). That alone dictates the
operation it should run, regardless of the UPSNAME or NOTIFYTYPE values. It
would probably be correct to unset those variables in the loop at least, or
in the tool overall (to not imply any guarantees of anything related to
those strings).
If timers or executions relate to a particular UPS, name it in the `AT`
line spec (instead of the asterisk) and track as an uniquely named timer to
start or cancel.
That was a fun exploration, thanks for the nudge!
Jim Klimov
On Tue, Apr 8, 2025 at 10:59 AM RemiG <mailing-nut at dbmtechnologies.com>
wrote:
> Hi for all
>
> an other one with same wrong NOTIFYTYPE (my supposition) (ups.mfr: EATON, ups.model:
> Ellipse PRO 1600)
>
> upsmon says NOTIFYTYPE = ONBATT
>
> upssched create a time associated to this Event => BATT-STATUS-5 (but
> Power return before 5 minutes ;-) so ... )
> upsmon says NOTIFYTYPE = ONLINE
>
> upssched.conf delete this Event => BATT-STATUS-5, BATT-STATUS-10
>
> upssched create a TIMER associated to this Event => LINE-STATUS-5 and so on
>
> and after all other timers created by ONLINE => LINE-STATUS-5,
> LINE-STATUS-10
>
> sent ENV var NOTIFYTYPE to /bin/upssched-cmd bash script => ONBATT (but
> upsc UPS at localhost read ups.status : OL CHRG )
>
>
> Isn't it strange ONBATT / OL CHGR
>
> root at greg:~# grep " ups" /var/log/syslog
> Apr 8 10:07:33 greg upsmon[32507]: Onduleur UPS1 at localhost en mode
> Secours sur batterie
> Apr 8 10:07:33 greg upssched[17464]: Executing command: ONBATT
> Apr 8 10:07:34 greg upssched-cmd-received-NOTIFYTYPE: ONBATT
> Apr 8 10:07:34 greg upssched-cmd-read-Upsc_ups_status: OB DISCHRG
> Apr 8 10:07:34 greg upssched[17497]: Timer daemon started
> Apr 8 10:07:34 greg upssched[17497]: New timer: BATT-STATUS-5 (300
> seconds)
> Apr 8 10:07:34 greg upssched[17497]: New timer: BATT-STATUS-10 (600
> seconds)
> Apr 8 10:08:03 greg upsmon[32507]: Onduleur UPS1 at localhost en mode
> Secteur : recharge batterie en cours
> Apr 8 10:08:03 greg upssched[17516]: Executing command: ONLINE
> Apr 8 10:08:03 greg upssched-cmd-received-NOTIFYTYPE: ONLINE
> Apr 8 10:08:03 greg upssched-cmd-read-Upsc_ups_status: OL CHRG
> Apr 8 10:08:04 greg upssched[17497]: Cancelling timer: BATT-STATUS-5
> Apr 8 10:08:04 greg upssched[17497]: Cancelling timer: BATT-STATUS-10
> Apr 8 10:08:04 greg upssched[17497]: New timer: LINE-STATUS-5 (300
> seconds)
> Apr 8 10:08:04 greg upssched[17497]: New timer: LINE-STATUS-10 (600
> seconds)
> Apr 8 10:13:04 greg upssched[17497]: Event: LINE-STATUS-5
> Apr 8 10:13:04 greg upssched-cmd-received-NOTIFYTYPE: ONBATT
> Apr 8 10:13:04 greg upssched-cmd-read-Upsc_ups_status: OL CHRG
> Apr 8 10:18:04 greg upssched[17497]: Event: LINE-STATUS-10
> Apr 8 10:18:05 greg upssched-cmd-received-NOTIFYTYPE: ONBATT
> Apr 8 10:18:05 greg upssched-cmd-read-Upsc_ups_status: OL CHRG
> --
>
> # admin linux only ;-)
>>
>> _______________________________________________
>> Nut-upsuser mailing list
>> Nut-upsuser at alioth-lists.debian.net
>> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
>>
> --
> # admin linux only ;-)
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/nut-upsuser/attachments/20250411/08c81099/attachment-0001.htm>
More information about the Nut-upsuser
mailing list