[Nut-upsuser] NUT and Eaton UPS produce a lot of error messages
Jim Klimov
jimklimov+nut at gmail.com
Sat Jan 20 15:04:19 GMT 2024
Thanks for the info!
First, regarding the later detail about starting `usbhid-ups`: the
"Resource busy" indicates that likely an earlier instance of the NUT driver
(in its own service unit) is still running and holding the device. In a
worse case, some other programs might consider this a HID device (same
class family as keyboards and mice) and grab it somehow, or it gets passed
through virtualization/containers and a different OS holds it in fact
(guest if you're running NUT on hypervisor, or vice versa). Make sure you
stop those too, whichever applies.
Second, looking at dmesg and other logs: it seems the device number gets
re-enumerated (lost #3, got #4) during reconnection - some OSes do that,
others do not. This might confuse a driver's attempts to re-attach (relying
on earlier learned details from initial connection) - although due to an
issue in NUT v2.8.0 and before the drivers might have not tracked the
"device" number (seen as "unknown" here), which got fixed in 2.8.1, but
since 2.8.2 the `nut-scanner` would again not-suggest it as a config option
by default due to such unreliability with reconnections...
There are quite a few nuances that might differ between "vanilla" NUT
systemd integration (which also evolved over time in git sources) and that
packaged by distros, I haven't looked deep into Debian recipes lately.
Maybe there's some unfortunate interaction of service dependencies that
follows up from the disconnections. Some educated guesswork below:
My current guess would be that a driver gives up reconnecting with the
remembered data that no longer matches the device, and perhaps exits to be
restarted by systemd (journal history of `nut-driver at Eaton.service`, if
that one got generated by `nut-driver-enumerator` on your system, might
confirm or dispel that guess; otherwise check for a plain old
`nut-driver.service` monolith). Or it exits/crashes due to some other
reason, such as manual runs of `usbhid-ups` (if the PID file is saved by
the daemon and is found and used by the command-line spawned instance to
kill off the "presumed-frozen" competitor; in the log above I don't see
direct indications of that though).
* Found an example of when a new driver process kills its older "self"
using the PID files - just in case you see such indications in your logs:
Duplicate driver instance detected (PID file
/run/nut/nutdrv_qx-tecnowaremansarda.pid exists)! Terminating other driver!
Further, what I guess could follow up is that if your only
`nut-driver*.service` exits and restarts, the systemd dependency it
provides for `nut-server.service` (maybe via `nut-driver.target`) flickers
and causes the data server to restart. (Not sure at the moment if it is a
weak Wants or a harder Requires type of dependency there; technically
`upsd` should run well without drivers and report that the device is
unknown or data is stale).
Probably you can interpolate the event trails from `dmesg` and `journalctl
-lx` as suggested earlier, with bumped `debug_min` in `ups.conf`,
`upsd.conf` and maybe `upsmon.conf`, to check if the real events and
service state changes seem to confirm this theorized chain of events:
* USB reconnection (HW/FW reasons can vary a lot);
* `nut-driver at Eaton.service` tries to reconnect but fails, or for older
versions just aborts due to loss of link (even without OS USB
re-enumeration, there can be a few seconds when the newly made devfs node
is owned by `root` and `udev` did not have time to hand it off to `nut`, so
the NUT driver can not re-attach);
* fault of `nut-driver*` causes `nut-server` to stop (shouldn't, but might,
happen... at least fits the symptoms you've posted earlier)
* the `nut-driver` is resuscitated by systemd after some RestartSec timeout
* dependency for `nut-server` is healthy so it is started up again
* thinking of it, maybe the last couple of steps is almost concurrent: as
soon as systemd launched the driver process, its unit is considered
healthy, so the data server starts; however the driver then takes some time
to do the initial walk of the device and only then begins talking to `upsd`
-- maybe this is when `upsmon` asks for login to the not-yet-recognized UPS
so the nut-server says it is denied per your original post. With NUT
v2.8.1+ the systemd integration is tighter, so daemons can notify the
service manager when they are actually ready to serve, and only then the
dependencies can start, if (packaging) build-time configuration options
enable this mode.
Given the revised integrations since NUT v2.8.0 release, you might have
better luck with current master-branch codebase - see
https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests
- and by confirming that it works or by uncovering more edge cases that are
not well handled, help improve an upcoming NUT v2.8.2 release :)
Hope this helps,
Jim Klimov
On Sat, Jan 20, 2024 at 1:07 PM Stefan Schumacher <
> stefanschumacheratwork at gmail.com> wrote:
>
>> Hello Jim,
>>
>> I am away from home visiting a friend until tomorrow but if you give
>> me a checklist of the information you need, I will gladly gather all
>> that I can find. My computer is not a Raspberry Pi although I own
>> several models of the little guys. My file server is simply my former
>> workstation (Skylake 6700K, 32 GB RAM, Broadcom 3008 HBA, Intel 10GBE
>> NIC). It runs all the time and never goes into suspend or
>> energy-saving mode.It has a btrfs-filesystem with 6 hard drives and
>> about 100TB Brutto capacity.
>>
>> This is the output I get when I plug the USB connection to the UPS on and
>> off:
>>
>> [120112.880346] usb 1-14: USB disconnect, device number 3
>> [120116.883552] usb 1-14: new full-speed USB device number 4 using
>> xhci_hcd
>> [120118.836528] usb 1-14: New USB device found, idVendor=0463,
>> idProduct=ffff, bcdDevice= 1.00
>> [120118.836608] usb 1-14: New USB device strings: Mfr=1, Product=2,
>> SerialNumber=4
>> [120118.836657] usb 1-14: Product: Ellipse ECO
>> [120118.836687] usb 1-14: Manufacturer: EATON
>> [120118.836717] usb 1-14: SerialNumber: 000000000
>> [120122.114630] hid-generic 0003:0463:FFFF.0002: hiddev0,hidraw0: USB
>> HID v10.10 Device [EATON Ellipse ECO] on usb-0000:00:14.0-14/input0
>>
>> This is the output from the driver with nut-monitor and nut-server not
>> running:
>> root at mars:/usr/lib/nut# ./usbhid-ups -DD -a Eaton
>> Network UPS Tools - Generic HID driver 0.47 (2.8.0)
>> USB communication driver (libusb 1.0) 0.43
>> 0.000000 [D1] debug level is '2'
>> 0.001288 [D2] Initializing an USB-connected UPS with library
>> libusb-1.0.26 (API: 0x1000109) (NUT subdriver name='USB communication
>> driver (libusb 1.0)' ver='0.43')
>> 0.001295 [D1] upsdrv_initups (non-SHUT)...
>> 0.004834 [D2] Checking device 1 of 7 (1D6B/0003)
>> 0.004849 [D1] Failed to open device (1D6B/0003), skipping: Access
>> denied (insufficient permissions)
>> 0.004854 [D2] Checking device 2 of 7 (0BDA/5452)
>> 0.004860 [D1] Failed to open device (0BDA/5452), skipping: Access
>> denied (insufficient permissions)
>> 0.004865 [D2] Checking device 3 of 7 (1D6B/0002)
>> 0.004878 [D1] Failed to open device (1D6B/0002), skipping: Access
>> denied (insufficient permissions)
>> 0.004882 [D2] Checking device 4 of 7 (1D6B/0003)
>> 0.004888 [D1] Failed to open device (1D6B/0003), skipping: Access
>> denied (insufficient permissions)
>> 0.004892 [D2] Checking device 5 of 7 (0463/FFFF)
>> 0.417990 [D2] - VendorID: 0463
>> 0.418048 [D2] - ProductID: ffff
>> 0.418065 [D2] - Manufacturer: EATON
>> 0.418082 [D2] - Product: Ellipse ECO
>> 0.418103 [D2] - Serial Number: 000000000
>> 0.418123 [D2] - Bus: 001
>> 0.418141 [D2] - Device: unknown
>> 0.418161 [D2] - Device release number: 0100
>> 0.418183 [D2] Trying to match device
>> 0.418204 [D2] match_function_subdriver (non-SHUT mode): matching a
>> device...
>> 0.418394 [D2] Device matches
>> 0.418416 [D2] Reading first configuration descriptor
>> 0.418456 [D2] failed to claim USB device: Resource busy
>> 0.418487 [D2] Kernel driver already detached
>> 0.418512 [D2] failed to claim USB device: Resource busy
>> 0.418546 [D2] Kernel driver already detached
>> 0.418571 [D2] failed to claim USB device: Resource busy
>> 0.418594 [D2] Kernel driver already detached
>> 0.418619 [D2] failed to claim USB device: Resource busy
>> 0.418643 [D2] Kernel driver already detached
>> 0.418664 Can't claim USB device [0463:ffff]@0/0: Entity not found
>>
>>
>> Yours sincerely
>> Stefan Schumacher
>>
>>
>> Am Sa., 20. Jan. 2024 um 12:03 Uhr schrieb Jim Klimov via Nut-upsuser
>> <nut-upsuser at alioth-lists.debian.net>:
>> >
>> > Well, at least per issue tracker, they can be finicky especially
>> regarding reconnects (in many models, the USB controller tends to fall into
>> power-saving at the wrong times, so polling rates should be increased).
>> More on the Wiki and in github search/labels :)
>> >
>> > As for Eatons, I've had experience with many enterprise and some
>> home-oriented models (worked there for a while), and can't point to any
>> particular "systemic" flaws of the brand. Individual devices of course
>> could be randomly damaged in shop/transport/storage/..., but that's about
>> it.
>> >
>> > In this thread I haven't yet seen one clue or report that would be
>> about the actual UPS or its driver, or a dmesg dump about re-connections,
>> if any - so just have no founded opinion if the power device is or is not
>> at any fault. Likewise, I don't think I've seen any details about the
>> fileserver's hardware (e.g. Raspberry Pi's are often mentioned in issues
>> nowadays, no idea if that points to inherent issues of their hardware or
>> just to their popularity and higher chance-to-encounter in tinkering,
>> though). All bread-crumbs so far were about the data server
>> (nut-server/upsd) and the client (upsmon).
>> >
>> > Jim
>> >
>> >
>> > On Sat, Jan 20, 2024 at 2:03 AM Sam Varshavchik <mrsam at courier-mta.com>
>> wrote:
>> >>
>> >> Stefan Schumacher via Nut-upsuser writes:
>> >>
>> >> > I am considering returning the device to the online shop I bought. As
>> >> > it is, it's nothing but trouble. Can you recommend a small UPS (1
>> >> > Server) that is guaranteed to work flawlessly with NUT?
>> >>
>> >> I've had good experience with Cyberpower units, as long as they're
>> included
>> >> NUT's hardware list.
>> >>
>> >> _______________________________________________
>> >> Nut-upsuser mailing list
>> >> Nut-upsuser at alioth-lists.debian.net
>> >> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
>> >
>> > _______________________________________________
>> > Nut-upsuser mailing list
>> > Nut-upsuser at alioth-lists.debian.net
>> > https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
>>
>
On Sat, Jan 20, 2024 at 2:00 PM Jim Klimov <jimklimov+nut at gmail.com> wrote:
> Thanks for the info!
>
> First, regarding the later detail about starting `usbhid-ups`: the
> "Resource busy" indicates that likely an earlier instance of the NUT driver
> (in its own service unit) is still running and holding the device. In a
> worse case, some other programs might consider this a HID device (same
> class family as keyboards and mice) and grab it somehow, or it gets passed
> through virtualization/containers and a different OS holds it in fact
> (guest if you're running NUT on hypervisor, or vice versa). Make sure you
> stop those too, whichever applies.
>
> Second, looking at dmesg and other logs: it seems the device number gets
> re-enumerated (lost #3, got #4) during reconnection - some OSes do that,
> others do not. This might confuse a driver's attempts to re-attach (relying
> on earlier learned details from initial connection) - although due to an
> issue in NUT v2.8.0 and before the drivers might have not tracked the
> "device" number (seen as "unknown" here), which got fixed in 2.8.1, but
> since 2.8.2 the `nut-scanner` would again not-suggest it as a config option
> by default due to such unreliability with reconnections...
>
> There are quite a few nuances that might differ between "vanilla" NUT
> systemd integration (which also evolved over time in git sources) and that
> packaged by distros, I haven't looked deep into Debian recipes lately.
> Maybe there's some unfortunate interaction of service dependencies that
> follows up from the disconnections. Some educated guesswork below:
>
> My current guess would be that a driver gives up reconnecting with the
> remembered data that no longer matches the device, and perhaps exits to be
> restarted by systemd (journal history of `nut-driver at Eaton.service`, if
> that one got generated by `nut-driver-enumerator` on your system, might
> confirm or dispel that guess; otherwise check for a plain old
> `nut-driver.service` monolith). Or it exits/crashes due to some other
> reason, such as manual runs of `usbhid-ups` (if the PID file is saved by
> the daemon and is found and used by the command-line spawned instance to
> kill off the "presumed-frozen" competitor; in the log above I don't see
> direct indications of that though).
>
> Further, what I guess could follow up is that if your only
> `nut-driver*.service` exits and restarts, the systemd dependency it
> provides for `nut-server.service` (maybe via `nut-driver.target`) flickers
> and causes the data server to restart. (Not sure at the moment if it is a
> weak Wants or a harder Requires type of dependency there; technically
> `upsd` should run well without drivers and report that the device is
> unknown or data is stale).
>
> Probably you can interpolate the event trails from `dmesg` and `journalctl
> -lx` as suggested earlier, with bumped `debug_min` in `ups.conf`,
> `upsd.conf` and maybe `upsmon.conf`, to check if the real events and
> service state changes seem to confirm this theorized chain of events:
>
> * USB reconnection (HW/FW reasons can vary a lot);
> * `nut-driver at Eaton.service` tries to reconnect but fails, or for older
> versions just aborts due to loss of link (even without OS USB
> re-enumeration, there can be a few seconds when the newly made devfs node
> is owned by `root` and `udev` did not have time to hand it off to `nut`, so
> the NUT driver can not re-attach);
> * fault of `nut-driver*` causes `nut-server` to stop (shouldn't, but
> might, happen... at least fits the symptoms you've posted earlier)
> * the `nut-driver` is resuscitated by systemd after some RestartSec timeout
> * dependency for `nut-server` is healthy so it is started up again
> * thinking of it, maybe the last couple of steps is almost concurrent: as
> soon as systemd launched the driver process, its unit is considered
> healthy, so the data server starts; however the driver then takes some time
> to do the initial walk of the device and only then begins talking to `upsd`
> -- maybe this is when `upsmon` asks for login to the not-yet-recognized UPS
> so the nut-server says it is denied per your original post. With NUT
> v2.8.1+ the systemd integration is tighter, so daemons can notify the
> service manager when they are actually ready to serve, and only then the
> dependencies can start, if (packaging) build-time configuration options
> enable this mode.
>
> Given the revised integrations since NUT v2.8.0 release, you might have
> better luck with current master-branch codebase - see
> https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests
> - and by confirming that it works or by uncovering more edge cases that are
> not well handled, help improve an upcoming NUT v2.8.2 release :)
>
> Jim
>
>
>
> On Sat, Jan 20, 2024 at 1:07 PM Stefan Schumacher <
> stefanschumacheratwork at gmail.com> wrote:
>
>> Hello Jim,
>>
>> I am away from home visiting a friend until tomorrow but if you give
>> me a checklist of the information you need, I will gladly gather all
>> that I can find. My computer is not a Raspberry Pi although I own
>> several models of the little guys. My file server is simply my former
>> workstation (Skylake 6700K, 32 GB RAM, Broadcom 3008 HBA, Intel 10GBE
>> NIC). It runs all the time and never goes into suspend or
>> energy-saving mode.It has a btrfs-filesystem with 6 hard drives and
>> about 100TB Brutto capacity.
>>
>> This is the output I get when I plug the USB connection to the UPS on and
>> off:
>>
>> [120112.880346] usb 1-14: USB disconnect, device number 3
>> [120116.883552] usb 1-14: new full-speed USB device number 4 using
>> xhci_hcd
>> [120118.836528] usb 1-14: New USB device found, idVendor=0463,
>> idProduct=ffff, bcdDevice= 1.00
>> [120118.836608] usb 1-14: New USB device strings: Mfr=1, Product=2,
>> SerialNumber=4
>> [120118.836657] usb 1-14: Product: Ellipse ECO
>> [120118.836687] usb 1-14: Manufacturer: EATON
>> [120118.836717] usb 1-14: SerialNumber: 000000000
>> [120122.114630] hid-generic 0003:0463:FFFF.0002: hiddev0,hidraw0: USB
>> HID v10.10 Device [EATON Ellipse ECO] on usb-0000:00:14.0-14/input0
>>
>> This is the output from the driver with nut-monitor and nut-server not
>> running:
>> root at mars:/usr/lib/nut# ./usbhid-ups -DD -a Eaton
>> Network UPS Tools - Generic HID driver 0.47 (2.8.0)
>> USB communication driver (libusb 1.0) 0.43
>> 0.000000 [D1] debug level is '2'
>> 0.001288 [D2] Initializing an USB-connected UPS with library
>> libusb-1.0.26 (API: 0x1000109) (NUT subdriver name='USB communication
>> driver (libusb 1.0)' ver='0.43')
>> 0.001295 [D1] upsdrv_initups (non-SHUT)...
>> 0.004834 [D2] Checking device 1 of 7 (1D6B/0003)
>> 0.004849 [D1] Failed to open device (1D6B/0003), skipping: Access
>> denied (insufficient permissions)
>> 0.004854 [D2] Checking device 2 of 7 (0BDA/5452)
>> 0.004860 [D1] Failed to open device (0BDA/5452), skipping: Access
>> denied (insufficient permissions)
>> 0.004865 [D2] Checking device 3 of 7 (1D6B/0002)
>> 0.004878 [D1] Failed to open device (1D6B/0002), skipping: Access
>> denied (insufficient permissions)
>> 0.004882 [D2] Checking device 4 of 7 (1D6B/0003)
>> 0.004888 [D1] Failed to open device (1D6B/0003), skipping: Access
>> denied (insufficient permissions)
>> 0.004892 [D2] Checking device 5 of 7 (0463/FFFF)
>> 0.417990 [D2] - VendorID: 0463
>> 0.418048 [D2] - ProductID: ffff
>> 0.418065 [D2] - Manufacturer: EATON
>> 0.418082 [D2] - Product: Ellipse ECO
>> 0.418103 [D2] - Serial Number: 000000000
>> 0.418123 [D2] - Bus: 001
>> 0.418141 [D2] - Device: unknown
>> 0.418161 [D2] - Device release number: 0100
>> 0.418183 [D2] Trying to match device
>> 0.418204 [D2] match_function_subdriver (non-SHUT mode): matching a
>> device...
>> 0.418394 [D2] Device matches
>> 0.418416 [D2] Reading first configuration descriptor
>> 0.418456 [D2] failed to claim USB device: Resource busy
>> 0.418487 [D2] Kernel driver already detached
>> 0.418512 [D2] failed to claim USB device: Resource busy
>> 0.418546 [D2] Kernel driver already detached
>> 0.418571 [D2] failed to claim USB device: Resource busy
>> 0.418594 [D2] Kernel driver already detached
>> 0.418619 [D2] failed to claim USB device: Resource busy
>> 0.418643 [D2] Kernel driver already detached
>> 0.418664 Can't claim USB device [0463:ffff]@0/0: Entity not found
>>
>>
>> Yours sincerely
>> Stefan Schumacher
>>
>>
>> Am Sa., 20. Jan. 2024 um 12:03 Uhr schrieb Jim Klimov via Nut-upsuser
>> <nut-upsuser at alioth-lists.debian.net>:
>> >
>> > Well, at least per issue tracker, they can be finicky especially
>> regarding reconnects (in many models, the USB controller tends to fall into
>> power-saving at the wrong times, so polling rates should be increased).
>> More on the Wiki and in github search/labels :)
>> >
>> > As for Eatons, I've had experience with many enterprise and some
>> home-oriented models (worked there for a while), and can't point to any
>> particular "systemic" flaws of the brand. Individual devices of course
>> could be randomly damaged in shop/transport/storage/..., but that's about
>> it.
>> >
>> > In this thread I haven't yet seen one clue or report that would be
>> about the actual UPS or its driver, or a dmesg dump about re-connections,
>> if any - so just have no founded opinion if the power device is or is not
>> at any fault. Likewise, I don't think I've seen any details about the
>> fileserver's hardware (e.g. Raspberry Pi's are often mentioned in issues
>> nowadays, no idea if that points to inherent issues of their hardware or
>> just to their popularity and higher chance-to-encounter in tinkering,
>> though). All bread-crumbs so far were about the data server
>> (nut-server/upsd) and the client (upsmon).
>> >
>> > Jim
>> >
>> >
>> > On Sat, Jan 20, 2024 at 2:03 AM Sam Varshavchik <mrsam at courier-mta.com>
>> wrote:
>> >>
>> >> Stefan Schumacher via Nut-upsuser writes:
>> >>
>> >> > I am considering returning the device to the online shop I bought. As
>> >> > it is, it's nothing but trouble. Can you recommend a small UPS (1
>> >> > Server) that is guaranteed to work flawlessly with NUT?
>> >>
>> >> I've had good experience with Cyberpower units, as long as they're
>> included
>> >> NUT's hardware list.
>> >>
>> >> _______________________________________________
>> >> Nut-upsuser mailing list
>> >> Nut-upsuser at alioth-lists.debian.net
>> >> https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
>> >
>> > _______________________________________________
>> > Nut-upsuser mailing list
>> > Nut-upsuser at alioth-lists.debian.net
>> > https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/nut-upsuser
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/nut-upsuser/attachments/20240120/1fbfab43/attachment-0001.htm>
More information about the Nut-upsuser
mailing list