[Nut-upsuser] usbhid-ups causes hang/crash in ohci driver
Kristian Rasmussen
kristian_rasmussen at fastmail.co.uk
Sat Nov 22 00:53:44 UTC 2014
I'm experiencing a rather serious issue with the usbhid-ups driver on
two different Linux systems. The symptoms on both systems are the same:
After anything from a few hours to a day or two, upsd is no longer able
to communicate with the UPS. upsd initially starts complaining about
"data from UPS <such-and-such> is stale - check driver", then upsmon
reports a loss of communication.
When this happens, reloading the usbhid-ups driver and even unplugging
and re-inserting the USB cable does not fix the problem, and lsusb does
not list the UPS at all. On one system, an x86_64 server running kernel
3.17.3, the following can be seen in the log:
Nov 21 23:24:07 test-svr1 kernel: [111233.920039] ohci-pci 0000:00:02.0: frame counter not updating; disabled
Nov 21 23:24:07 test-svr1 kernel: [111233.920047] ohci-pci 0000:00:02.0: HC died; cleaning up
Unloading and re-inserting the ohci kernel modules (ohci_hcd and
ohci_pci) does temporarily resolve the issue on this host, but after a
few hours the problem appears again.
On the other affected system, a 32-bit VIA router running kernel 3.12.6,
the issue causes a kernel panic and nothing regarding the kernel USB
drivers is logged. I haven't yet had the opportunity to set up serial
console logging to conclusively verify that the panic occurs in the ohci
module, but it does seem likely; if I catch the problem in time, after
the warning about "stale data" from upsd but before the actual panic
occurs, I can unload and reload the ohci drivers and prevent the crash.
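Because there is a window between the upsd "stale" warning and the crash, it may be possible to watch for it automatically instead of catching it by hand. The following is only a minimal sketch, assuming the upsd and kernel messages end up in a log that can be read line by line. The helper names classify() and first_warning() are hypothetical, and the "stale" pattern is deliberately loose because the exact upsd wording is paraphrased above. It only detects the messages; it does not reload any modules.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical patterns based on the messages quoted in this report.
# The OHCI pattern matches the two kernel lines from the x86_64 server.
OHCI_FAILURE = re.compile(r"ohci-pci \S+: (frame counter not updating|HC died)")
# Loose match for the upsd warning, e.g. "... UPS <name> is stale - check driver".
UPS_STALE = re.compile(r"UPS \S+ is stale", re.IGNORECASE)


def classify(line: str) -> Optional[str]:
    """Return 'ohci' for a host-controller failure, 'stale' for an
    upsd stale-data warning, or None for any other log line."""
    if OHCI_FAILURE.search(line):
        return "ohci"
    if UPS_STALE.search(line):
        return "stale"
    return None


def first_warning(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line index, kind) of the first relevant log line, or None."""
    for i, line in enumerate(lines):
        kind = classify(line)
        if kind is not None:
            return i, kind
    return None
```

Run against a tailed log, a "stale" hit would be the point at which to reload ohci_hcd and ohci_pci before the VIA box panics; an "ohci" hit means the host controller has already been disabled.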
Both systems are running nut-2.7.1 compiled from source. The UPS units
involved are not identical; one is an older MGE Pulsar while the other
is a newer Eaton, but both use the same USB identifier (0463:ffff, "MGE
UPS Systems UPS"). The USB chipsets involved are quite dissimilar
(Nvidia vs. VIA).
Is this likely to be a bug in nut, or has the nut usbhid-ups driver
perhaps triggered an underlying kernel bug in the USB driver subsystem?
Anything I can do to narrow down what's causing this?
Regards,
Kristian