[Nut-upsdev] a nasty kernel oops

Charles Lepple clepple at gmail.com
Sat Jan 15 15:27:26 UTC 2011


On Jan 14, 2011, at 9:31 PM, Alfred Ganz wrote:

> Charles,
>
> Here is some more insight into my problem.
> * I am now able to get a crash on a virtual machine, so life has
>   become a bit easier

Which kind of crash: a kernel oops, a usbhid-ups crash, or simply a
failure to launch usbhid-ups?

> * disabling the UPS, then immediately after re-enabling it, the first
>   libhid-detach-device fails after about 10 sec, with:
> 	hid_force_open failed with return code 7.
>   i.e. no device has been found.
>   Following the first libhid-detach-device immediately by several
>   more, they all fail the same way, but without another 10 sec delay.
>   Finally, adding a sleep 1 followed by another libhid-detach-device
>   will succeed.
> * disabling the UPS, then waiting 20 sec after re-enabling it, the
>   first libhid-detach-device will succeed.
>   Note, I wasn't able to reduce this delay significantly, so it seems
>   that the total delay can be smaller when doing the above
>   failing operations.
> * The same behavior occurs when using lsusb -d instead of the above
>   libhid-detach-device.
> * usbhid-ups crashes if the last preceding libhid-detach-device fails,
>   but it will not fail if there is a successful libhid-detach-device
>   preceding it, or if there is a longer inactive delay.
> Unfortunately, the timing is for the virtual machine, and I don't
> expect things to be similar on the real machine, not to speak of the
> boot context with other devices present.
>
> As you suspected, it looks like usbhid-ups crashes if things have not
> reached quiescence or some other kind of availability. However, I have
> no idea how at boot time adding an active USB device can achieve this
> (or maybe achieve it much more quickly).
>
> It would of course be nice to make usbhid-ups have a builtin method
> for detecting such a state and at the same time be able to detect the
> absence of the device in question. However, I think the appropriate
> thing is to determine such a method outside of usbhid-ups first. If at
> all possible, I would prefer to do this with some shell script, but if
> push comes to shove, I might have to resort to some C code as well.

I don't want to downplay the significance of the problem on your end,  
but it is really up to the kernel to protect itself from race  
conditions and crashes caused by userspace applications accessing  
devices. To that end, I agree that something should be done outside  
usbhid-ups.
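
As a rough sketch of gating the driver from outside usbhid-ups: the
report above suggests usbhid-ups survives when a successful
libhid-detach-device precedes it, so a wrapper could repeat a probe
command until it exits with status 0 and only then start the driver.
The function name, attempt count, and delay below are hypothetical,
and the sketch assumes the probe reports "device not found" through a
non-zero exit status.

```python
import subprocess
import time


def wait_until_probe_succeeds(probe_cmd, attempts=30, delay=1.0):
    """Run probe_cmd repeatedly until it exits with status 0.

    probe_cmd is an argument list for the probe (hypothetical choice:
    the detach utility or an lsusb call). Returns True as soon as one
    run succeeds, False if every attempt fails. A missing executable
    raises FileNotFoundError instead of being retried.
    """
    for attempt in range(attempts):
        result = subprocess.run(
            probe_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode == 0:
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The driver would then be started only when this returns True. The
right attempt count and delay are unknown here, since the timings
reported so far come from a virtual machine.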

We've had a few discussions on how the drivers should deal with USB  
devices which are not there. My take on this is that we will try to  
reconnect if it is a temporary disappearance, but we won't retry for  
long at startup. I personally think that if the device node is not  
ready by the time NUT starts, either NUT is being started too early,  
or the device is not to be trusted with something as critical as  
notification of power events.

That said, the HAL-style drivers are started when the device is  
plugged in. While that might be nice, I don't think that's a fair  
comparison because they tend to provide information about the power  
situation, rather than being part of a reliable monitoring and  
shutdown system.

> Any advice on what might work would of course be much appreciated.

One workaround would be to patch the kernel to blacklist the UPS from  
the kernel raw HID driver. Of course, this doesn't play well with  
prebuilt binary kernels.

Along these lines, it should be possible to blacklist the kernel HID
module which has attached to the UPS. I haven't followed this portion
of the kernel much lately (and all bets are off in Red Hat kernels),
but with any luck, it might be separate from the keyboard/mouse
HID-to-input-layer module.

A less intrusive way might be to watch the /dev space for the node  
corresponding to the HID interface, and wait a few seconds after that  
appears before detaching.
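
A minimal sketch of that approach, in Python: poll for the device
node, then pause for a settle period before doing anything else. The
node path is an assumption (it depends on the kernel and udev setup,
so the caller passes it in), as are the function name and the timeout,
settle, and poll values.

```python
import os
import time


def wait_for_node(path, timeout=60.0, settle=5.0, poll=0.5):
    """Wait for a device node to appear, then wait a settle period.

    Returns True if the node appeared within timeout seconds (after
    sleeping settle seconds more), False if it never showed up.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    # The node exists; give the kernel a few seconds before detaching.
    time.sleep(settle)
    return True
```

The settle value would need tuning on the real machine; the roughly
20 second delay reported above was measured on a virtual machine.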

> Thanks, AG
>
> P.S. What happened to the mail server at lists.alioth.debian.org?

We haven't heard any updates, and I don't see any tracker items  
explaining what happened, but it seems to be back now.


