[Nut-upsdev] a nasty kernel oops

Alfred Ganz alfred-ganz+nut at agci.com
Wed Jan 12 15:08:35 UTC 2011


Ladies and Gentlemen,

I have been trying to track down a nasty kernel oops and would like to
ask for some advice. I don't actually think that this is a NUT problem,
although it is triggered by upsdrvctl or usbhid-ups; I rather suspect
libusb or the associated kernel code.
Here is the configuration information:
 * system: CentOS-5.5
 * kernel: 2.6.18-194.32.1.el5, latest stock kernel for CentOS-5.5
 * nut: nut-2.4.3-1, a build provided by Arnaud last fall
 * libusb: libusb-0.1.12-5.1, the latest for this system
 * the UPS: 
	device.mfr: APC
	device.model: Back-UPS ES 650
	device.serial: QB0514232934  
	device.type: ups
	ups.firmware: 818.w1.D
	ups.mfr.date: 2005/08/10
	ups.productid: 0002
   Sorry Arnaud!
 * the driver:
	driver.name: usbhid-ups
	driver.parameter.pollfreq: 30
	driver.parameter.pollinterval: 1
	driver.parameter.port: auto
	driver.version: 2.4.3
	driver.version.data: APC HID 0.95
	driver.version.internal: 0.34
The problem:
 * under certain circumstances the system boot fails with a kernel oops
   during the upsdrvctl/usbhid-ups phase of the ups startup script.
   Note that we have *not* reached the upsd part of that script, and
   the startup script of the printer package (cups) would only run
   much later.
 * the boot process works fine if another USB device, the USB printer,
   is powered up. Unfortunately, I would prefer to keep the printer
   powered off most of the time.
   Note that the printer is not the only other USB device on the system,
   but switching it on or off does change the number of active USB devices.
 * once the system is completely booted, there is no problem starting
   and stopping the UPS driver with the ups startup script.
 * the problem appeared late in the lifetime of CentOS-5; before that,
   the system had shut down on power loss and restarted with the
   printer off without trouble.
 * the problem probably appeared with the transition from CentOS-5.4
   (kernel-2.6.18-164.15.1.el5) to CentOS-5.5 (kernel-2.6.18-194.3.1.el5),
   although I only noticed it much later.
 * libusb has not changed since 2008/04/22, and I don't believe the
   problem appeared that long ago.
 * the problem occurs with nut-2.2.0-6.1 from Fedora 8 (rebuilt locally
   without changes), with the nut-2.4.3-1 build provided by Arnaud, and
   with nut-2.4.3-1 rebuilt locally from a spec file provided by Arnaud.
Here is what I have tried:
 * I have booted the system with crash dumps enabled; an extract from
   the crash log is below.
 * I have tried to recreate the problem on a virtual machine, but it
   doesn't happen there.
 * I have introduced a delay before the ups startup script runs, to
   rule out that the suspected timing problem is caused by starting it
   too early.
 * I have hacked the startup script to invoke usbhid-ups directly with
   the proper device parameter and -D, and the problem goes away.
 * I have used the hacked startup script to invoke usbhid-ups directly
   with the proper device parameter but *no* -D, and the crash is back!
So I am left with the impression that there is a timing problem
somewhere among usbhid-ups, libusb, and the kernel, and I have no
idea how to narrow it down further.

I would be glad to do some more tests, but I would need your help with
this.

Thanks for any advice, AG

----------------------------------------------------------------------------
BUG: unable to handle kernel paging request at virtual address 0000190c
 printing eip:
c05954c6
*pde = 74ecb067
Oops: 0000 [#1]
SMP 
....
CPU:    0
EIP:    0060:[<c05954c6>]    Not tainted VLI
EFLAGS: 00010206   (2.6.18-194.26.1.el5 #1) 
EIP is at hid_close+0x2/0x1f
eax: 00000000   ebx: d216bd20   ecx: f7fff080   edx: 00000000
esi: d21ebc00   edi: c06b3470   ebp: 00000000   esp: f768ad34
ds: 007b   es: 007b   ss: 0068
Process usbhid-ups (pid: 2437, ti=f768a000 task=f7682000 task.ti=f768a000)
Stack: c05976fd d20fe000 c0595668 d21ebc00 c06b3440 c058e1c5 d21ebc8c d21ebc14 
       c055e859 d21ebc14 d21ebc14 d21ebc00 c055ea91 d21ebc00 c0588351 ffffffc3 
       00000000 c0590cc0 f75ee340 00000000 00005516 00000000 c00c5512 d2173400 
Call Trace:
 [<c05976fd>] hiddev_disconnect+0x3f/0x5e
 [<c0595668>] hid_disconnect+0x81/0xbf
 [<c058e1c5>] usb_unbind_interface+0x34/0x6a
 [<c055e859>] __device_release_driver+0x7d/0xbb
 [<c055ea91>] device_release_driver+0x1c/0x2b
 [<c0588351>] usb_driver_release_interface+0x38/0x60
 [<c0590cc0>] proc_ioctl_default+0x10d/0x1d0
 [<c0591ffd>] usbdev_ioctl+0x1027/0x10de
 [<c048b06b>] __d_lookup+0x98/0xdb
 [<c04c7eff>] inode_has_perm+0x54/0x5c
 [<c04c78ab>] avc_has_perm+0x3c/0x46
 [<c04c7eff>] inode_has_perm+0x54/0x5c
 [<c0464c32>] __handle_mm_fault+0x463/0xaac
 [<c0486290>] do_ioctl+0x47/0x5d
 [<c04867f9>] vfs_ioctl+0x47b/0x4d3
 [<c0486899>] sys_ioctl+0x48/0x5f
 [<c0404f17>] syscall_call+0x7/0xb
 =======================
Code: 00 05 b0 01 86 83 b4 0c 00 00 fb 8d 83 54 0c 00 00 e8 4c 87 e9 ff 8b 83 a8 0c 00 00 e8 7d 74 ff ff 31 c0 5b c3 31 d2 eb be 89 c2 <8b> 80 0c 19 00 00 48 85 c0 89 82 0c 19 00 00 75 0b 8b 82 a8 0c 
EIP: [<c05954c6>] hid_close+0x2/0x1f SS:ESP 0068:f768ad34
----------------------------------------------------------------------------
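For what it's worth, two details of the oops can be decoded by hand.
First, the faulting instruction (the bytes starting at <8b> in the Code:
line) appears to be a load from offset 0x190c of the pointer in %eax, and
%eax is 0 in the register dump, which accounts exactly for the fault
address 0000190c; hid_close() seems to have been handed a NULL pointer.
Second, the words c00c5512 and 00005516 in the stack dump look like the
usbdevfs requests USBDEVFS_IOCTL and USBDEVFS_DISCONNECT, i.e. user space
asking the kernel to unbind its HID driver from the interface, which fits
the usbdev_ioctl / proc_ioctl_default / usb_driver_release_interface frames
in the call trace. As far as I can tell, this is the pair of ioctls that
libusb-0.1's usb_detach_kernel_driver_np() issues on Linux. The Python
sketch below only shows the arithmetic; it assumes the generic i386 ModRM
encoding, the Linux _IOC bit layout, and a 12-byte struct usbdevfs_ioctl
on a 32-bit kernel.

```python
# Hand-decoding two details of the oops above. Hex values are copied from
# the log; the decoding rules are the generic i386 / Linux ones.

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code: bytes):
    """Decode 'mov disp32(%base),%reg': opcode 0x8b, ModRM mod=0b10, no SIB."""
    assert code[0] == 0x8B, "not MOV r32, r/m32"
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    assert mod == 0b10 and rm != 0b100, "only the disp32, no-SIB form is handled"
    disp = int.from_bytes(code[2:6], "little")  # 32-bit little-endian offset
    return REGS[reg], REGS[rm], disp


# 1. The bytes starting at <8b> in the "Code:" line.
dst, base, disp = decode_mov_load(bytes.fromhex("8b800c190000"))
eax = 0x00000000  # value of %eax in the register dump
fault_addr = eax + disp
print(f"mov 0x{disp:x}(%{base}),%{dst} reads 0x{fault_addr:08x}")

# 2. Linux _IOC layout on x86: nr in bits 0-7, type in bits 8-15,
#    size in bits 16-29, direction in bits 30-31.
IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2


def ioc(direction: int, type_char: str, nr: int, size: int) -> int:
    """Build an ioctl request number the way the kernel's _IOC() macro does."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Assumption: struct usbdevfs_ioctl {int ifno; int ioctl_code; void *data;}
# is 12 bytes on a 32-bit kernel.
USBDEVFS_IOCTL = ioc(IOC_READ | IOC_WRITE, "U", 18, 12)  # _IOWR('U', 18, ...)
USBDEVFS_DISCONNECT = ioc(IOC_NONE, "U", 22, 0)  # _IO('U', 22)
print(f"0x{USBDEVFS_IOCTL:08x} 0x{USBDEVFS_DISCONNECT:08x}")
```

Running it prints "mov 0x190c(%eax),%eax reads 0x0000190c" and
"0xc00c5512 0x00005516"; both ioctl values appear verbatim in the third
Stack: line of the oops.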

-- 
 ----------------------------------------------------------------------
   Alfred Ganz					alfred-ganz:at:agci.com
   AG Consulting, Inc.				(203) 624-9667
   440 Prospect Street # 11
   New Haven, CT 06511
 ----------------------------------------------------------------------