[Nut-upsdev] a nasty kernel oops
Knight, Dave
diemkae at gmail.com
Wed Jan 12 16:16:57 UTC 2011
My guess would be that there is some miscommunication between driver levels,
perhaps a result of running a new nut with an old kernel (latest is 2.6.37)?
That the presence of the "-D" flag appears to suppress the error:
> * I have hacked the startup script to invoke usbhid-ups directly with
> the proper device parameter and -D, and the problem goes away
> * I have used the hacked startup script to invoke usbhid-ups directly
> with the proper device parameter but *no* -D, and we have a crash!
>
suggests a place to start, possibly looking for a device/interface timing
issue.
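
To make the comparison repeatable, the two invocations can be generated from one helper so that the only difference between runs is the -D flag. This is a minimal sketch, not the actual startup script: the `-a` option and the UPS name `myups` are assumptions, since the original report only says "the proper device parameter", and the driver is assumed to be on the PATH.

```python
import shlex


def driver_command(ups_name, debug_level=0, driver="usbhid-ups"):
    """Build the argv for one direct driver invocation.

    ups_name    -- hypothetical section name from ups.conf (assumption)
    debug_level -- number of -D flags to append (0 reproduces the crash case)
    driver      -- driver binary, assumed to be found on the PATH
    """
    argv = [driver, "-a", ups_name]
    argv += ["-D"] * debug_level
    return argv


if __name__ == "__main__":
    # Print both variants so they can be pasted into the startup script.
    for level in (0, 1):
        print(shlex.join(driver_command("myups", level)))
```

Keeping everything else identical between the two runs helps confirm that the flag itself, rather than some other change to the script, is what shifts the timing.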
Dave
On Wed, Jan 12, 2011 at 10:08 AM, Alfred Ganz <alfred-ganz+nut at agci.com>
wrote:
> Ladies and Gentlemen,
>
> I have been trying to sort out a nasty kernel oops for which I would
> like to ask for some advice. I don't actually think that this is a nut
> problem, although it is triggered by upsdrvctl or usbhid-ups. I rather
> suspect the USB library or the associated kernel code.
> Here is the configuration information:
> * system: Centos-5.5
> * kernel: 2.6.18-194.32.1.el5, latest vanilla for Centos-5.5
> * nut: nut-2.4.3-1, a build provided by Arnaud last fall
> * libusb: libusb-0.1.12-5.1, the latest for this system
> * the UPS:
> device.mfr: APC
> device.model: Back-UPS ES 650
> device.serial: QB0514232934
> device.type: ups
> ups.firmware: 818.w1.D
> ups.mfr.date: 2005/08/10
> ups.productid: 0002
> Sorry Arnaud!
> * the driver:
> driver.name: usbhid-ups
> driver.parameter.pollfreq: 30
> driver.parameter.pollinterval: 1
> driver.parameter.port: auto
> driver.version: 2.4.3
> driver.version.data: APC HID 0.95
> driver.version.internal: 0.34
> The problem:
> * under certain circumstances system boot fails with a kernel oops
> during the upsdrvctl/usbhid-ups phase of the startup script.
> Note that we have *not* reached the upsd part of the startup script,
> and the startup script of the printer package (cups) would be
> reached much later.
> * the boot process works fine if another USB device, the USB printer,
> is powered up. Unfortunately, I would like to keep it powered off
> most of the time if I could do so.
> Note, the printer is not the only other USB device on the system,
> but the number of active USB devices is being changed.
> * Once the system is completely booted, there is no problem starting
> and stopping the ups using the startup script.
> * the problem appeared late in the lifetime of Centos-5, the system
> has shut down due to power loss and restarted with the printer off
> before the arrival of the problem.
> * the problem has probably appeared with the transition from Centos-5.4
> (kernel-2.6.18-164.15.1.el5) to Centos-5.5 (kernel-2.6.18-194.3.1.el5),
> although I noticed it much later
> * libusb has been the same since 2008/04/22, I don't believe the problem
> appeared that long ago.
> * the problem happens with the locally rebuilt (with no changes)
> nut-2.2.0-6.1 from Fedora 8, with the nut-2.4.3-1 provided by
> Arnaud, and with a locally rebuilt version of nut-2.4.3-1 using
> a spec file provided by Arnaud.
> Here is what I have tried:
> * I have started the system for a crash dump, and below is an extract
> from the crash log.
> * I have tried to recreate the problem on a virtual machine, but it
> doesn't happen there
> * I have introduced a delay before the start of the ups startup script
> to make sure that the suspected timing problem doesn't happen earlier.
> * I have hacked the startup script to invoke usbhid-ups directly with
> the proper device parameter and -D, and the problem goes away
> * I have used the hacked startup script to invoke usbhid-ups directly
> with the proper device parameter but *no* -D, and we have a crash!
> So I am now left with the observation that we have a timing problem
> somewhere among usbhid-ups, libusb, and the kernel, and I have no
> idea how to further narrow things down.
>
> I would be glad to do some more tests, but I would need your help with
> this.
>
> Thanks for any advice, AG
>
>
> ----------------------------------------------------------------------------
> BUG: unable to handle kernel paging request at virtual address 0000190c
> printing eip:
> c05954c6
> *pde = 74ecb067
> Oops: 0000 [#1]
> SMP
> ....
> CPU: 0
> EIP: 0060:[<c05954c6>] Not tainted VLI
> EFLAGS: 00010206 (2.6.18-194.26.1.el5 #1)
> EIP is at hid_close+0x2/0x1f
> eax: 00000000 ebx: d216bd20 ecx: f7fff080 edx: 00000000
> esi: d21ebc00 edi: c06b3470 ebp: 00000000 esp: f768ad34
> ds: 007b es: 007b ss: 0068
> Process usbhid-ups (pid: 2437, ti=f768a000 task=f7682000 task.ti=f768a000)
> Stack: c05976fd d20fe000 c0595668 d21ebc00 c06b3440 c058e1c5 d21ebc8c d21ebc14
>        c055e859 d21ebc14 d21ebc14 d21ebc00 c055ea91 d21ebc00 c0588351 ffffffc3
>        00000000 c0590cc0 f75ee340 00000000 00005516 00000000 c00c5512 d2173400
> Call Trace:
> [<c05976fd>] hiddev_disconnect+0x3f/0x5e
> [<c0595668>] hid_disconnect+0x81/0xbf
> [<c058e1c5>] usb_unbind_interface+0x34/0x6a
> [<c055e859>] __device_release_driver+0x7d/0xbb
> [<c055ea91>] device_release_driver+0x1c/0x2b
> [<c0588351>] usb_driver_release_interface+0x38/0x60
> [<c0590cc0>] proc_ioctl_default+0x10d/0x1d0
> [<c0591ffd>] usbdev_ioctl+0x1027/0x10de
> [<c048b06b>] __d_lookup+0x98/0xdb
> [<c04c7eff>] inode_has_perm+0x54/0x5c
> [<c04c78ab>] avc_has_perm+0x3c/0x46
> [<c04c7eff>] inode_has_perm+0x54/0x5c
> [<c0464c32>] __handle_mm_fault+0x463/0xaac
> [<c0486290>] do_ioctl+0x47/0x5d
> [<c04867f9>] vfs_ioctl+0x47b/0x4d3
> [<c0486899>] sys_ioctl+0x48/0x5f
> [<c0404f17>] syscall_call+0x7/0xb
> =======================
> Code: 00 05 b0 01 86 83 b4 0c 00 00 fb 8d 83 54 0c 00 00 e8 4c 87 e9 ff 8b
> 83 a8 0c 00 00 e8 7d 74 ff ff 31 c0 5b c3 31 d2 eb be 89 c2 <8b> 80 0c 19 00
> 00 48 85 c0 89 82 0c 19 00 00 75 0b 8b 82 a8 0c
> EIP: [<c05954c6>] hid_close+0x2/0x1f SS:ESP 0068:f768ad34
>
> ----------------------------------------------------------------------------
>
> --
> ----------------------------------------------------------------------
> Alfred Ganz alfred-ganz:at:agci.com
> AG Consulting, Inc. (203) 624-9667
> 440 Prospect Street # 11
> New Haven, CT 06511
> ----------------------------------------------------------------------
>
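
The oops quoted above can be read a little further with a short script. The sketch below is only a reading aid, not a diagnosis: it pulls the symbol names out of a few of the "Call Trace" lines and decodes the instruction marked with `<..>` in the "Code:" line, handling just the one x86 encoding that appears here (opcode 8b with a 32-bit displacement and no SIB byte). The function names `call_chain` and `faulting_load` are made up for this illustration.

```python
import re

# A few lines copied from the call trace in the report (quote prefix removed).
TRACE = """\
[<c05976fd>] hiddev_disconnect+0x3f/0x5e
[<c0595668>] hid_disconnect+0x81/0xbf
[<c058e1c5>] usb_unbind_interface+0x34/0x6a
[<c0588351>] usb_driver_release_interface+0x38/0x60
[<c0591ffd>] usbdev_ioctl+0x1027/0x10de
"""

# Tail of the "Code:" line from the report; <8b> marks the faulting byte.
CODE = "31 c0 5b c3 31 d2 eb be 89 c2 <8b> 80 0c 19 00 00 48 85 c0"

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def call_chain(trace):
    """Return the symbol names from call trace lines, innermost first."""
    return re.findall(r"\]\s+(\w+)\+0x", trace)


def faulting_load(code):
    """Decode the marked instruction if it is 'mov reg, [base+disp32]'.

    Returns (dest_register, base_register, displacement), or None when the
    marked bytes use any other encoding. The displacement is treated as
    unsigned, which is sufficient for the small offset seen here.
    """
    tokens = code.split()
    start = next((i for i, t in enumerate(tokens) if t.startswith("<")), None)
    if start is None:
        return None
    raw = bytes(int(t.strip("<>"), 16) for t in tokens[start:start + 6])
    if len(raw) < 6 or raw[0] != 0x8B:
        return None
    mod, reg, rm = raw[1] >> 6, (raw[1] >> 3) & 7, raw[1] & 7
    if mod != 0b10 or rm == 0b100:  # need disp32 addressing without SIB
        return None
    return REGS[reg], REGS[rm], int.from_bytes(raw[2:6], "little")


if __name__ == "__main__":
    print(" <- ".join(call_chain(TRACE)))
    dest, base, disp = faulting_load(CODE)
    print(f"mov {dest}, [{base}+{disp:#x}]")
```

The marked bytes decode to a load from `[eax+0x190c]`. With `eax: 00000000` in the register dump, that gives exactly the faulting address 0000190c from the BUG line, which would be consistent with hid_close being handed a NULL pointer while the interface is released through the ioctl path shown in the trace.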
--
"Ridicule is man's most potent weapon. It’s hard to counterattack ridicule,
and it infuriates
the opposition, which then reacts to your advantage."
Saul Alinsky,
Marxist, Obama mentor