[Nut-upsdev] a nasty kernel oops

Knight, Dave diemkae at gmail.com
Wed Jan 12 16:16:57 UTC 2011


My guess would be that there is some miscommunication between driver levels,
perhaps a result of running a new nut with an old kernel (latest is 2.6-37)?

That the presence of the "-D" flag appears to suppress the error:

 * I have hacked the startup script to invoke usbhid-ups directly with
>   the proper device parameter and -D, and the problem goes away
>  * I have used the hacked startup script to invoke usbhid-ups directly
>   with the proper device parameter but *no* -D, and we have a crash!
>

suggests a place to start, possibly looking for a device/interface timing
issue.

Dave

On Wed, Jan 12, 2011 at 10:08 AM, Alfred Ganz
<alfred-ganz+nut at agci.com<alfred-ganz%2Bnut at agci.com>
> wrote:

> Ladies and Gentlemen,
>
> I have been trying to sort out a nasty kernel oops for which I would
> like to ask for some advice. I don't actually think that this is a nut
> problem, although it is triggered by upsdrvctl or usbhid-ups. I rather
> suspect the USB library or the associated kernel code.
> Here is the configuration information:
>  * system: Centos-5.5
>  * kernel: 2.6.18-194.32.1.el5, latest vanilla for Centos-5.5
>  * nut: nut-2.4.3-1, a build provided by Arnaud last fall
>  * libusb: libusb-0.1.12-5.1, the latest for this system
>  * the UPS:
>        device.mfr: APC
>        device.model: Back-UPS ES 650
>        device.serial: QB0514232934
>        device.type: ups
>        ups.firmware: 818.w1.D
>        ups.mfr.date: 2005/08/10
>        ups.productid: 0002
>   Sorry Arnaud!
>  * the driver:
>        driver.name: usbhid-ups
>        driver.parameter.pollfreq: 30
>        driver.parameter.pollinterval: 1
>        driver.parameter.port: auto
>        driver.version: 2.4.3
>        driver.version.data: APC HID 0.95
>        driver.version.internal: 0.34
> The problem:
>  * under certain circumstances system boot fails with a kernel oops
>   during the upsdrvctl/usbhid-ups phase of the startup script.
>   Note that we have *not* reached the upsd part of the startup script,
>        and the startup script of the printer package (cups) would be
>        reached much later.
>  * the boot process works fine if another USB device, the USB printer,
>   is powered up. Unfortunately, I would like to keep it powered off
>   most of the time if I could do so.
>   Note, the printer is not the only other USB device on the system,
>        but the number of active USB devices is being changed.
>  * Once the system is completely booted, there is no problem starting
>   and stopping the ups using the startup script.
>  * the problem appeared late in the lifetime of Centos-5, the system
>   has shut down due to power loss and restarted with the printer off
>   before the arrival of the problem.
>  * the problem has probably appeared with the transition from Centos-5.4
>   (kernel-2.6.18-164.15.1.el5) to Centos-5.5 (kernel-2.6.18-194.3.1.el5),
>   although I noticed it much later
>  * libusb has been the same since 2008/04/22, I don't believe the problem
>   appeared that long ago.
>  * the problem happens with the locally rebuilt (with no changes)
>   nut-2.2.0-6.1 from Fedora 8, with the nut-2.4.3-1 provided by
>   Arnaud, and with a locally rebuilt version of nut-2.4.3-1 using
>   a spec file provided by Arnaud.
> Here is what I have tried:
>  * I have started the system for a crash dump, and below is an extract
>   from the crash log.
>  * I have tried to recreate the problem on a virtual machine, but it
>   doesn't happen there
>  * I have introduced a delay before the start of the ups startup script
>   to make sure that the suspected timing problem doesn't happen earlier.
>  * I have hacked the startup script to invoke usbhid-ups directly with
>   the proper device parameter and -D, and the problem goes away
>  * I have used the hacked startup script to invoke usbhid-ups directly
>   with the proper device parameter but *no* -D, and we have a crash!
> So I am now left with the observation that we have a timing problem
> somewhere among usbhid-ups, libusb, and the kernel, and I have no
> idea how to further narrow things down.
>
> I would be glad to do some more tests, but I would need your help with
> this.
>
> Thanks for any advice, AG
>
>
> ----------------------------------------------------------------------------
> BUG: unable to handle kernel paging request at virtual address 0000190c
>  printing eip:
> c05954c6
> *pde = 74ecb067
> Oops: 0000 [#1]
> SMP
> ....
> CPU:    0
> EIP:    0060:[<c05954c6>]    Not tainted VLI
> EFLAGS: 00010206   (2.6.18-194.26.1.el5 #1)
> EIP is at hid_close+0x2/0x1f
> eax: 00000000   ebx: d216bd20   ecx: f7fff080   edx: 00000000
> esi: d21ebc00   edi: c06b3470   ebp: 00000000   esp: f768ad34
> ds: 007b   es: 007b   ss: 0068
> Process usbhid-ups (pid: 2437, ti=f768a000 task=f7682000 task.ti=f768a000)
> Stack: c05976fd d20fe000 c0595668 d21ebc00 c06b3440 c058e1c5 d21ebc8c
> d21ebc14
>       c055e859 d21ebc14 d21ebc14 d21ebc00 c055ea91 d21ebc00 c0588351
> ffffffc3
>       00000000 c0590cc0 f75ee340 00000000 00005516 00000000 c00c5512
> d2173400
> Call Trace:
>  [<c05976fd>] hiddev_disconnect+0x3f/0x5e
>  [<c0595668>] hid_disconnect+0x81/0xbf
>  [<c058e1c5>] usb_unbind_interface+0x34/0x6a
>  [<c055e859>] __device_release_driver+0x7d/0xbb
>  [<c055ea91>] device_release_driver+0x1c/0x2b
>  [<c0588351>] usb_driver_release_interface+0x38/0x60
>  [<c0590cc0>] proc_ioctl_default+0x10d/0x1d0
>  [<c0591ffd>] usbdev_ioctl+0x1027/0x10de
>  [<c048b06b>] __d_lookup+0x98/0xdb
>  [<c04c7eff>] inode_has_perm+0x54/0x5c
>  [<c04c78ab>] avc_has_perm+0x3c/0x46
>  [<c04c7eff>] inode_has_perm+0x54/0x5c
>  [<c0464c32>] __handle_mm_fault+0x463/0xaac
>  [<c0486290>] do_ioctl+0x47/0x5d
>  [<c04867f9>] vfs_ioctl+0x47b/0x4d3
>  [<c0486899>] sys_ioctl+0x48/0x5f
>  [<c0404f17>] syscall_call+0x7/0xb
>  =======================
> Code: 00 05 b0 01 86 83 b4 0c 00 00 fb 8d 83 54 0c 00 00 e8 4c 87 e9 ff 8b
> 83 a8 0c 00 00 e8 7d 74 ff ff 31 c0 5b c3 31 d2 eb be 89 c2 <8b> 80 0c 19 00
> 00 48 85 c0 89 82 0c 19 00 00 75 0b 8b 82 a8 0c
> EIP: [<c05954c6>] hid_close+0x2/0x1f SS:ESP 0068:f768ad34
>
> ----------------------------------------------------------------------------
>
> --
>  ----------------------------------------------------------------------
>   Alfred Ganz                                  alfred-ganz:at:agci.com
>   AG Consulting, Inc.                          (203) 624-9667
>   440 Prospect Street # 11
>   New Haven, CT 06511
>  ----------------------------------------------------------------------
>



-- 
"Ridicule is man's most potent weapon.  It’s hard to counterattack ridicule,
and it infuriates
 the opposition, which then reacts to your advantage."

                                                              Saul Alinsky,
Marxist, Obama mentor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/nut-upsdev/attachments/20110112/6b780000/attachment.htm>


More information about the Nut-upsdev mailing list