Bug#593438: dahdi-source: Kernel panic while call in progress on TDM400P

Thu Aug 19 09:34:09 UTC 2010

On Wed, Aug 18, 2010 at 2:52 AM, Tzafrir Cohen <tzafrir.cohen at xorcom.com> wrote:
> Thanks for the report,
>
> On Wed, Aug 18, 2010 at 01:27:46AM -0700, Russ Dill wrote:
>> Package: dahdi-source
>> Version: 1:2.3.0.1+dfsg-1
>> Severity: important
>> Tags: upstream
>>
>> System becomes unresponsive, even to serial break sysrq (oops captured via
>> serial console). This occurs after only a few minutes of call time.
>>
>> [ 9695.212106] BUG: unable to handle kernel NULL pointer dereference at 000001f0
>> [ 9695.216018] IP: [<c1001eaf>] __switch_to+0x50/0x129
>> [ 9695.216018] *pde = 00000000
>> [ 9695.216018] Oops: 0002 [#1]
>> [ 9695.216018] last sysfs file: /sys/devices/pci0000:00/0000:00:10.2/usb4/4-2/idProduct
>> [ 9695.216018] Modules linked in: dahdi_echocan_oslec echo loop snd_via82xx gameport snd_ac97_codec ac97_bus snd_pcm snd_page_alloc]
>> [ 9695.216018]
>> [ 9695.216018] Pid: 14267, comm: tail Not tainted (2.6.32-5-486 #1)
>> [ 9695.216018] EIP: 0060:[<c1001eaf>] EFLAGS: 00010046 CPU: 0
>> [ 9695.216018] EIP is at __switch_to+0x50/0x129
>> [ 9695.216018] EAX: 00000001 EBX: ce835040 ECX: 57ab6cfd EDX: ce835040
>> [ 9695.216018] ESI: cd15e8a0 EDI: 00000000 EBP: 00000000 ESP: ce847f28
>> [ 9695.216018]  DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
>> [ 9695.216018] Process tail (pid: 14267, ti=ce846000 task=cd15e8a0 task.ti=cd24e000)
>> [ 9695.216018] Stack:
>> [ 9695.216018]  c101cf62 c08ea080 c08ea0ac 725916f1 00000000 cd16e540 cd15e8a0 c1242f26
>> [ 9695.216018] <0> ce835040 00000046 cb023800 c1350604 ce8351fc 576e589e 000008d1 c135ae30
>> [ 9695.216018] <0> 00000246 cd0c281c 00000286 cd4af4b8 cb0238a4 ce80a0e0 ce847fb4 ce80a0e0
>> [ 9695.216018] Call Trace:
>> [ 9695.216018]  [<c101cf62>] ? set_next_entity+0x29/0x51
>> [ 9695.216018]  [<c1242f26>] ? schedule+0x395/0x3d5
>> [ 9695.216018]  [<c1030d4a>] ? worker_thread+0x90/0x1a4
>> [ 9695.216018]  [<c116e0a5>] ? flush_to_ldisc+0x0/0x161
>> [ 9695.216018]  [<c1033570>] ? autoremove_wake_function+0x0/0x2d
>> [ 9695.216018]  [<c1030cba>] ? worker_thread+0x0/0x1a4
>> [ 9695.216018]  [<c10331c8>] ? kthread+0x60/0x65
>> [ 9695.216018]  [<c1033168>] ? kthread+0x0/0x65
>> [ 9695.216018]  [<c1003997>] ? kernel_thread_helper+0x7/0x10
>> [ 9695.216018] Code: 40 0c a8 01 74 56 a8 10 8b be 8c 02 00 00 74 1b 83 c8 ff 89 c2 0f ae 27 f6 87 00 02 00 00 01 74 23 80 7f 02 00
>> [ 9695.216018] EIP: [<c1001eaf>] __switch_to+0x50/0x129 SS:ESP 0068:ce847f28
>> [ 9695.216018] CR2: 00000000000001f0
>> [ 9695.216018] ---[ end trace 090adb1746d9327c ]---
>> [ 9695.452635] BUG: unable to handle kernel NULL pointer dereference at 000001f0
>> [ 9695.456014] IP: [<c1003b13>] __math_state_restore+0x30/0x67
>
> Yup. Could indeed be related to OSLEC and its unique feature of MMX code
> running in the interrupt handler. I'll try to look into it. An ugly
> workaround would be to disable the MMX optimizations of the OSLEC code
> (see drivers/staging/oslec/Kbuild, IIRC). That would have a noticeable
> performance penalty, which may be rather meaningful on a system such as
> yours.

I'm testing that now by removing the mmx_auto patch. A very interesting
side effect is that my codec_g729.so is now working. The performance
difference isn't a big deal, as the system has more than enough
horsepower to handle 2 or 3 calls.

BTW, I've noticed that Greg KH has removed the USE_MMX and USE_SSE
code from the staging tree.
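
For anyone following along, here is a rough sketch of what "disabling the
MMX optimizations" amounts to. This is not the OSLEC source: fir_dot_c and
fir_dot are hypothetical names, and only the USE_MMX macro comes from the
discussion above. The idea is that the echo canceller's inner product falls
back to ordinary integer C, which in the source uses no MMX/FPU registers,
so the interrupt handler has no FPU state to save or restore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from OSLEC: a plain-C inner product of
 * filter coefficients and sample history. It is written with integer
 * arithmetic only, so the source itself never asks for MMX/FPU registers.
 */
static int32_t fir_dot_c(const int16_t *coeffs, const int16_t *history,
                         size_t taps)
{
    int32_t acc = 0;
    size_t i;

    assert(coeffs != NULL && history != NULL);

    for (i = 0; i < taps; i++)
        acc += (int32_t)coeffs[i] * history[i];

    return acc;
}

/*
 * Compile-time selection. With USE_MMX undefined (the workaround), every
 * caller gets the portable version. The SIMD variant is deliberately left
 * out of this sketch.
 */
#if defined(USE_MMX)
#error "MMX variant intentionally omitted from this sketch"
#else
#define fir_dot fir_dot_c
#endif
```

Called as fir_dot(coeffs, history, taps), this costs one multiply-add per
tap in scalar code, which is where the performance penalty mentioned above
would come from.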