[Debian-ha-maintainers] Oops breaks resource failover in RHCS
Fabio M. Di Nitto
fdinitto at redhat.com
Wed Feb 17 06:37:59 UTC 2010
Please file a bug on bugzilla.redhat.com -> Fedora rawhide -> component
cluster. I'll take care of reassigning it to the correct maintainer.
Yes, I understand you are running Debian, but we use the RH Bugzilla
instance for upstream, so just go ahead and add all the info there.
thanks
Fabio
On 2/17/2010 3:07 AM, Ernesto Rodriguez Reina wrote:
> Hi, I wrote to you once before because I had a very similar problem,
> and I thought it was completely solved. Unfortunately, I saw the oops
> again. We have repeated the test several times and always get the
> same result. Here is my scenario:
>
> Node master with nodeid=1;
> Node spare with nodeid=2;
> Node slave1 with nodeid=3;
> Node slave2 with nodeid=4;
>
> We shut down node master. Services are correctly relocated. We turn on
> node master and again services are correctly relocated. We then shut
> down node master again, and then the oops appears, but only on node
> spare; nodes slave1 and slave2 seem to be OK, with services running.
> We tested with two different kernels, 2.6.32.8 and 2.6.31.5 (with patch
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=063c4c99630c0b06afad080d2a18bda64172c1a2).
>
> We are using RHCS 3.0.4-2 from the Debian mirror. Any ideas on how to
> solve this? We are going to test with RHCS 3.0.6-5.
>
> I hope you can help me.
> Best regards,
> Ernesto
>
> The oops:
>
> with kernel 2.6.32.8:
> Feb 16 19:48:22 spare kernel: [ 1080.523027] INFO: task rgmanager:6531 blocked for more than 120 seconds.
> Feb 16 19:48:22 spare kernel: [ 1080.523091] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 16 19:48:22 spare kernel: [ 1080.523166] rgmanager D 0000000000000000 0 6531 2363 0x00000000
> Feb 16 19:48:22 spare kernel: [ 1080.523170] ffffffff826fc080 0000000000000086 0000000000000296 ffffffff8104d7d9
> Feb 16 19:48:22 spare kernel: [ 1080.523175] ffff8801ab0a5038 000000000000e1c8 ffff8801ac083fd8 ffff8801ab915000
> Feb 16 19:48:22 spare kernel: [ 1080.523178] ffff8801ab9154b0 ffff8801ab0a5010 ffffffff82a420d8 ffff8801ab9154b0
> Feb 16 19:48:22 spare kernel: [ 1080.523181] Call Trace:
> Feb 16 19:48:22 spare kernel: [ 1080.523189] [<ffffffff8104d7d9>] ? try_to_wake_up+0x109/0x2d0
> Feb 16 19:48:22 spare kernel: [ 1080.523194] [<ffffffff81234bc4>] ? cpumask_any_but+0x24/0x40
> Feb 16 19:48:22 spare kernel: [ 1080.523199] [<ffffffff8140d7a5>] ? __down_read+0x85/0xb5
> Feb 16 19:48:22 spare kernel: [ 1080.523208] [<ffffffffa04b7960>] ? dlm_user_request+0x60/0x240 [dlm]
> Feb 16 19:48:22 spare kernel: [ 1080.523212] [<ffffffff8110a72c>] ? __kmalloc+0x11c/0x250
> Feb 16 19:48:22 spare kernel: [ 1080.523217] [<ffffffffa04c2196>] ? device_write+0x686/0x790 [dlm]
> Feb 16 19:48:22 spare kernel: [ 1080.523221] [<ffffffff81111f7b>] ? vfs_write+0xcb/0x1a0
> Feb 16 19:48:22 spare kernel: [ 1080.523224] [<ffffffff81112153>] ? sys_write+0x53/0xa0
> Feb 16 19:48:22 spare kernel: [ 1080.523227] [<ffffffff8100bf82>] ? system_call_fastpath+0x16/0x1b
>
> with kernel 2.6.31.5 (with patch http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=063c4c99630c0b06afad080d2a18bda64172c1a2):
> Feb 16 20:35:27 spare kernel: [ 1320.436213] INFO: task rgmanager:13795 blocked for more than 120 seconds.
> Feb 16 20:35:27 spare kernel: [ 1320.436277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 16 20:35:27 spare kernel: [ 1320.436352] rgmanager D 0000000000000000 0 13795 2247 0x00000000
> Feb 16 20:35:27 spare kernel: [ 1320.436357] ffff8801ae219000 0000000000000086 ffff88019293fd88 ffff8801a1cfbe90
> Feb 16 20:35:27 spare kernel: [ 1320.436360] 0000000000013f80 000000000000e168 ffff8801928a1000 ffff8801928a14b8
> Feb 16 20:35:27 spare kernel: [ 1320.436364] 0000000200000002 00000001000bf260 ffff8801ab843038 ffff8801928a14b8
> Feb 16 20:35:27 spare kernel: [ 1320.436367] Call Trace:
> Feb 16 20:35:27 spare kernel: [ 1320.436376] [<ffffffff813ea425>] ? __down_read+0x85/0xb5
> Feb 16 20:35:27 spare kernel: [ 1320.436389] [<ffffffffa052c970>] ? dlm_user_request+0x60/0x240 [dlm]
> Feb 16 20:35:27 spare kernel: [ 1320.436393] [<ffffffff81077aef>] ? wake_futex+0x3f/0x80
> Feb 16 20:35:27 spare kernel: [ 1320.436397] [<ffffffff810d4c40>] ? shmem_delete_inode+0x0/0x110
> Feb 16 20:35:27 spare kernel: [ 1320.436401] [<ffffffff8100caee>] ? invalidate_interrupt0+0xe/0x20
> Feb 16 20:35:27 spare kernel: [ 1320.436406] [<ffffffff810fc1cc>] ? __kmalloc+0x11c/0x250
> Feb 16 20:35:27 spare kernel: [ 1320.436414] [<ffffffffa05370f6>] ? device_write+0x686/0x790 [dlm]
> Feb 16 20:35:27 spare kernel: [ 1320.436418] [<ffffffff8105c7a3>] ? do_sigaction+0x1b3/0x1d0
> Feb 16 20:35:27 spare kernel: [ 1320.436421] [<ffffffff8105c691>] ? do_sigaction+0xa1/0x1d0
> Feb 16 20:35:27 spare kernel: [ 1320.436424] [<ffffffff81102e0b>] ? vfs_write+0xcb/0x1a0
> Feb 16 20:35:27 spare kernel: [ 1320.436427] [<ffffffff81102fe3>] ? sys_write+0x53/0xa0
> Feb 16 20:35:27 spare kernel: [ 1320.436430] [<ffffffff8100bf02>] ? system_call_fastpath+0x16/0x1b