[Debian-ha-maintainers] again: "redhat-cluster: services are not relocated when a node fails"

Guido Günther agx at sigxcpu.org
Thu Nov 19 14:15:04 UTC 2009


On Thu, Nov 19, 2009 at 01:47:39PM +0100, Guido Günther wrote:
> Hi Ernesto,
> On Wed, Nov 18, 2009 at 02:30:57PM +0100, Ernesto Rodriguez Reina wrote:
> > Hi everyone!
> > 
> > I recently started using RHCS for a project I'm working on, but I found
> > that RHCS2 in Debian Lenny does not relocate services when a node fails.
> > I found the thread [1] where Guido Günther says that this problem was
> > solved in RHCS 3.0.2. I then downloaded and installed RHCS 3.0.4 (the
> > deb packages from the Debian mirror) and reproduced Martin Waite's
> > experiment, and again the service was not relocated when the node failed.
> > Has anyone made it work as it should in Debian? Martin, Guido,
> > or anybody else, can you please help me find out why it is not working as
> > it should?
> I checked with RHCS 3.0.4, as currently in unstable, rebuilt for
> Lenny. The kernel enters a soft lockup after I shut off one node (see
> attached log) and no resource takeover happens. Fabione, any idea what
> triggers this?
And here's a trace with 2.6.31:

Nov 19 15:04:26 testfoo1 rgmanager[1709]: State change: testfoo2.foo.bar DOWN
Nov 19 15:04:26 testfoo1 kernel: [  117.438037] ------------[ cut here ]------------
Nov 19 15:04:26 testfoo1 kernel: [  117.440175] kernel BUG at /build/buildd/linux-2.6-2.6.31/debian/build/source_amd64_none/fs/inode.c:1323!
Nov 19 15:04:26 testfoo1 kernel: [  117.440309] invalid opcode: 0000 [#1] SMP
Nov 19 15:04:26 testfoo1 kernel: [  117.440309] last sysfs file: /sys/devices/virtual/input/input0/capabilities/sw
Nov 19 15:04:26 testfoo1 kernel: [  117.440309] CPU 0
Nov 19 15:04:26 testfoo1 kernel: [  117.440309] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs dlm configfs bridge stp dm_multipath scsi_dh loop virtio_console virtio_balloon evdev serio_raw psmouse button snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr processor i2c_piix4 i2c_core ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_generic libata scsi_mod virtio_blk piix ide_pci_generic e1000 uhci_hcd ide_core floppy virtio_pci virtio_ring virtio thermal fan thermal_sys
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] Pid: 1718, comm: dlm_send Not tainted 2.6.31-1-amd64 #1
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] RIP: 0010:[<ffffffff811249ae>]  [<ffffffff811249ae>] iput+0x27/0x9e
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] RSP: 0018:ffff88001bd53cb0  EFLAGS: 00010246
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] RAX: 0000000000000000 RBX: ffff88001c1aac88 RCX: ffff88001c1aac88
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] RDX: 0000000000000004 RSI: 00000000f38c67d6 RDI: ffff88001c1aac88
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] RBP: ffff88001bd2fed0 R08: 00000000f38c67d6 R09: 00000000f38c67d6
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] R10: 00000000f38c67d6 R11: ffffffff812b84ca R12: ffff88001bd2fee0
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] R13: ffff88001bd2ff00 R14: ffff88003785e080 R15: ffff88001bd2ff88
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] FS:  0000000041bf2950(0000) GS:ffff880001702000(0000) knlGS:0000000000000000
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] CR2: 00000000010d10e8 CR3: 000000001bd4a000 CR4: 00000000000006f0
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] Process dlm_send (pid: 1718, threadinfo ffff88001bd52000, task ffff88003785e080)
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] Stack:
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  00000000f38c67d6 00000000f38c67d6 0000000000000000 ffffffffa02aa003
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] <0> ffff88001c1aac40 ffff88001bd6b510 00000000f38c67d6 ffff880001716870
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] <0> ffff88003785e0b8 ffff880001716800 00000000f38c67d6 ffffffff810106ac
Nov 19 15:04:26 testfoo1 kernel: [  117.460215] Call Trace:
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffffa02aa003>] ? tcp_connect_to_sock+0x1f9/0x260 [dlm]
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff810106ac>] ? __switch_to+0xbe/0x28b
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff8104b6bd>] ? set_next_entity+0x48/0x82
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff8104ccbb>] ? pick_next_task_fair+0xb4/0xd3
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffffa02ab1c3>] ? process_send_sockets+0x41/0x1f3 [dlm]
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff81072ce6>] ? __queue_work+0x35/0x5c
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffffa02ab182>] ? process_send_sockets+0x0/0x1f3 [dlm]
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff81071d48>] ? worker_thread+0x182/0x234
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff810775e2>] ? autoremove_wake_function+0x0/0x59
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff81071bc6>] ? worker_thread+0x0/0x234
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff8107715b>] ? kthread+0x9b/0xa3
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff81012f2a>] ? child_rip+0xa/0x20
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff810356b4>] ? pvclock_clocksource_read+0x4a/0x98
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff810770c0>] ? kthread+0x0/0xa3
Nov 19 15:04:26 testfoo1 kernel: [  117.460215]  [<ffffffff81012f20>] ? child_rip+0x0/0x20



