[Debian-ha-maintainers] again: "redhat-cluster: services are not relocated when a node fails"

Guido Günther agx at sigxcpu.org
Thu Nov 19 12:47:39 UTC 2009


Hi Ernesto,
On Wed, Nov 18, 2009 at 02:30:57PM +0100, Ernesto Rodriguez Reina wrote:
> Hi everyone!
> 
> I recently started using RHCS for a project I'm working on, but I found
> that RHCS2 in Debian Lenny does not relocate services when a node fails.
> I found the thread [1] where Guido Günther says that this problem was
> solved in RHCS 3.0.2. I then downloaded and installed RHCS 3.0.4 (the
> deb packages from a Debian mirror) and reproduced Martin Waite's
> experiment, and again the service was not relocated when the node failed.
> Has anyone made it work as it should in Debian? Martin, Guido, or
> anybody else, can you please help me find out why it is not working as
> it should?
I checked with RHCS 3.0.4 as currently in unstable, rebuilt for Lenny.
The kernel hits a soft lockup in dlm_send about a minute after I shut
off one node (see attached log), and no resource takeover happens.
Fabione, any idea what triggers this?
Cheers,
 -- Guido
-------------- next part --------------
Nov 19 13:19:33 testfoo2 corosync[1969]:   [CLM   ] CLM CONFIGURATION CHANGE
Nov 19 13:19:33 testfoo2 corosync[1969]:   [CLM   ] New Configuration:
Nov 19 13:19:33 testfoo2 corosync[1969]:   [CLM   ] #011r(0) ip(192.168.1.3)
Nov 19 13:19:33 testfoo2 corosync[1969]:   [CLM   ] #011r(0) ip(192.168.1.4)
Nov 19 13:19:33 testfoo2 corosync[1969]:   [CLM   ] Members Left:
Nov 19 13:19:33 testfoo2 corosync[1969]:   [CLM   ] Members Joined:
Nov 19 13:19:33 testfoo2 corosync[1969]:   [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 19 13:19:33 testfoo2 kernel: [  256.244490] dlm: closing connection to node 1
Nov 19 13:19:33 testfoo2 rgmanager[2165]: State change: testfoo1.bar.com DOWN
Nov 19 13:19:33 testfoo2 corosync[1969]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] Pid: 2174, comm: dlm_send Not tainted 2.6.26-2-amd64 #1
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] RIP: 0010:[<ffffffff8042979d>]  [<ffffffff8042979d>] _spin_unlock_irqrestore+0x7/0xe
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] RSP: 0018:ffff81001c5ffea8  EFLAGS: 00000202
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] RAX: ffff81001c8616e0 RBX: ffffffffa01ee79b RCX: 0000000000000000
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] RDX: 0000000300000000 RSI: 0000000000000202 RDI: 0000000000000202
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] RBP: ffff81001c8616d8 R08: ffff81001c8616c8 R09: ffff81001c4e2ba0
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] R10: ffff81001c5ffdd0 R11: ffff81001c5ffdd0 R12: 0000000300000000
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] R13: ffff81001c5ffdd0 R14: ffff81001c4e2b00 R15: 0000000000000000
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] FS:  0000000041af1950(0000) GS:ffffffff8053b000(0000) knlGS:0000000000000000
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] CR2: 00007f9b98000010 CR3: 000000001c592000 CR4: 00000000000006e0
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]
Nov 19 13:20:38 testfoo2 kernel: [  321.052015] Call Trace:
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff802435e0>] ? queue_work+0x37/0x40
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff802430c8>] ? run_workqueue+0x82/0x111
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff80243995>] ? worker_thread+0xd5/0xe0
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff802461c5>] ? autoremove_wake_function+0x0/0x2e
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff802438c0>] ? worker_thread+0x0/0xe0
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff8024609f>] ? kthread+0x47/0x74
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff802301ec>] ? schedule_tail+0x27/0x5c
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff8020cf28>] ? child_rip+0xa/0x12
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff80246058>] ? kthread+0x0/0x74
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]  [<ffffffff8020cf1e>] ? child_rip+0x0/0x12
Nov 19 13:20:38 testfoo2 kernel: [  321.052015]


