[Debian-ha-maintainers] again: "redhat-cluster: services are not relocated when a node fails"

Fabio M. Di Nitto fdinitto at redhat.com
Thu Nov 19 22:01:01 UTC 2009


Guido Günther wrote:
> Hi Ernesto,
> On Wed, Nov 18, 2009 at 02:30:57PM +0100, Ernesto Rodriguez Reina wrote:
>> Hi everyone!
>>
>> I recently started using RHCS for a project I'm working on, but I found
>> that RHCS2 in Debian Lenny does not relocate services when a node fails.
>> I found the thread [1] where Guido Günther says that this problem was
>> solved in RHCS 3.0.2. I then downloaded and installed RHCS 3.0.4 (the
>> deb packages from the Debian mirror), reproduced Martin Waite's
>> experiment, and again the service was not relocated when the node failed.
>> Has anyone made it work as it should in Debian? Martin, Guido, or
>> anybody else, can you please help me find out why it is not working as
>> it should?

> I checked with RHCS 3.0.4 as it currently is in unstable, rebuilt for
> Lenny. The kernel enters a soft lockup after I shut off one node (see
> attached log) and no resource takeover happens. Fabione, any idea what
> triggers this?

Since you guys are running cluster 3.0.4, please do the following:

1) Add <logging debug="on"/> to cluster.conf:

<cluster ...>
  <logging debug="on"/>
  ...
</cluster>

2) Reproduce the above scenario, then collect all the logs from all
daemons on all nodes, from /var/log/cluster (this is the upstream default;
please check with Debian whether they have changed it).
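
For reference, the two steps above could be scripted roughly as follows. This is a minimal Python sketch, not an upstream or Debian tool: enable_debug_logging and bundle_cluster_logs are hypothetical helper names, the log path assumes the upstream default /var/log/cluster, and bumping config_version on every change is an assumption worth checking against your cman version. Note that ElementTree drops XML comments when it rewrites the file, so keep a backup of the original cluster.conf.

```python
# Hypothetical helpers (not part of cluster 3.0.4) for the two steps above.
import socket
import tarfile
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_debug_logging(conf_xml: str) -> str:
    """Return cluster.conf text with debug="on" set on <cluster><logging>."""
    root = ET.fromstring(conf_xml)
    if root.tag != "cluster":
        raise ValueError("expected <cluster> as the root element")
    logging_el = root.find("logging")
    if logging_el is None:
        # No <logging> element yet: append an empty one under <cluster>.
        logging_el = ET.SubElement(root, "logging")
    # Keep any existing logging attributes; only switch debug on.
    logging_el.set("debug", "on")
    # Assumption: a changed cluster.conf needs a higher config_version.
    version = root.get("config_version")
    if version is not None:
        root.set("config_version", str(int(version) + 1))
    return ET.tostring(root, encoding="unicode")


def bundle_cluster_logs(log_dir: str = "/var/log/cluster",
                        out_dir: str = ".") -> Path:
    """Pack everything under log_dir into <out_dir>/<hostname>-cluster-logs.tar.gz."""
    src = Path(log_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"log directory not found: {src}")
    archive = Path(out_dir) / f"{socket.gethostname()}-cluster-logs.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # Store paths under the log directory's own name (e.g. cluster/...).
        tar.add(str(src), arcname=src.name)
    return archive
```

After reproducing the failure, bundle_cluster_logs() would be run once on every node, so that the per-host archive names keep the logs from different nodes apart.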

Then I'd like to see your cluster.conf and get a better idea of how a
node is "killed". If cluster.conf contains sensitive data such as
passwords, either blank them or send the file to me only. I'll keep it
confidential, but please do NOT randomly mangle the configuration to hide
bits.
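
One way to follow the "blank them, don't mangle" advice is to empty only the password values and leave every element and other attribute as it was. The Python sketch below assumes that secrets sit in attributes whose name contains "passw" (such as passwd on a fence device); blank_passwords is a hypothetical helper, and anything it does not match still needs a manual check before sending.

```python
import xml.etree.ElementTree as ET


def blank_passwords(conf_xml: str, placeholder: str = "") -> str:
    """Blank password-like attribute values, leaving the structure intact.

    Assumption: secrets live in attributes whose name contains "passw".
    Element names, other attributes and nesting are not touched, so the
    file stays useful for debugging.
    """
    root = ET.fromstring(conf_xml)
    for el in root.iter():
        # Copy the keys first because the attribute dict is modified in the loop.
        for name in list(el.attrib):
            if "passw" in name.lower():
                el.set(name, placeholder)
    return ET.tostring(root, encoding="unicode")
```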

Recovery depends on several different things. The configuration and the
logs should be able to tell us something.

Thanks
Fabio



