[Debian-ha-maintainers] [Linux-HA] pacemaker with heartbeat on Debian Wheezy reproducibly reboots the node when putting it into maintenance mode because of a /usr/lib/heartbeat/crmd crash

Andrew Beekhof andrew at beekhof.net
Tue Aug 6 21:11:20 UTC 2013


On 06/08/2013, at 2:29 AM, Thomas Glanzmann <thomas at glanzmann.de> wrote:

> Hello Andrew,
> 
>> You will need to run crm_report and email us the resulting tarball.
>> This will include the version of the software you're running and log
>> files (both system and cluster) - without which we can't do anything.
> 
> Find the files here:
> 
> I manually packaged it because crm_report output was empty.

I can try to fix that if you re-run crm_report with -x and paste the output.

> If I forgot
> something, please let me know. I included the daemon syslog output from
> both nodes (taken from the central syslog server), the crm file, the ha.cf
> (which is the same on both nodes), and the /var/lib/heartbeat directory,
> which seems to keep all files from the first node.

I'm afraid I can't do anything with the core file.
I don't run Debian at all, let alone that particular version with the same binaries, libraries and symbols as you.
Without those, the core file is meaningless (which is why crm_report generates backtraces).
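Because a core file only makes sense next to the exact binaries and debug symbols that produced it, the backtrace has to be generated on the affected node itself. A minimal sketch of doing that by hand, assuming gdb and the matching debug symbols are installed there; the core path below is a hypothetical placeholder, not taken from this report.

```python
import shlex
import subprocess
import sys


def gdb_backtrace_cmd(binary, core):
    """Build a non-interactive gdb command that prints full backtraces
    for every thread recorded in `core`, resolved against `binary`."""
    return ["gdb", "-batch", "-ex", "thread apply all bt full", binary, core]


def run_backtrace(binary, core):
    """Run gdb and return its stdout. Only meaningful on the node that
    produced the core, with the same binary and debug symbols present."""
    result = subprocess.run(
        gdb_backtrace_cmd(binary, core),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical core path; pass the real one as the first argument.
    core = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/heartbeat/cores/hacluster/core"
    cmd = gdb_backtrace_cmd("/usr/lib/heartbeat/crmd", core)
    # Print the command rather than running it, so it can be reviewed first.
    print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting text backtrace, unlike the raw core, can be read on any machine.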

> 
> The reason for the crash in unmanaged mode seems to be the same as
> before:
> 
> Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: crm_abort: abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph != NULL

That shouldn't have resulted in a crash.

I see a lot of this though:

Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: get_lrm_resource: Could not add resource nfs-common to LRM

That looks more concerning.
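To get a rough idea of how often crmd hits these lrmd channel failures, a small tally over the daemon log can help. This is a sketch that assumes the classic syslog line format shown in the excerpts above; grouping by host and the first function name after the log level is an illustrative choice, and the log file path is whatever you pass on the command line.

```python
import re
import sys
from collections import Counter

# Matches syslog lines such as:
# "Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w-]+):\s+\[(?P<pid>\d+)\]:\s+"
    r"(?P<level>[A-Z]+):\s+(?P<func>\w+)"
)

# Lines copied from the log excerpts quoted in this thread.
SAMPLE = [
    "Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: crm_abort: abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph != NULL",
    "Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.",
    "Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.",
    "Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.",
    "Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.",
    "Aug  4 18:50:27 apache-03 crmd: [29398]: ERROR: get_lrm_resource: Could not add resource nfs-common to LRM",
]


def tally_errors(lines, daemon="crmd"):
    """Count ERROR lines from `daemon`, keyed by (host, function name)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("daemon") == daemon and m.group("level") == "ERROR":
            counts[(m.group("host"), m.group("func"))] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            counts = tally_errors(fh)
    else:
        counts = tally_errors(SAMPLE)
    for (host, func), n in counts.most_common():
        print(f"{n:6d}  {host}  {func}")
```

Run against the sample lines, lrm_get_rsc dominates, which matches the impression that crmd is repeatedly failing to talk to lrmd.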

I would _really_ recommend upgrading to something a little more recent.
And it might be time to get off heartbeat while you're at it.

> 
> I should probably update it.
>
> But I have no idea why the config got lost.
> 
> https://thomas.glanzmann.de/tmp/linux_ha_crash.2013-08-05.tar.gz
> 
> Cheers,
>        Thomas
> _______________________________________________
> Linux-HA mailing list
> Linux-HA at lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems



