[Debian-ha-maintainers] [Linux-HA] pacemaker with heartbeat on Debian Wheezy reproducibly reboots the node when put into maintenance mode because of a /usr/lib/heartbeat/crmd crash
Andrew Beekhof
andrew at beekhof.net
Tue Aug 6 21:11:20 UTC 2013
On 06/08/2013, at 2:29 AM, Thomas Glanzmann <thomas at glanzmann.de> wrote:
> Hello Andrew,
>
>> You will need to run crm_report and email us the resulting tarball.
>> This will include the version of the software you're running and log
>> files (both system and cluster) - without which we can't do anything.
>
> Find the files here:
>
> I manually packaged it because crm_report output was empty.
I can try to fix that if you re-run with -x and paste the output.
> If I forget
> something, please let me know. I included the daemon syslog output from
> both nodes from the central syslog server and the crm file, the ha.cf
> which is the same on both nodes and the /var/lib/heartbeat directory
> which seems to keep all files from the first node.
I can't do anything with the core file, I'm afraid.
I don't run Debian at all, let alone that particular version with the same binaries, libraries and symbols as you.
Without those, the core file is meaningless (which is why crm_report generates backtraces).
>
> The reason for the crash in unmanaged mode seems to be the same as
> before:
>
> Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: crm_abort: abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph != NULL
That shouldn't have resulted in a crash.
I see a lot of this though:
Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.
Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.
Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: get_lrm_resource: Could not add resource nfs-common to LRM
Which looks more concerning.
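To see which of these dominate in a larger log, a rough sketch like the one below could tally the ERROR lines per reporting function. It assumes the syslog layout shown in the excerpts above; the names count_errors, LINE_RE and SAMPLE are made up for illustration and are not part of Pacemaker or heartbeat.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpts quoted in this mail:
# "Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"(?P<daemon>[\w.-]+): \[(?P<pid>\d+)\]: ERROR: (?P<msg>.*)$"
)


def count_errors(lines):
    """Count ERROR lines per (daemon, source).

    'source' is whatever precedes the first ':' in the message,
    e.g. 'lrm_get_rsc(666)' or 'crm_abort'.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not an ERROR line in the assumed layout
        source = match.group("msg").split(":", 1)[0]
        counts[(match.group("daemon"), source)] += 1
    return counts


# The log lines quoted in this mail, used as sample input.
SAMPLE = [
    "Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: crm_abort: abort_transition_graph: Triggered assert at te_utils.c:339 : transition_graph != NULL",
    "Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.",
    "Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.",
    "Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_add_rsc(870): failed to send a addrsc message to lrmd via ch_cmd channel.",
    "Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: lrm_get_rsc(666): failed to send a getrsc message to lrmd via ch_cmd channel.",
    "Aug 4 18:50:27 apache-03 crmd: [29398]: ERROR: get_lrm_resource: Could not add resource nfs-common to LRM",
]

if __name__ == "__main__":
    # Print the most frequent error sources first.
    for (daemon, source), n in count_errors(SAMPLE).most_common():
        print(f"{n:4d}  {daemon}  {source}")
```

Feeding it the daemon syslog from both nodes instead of SAMPLE would show whether the lrmd channel failures start before or after the assert.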
I would _really_ recommend upgrading to something a little more recent.
And it might be time to get off heartbeat while you're at it.
>
> Probably I should update it.
>
> But I have no idea why the config got lost or what went wrong here.
>
> https://thomas.glanzmann.de/tmp/linux_ha_crash.2013-08-05.tar.gz
>
> Cheers,
> Thomas