[Debian-ha-maintainers] Bug#596694: corosync fails to start correctly
Frank Schmidt
frank_schmidt at gmx.de
Mon Sep 13 13:46:09 UTC 2010
Package: corosync
Version: 1.2.1-1
Severity: important
Hi,
After a clean install of Debian squeeze and increasing the consensus timeout to 3600 (to solve
#573030), corosync does not start correctly at boot. crm_mon is unable to connect to the cluster.
The process list (ps auxf) shows the following:
root       773  0.3  1.3 128960  5136 ?        Ssl  13:11   0:00 /usr/sbin/corosync
root       808  0.0  0.8 114484  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       809  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       810  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       811  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       812  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       813  0.0  0.8 114480  3308 ?        S    13:11   0:00  \_ /usr/sbin/corosync
root       925  0.1  0.4  52416  1888 ?        Sl   13:11   0:00 /usr/sbin/rsyslogd -c4
root       958  0.0  0.3  44568  1336 ?        Ss   13:11   0:00 ha_logd: read process
root       959  0.0  0.2  44568   932 ?        S    13:11   0:00  \_ ha_logd: write process
After killing the corosync processes and running
> /etc/init.d/corosync start
the process list shows:
root      1422  0.0  1.3 146160  5268 ?        Ssl  15:15   0:00 /usr/sbin/corosync
root      1433  0.0  3.1  79524 12036 ?        SLs  15:15   0:00  \_ /usr/lib/heartbeat/stonithd
103       1434  0.0  1.2  82444  4892 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/cib
root      1435  0.0  0.6  83428  2372 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/lrmd
103       1436  0.0  0.8  83504  3112 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/attrd
103       1437  0.0  0.7  83876  2972 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/pengine
103       1438  0.0  0.9  89732  3612 ?        S    15:15   0:00  \_ /usr/lib/heartbeat/crmd
and crm_mon works as expected.
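
The two listings differ in what runs underneath the main corosync process: after the failed boot the children are further copies of /usr/sbin/corosync, while after the manual start they are the cluster daemons (stonithd, cib, lrmd and so on). As a rough sketch, that symptom could be detected from `ps auxf` output. The function name count_corosync_forks is made up for illustration and is not part of any package.

```shell
#!/bin/sh
# count_corosync_forks: read `ps auxf` output on stdin and print how many
# lines show /usr/sbin/corosync as a child ("\_") of another process.
# Hypothetical helper, for illustration only.
count_corosync_forks() {
    # -F matches the pattern as a fixed string, so the backslash is literal.
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -cF '\_ /usr/sbin/corosync' || true
}

# Take the snapshot first so the grep process itself cannot appear in it.
snapshot=$(ps auxf 2>/dev/null || true)
forks=$(printf '%s\n' "$snapshot" | count_corosync_forks)
if [ "$forks" -gt 0 ]; then
    echo "corosync is running with $forks forked copies of itself as children"
fi
```

A count of 6 would match the first listing above, and 0 the second.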
The bug seems to be caused by corosync being started earlier than the syslog daemon during boot:
/etc/init.d/corosync:
#! /bin/sh
#
### BEGIN INIT INFO
# Provides: corosync
# Required-Start: $network $remote_fs
# Required-Stop: $network $remote_fs
# Default-Start: S
# Default-Stop: 0 1 6
# Short-Description: corosync cluster framework
### END INIT INFO
[...]
/etc/init.d/rsyslog:
#! /bin/sh
### BEGIN INIT INFO
# Provides: rsyslog
# Required-Start: $remote_fs $time
# Required-Stop: umountnfs $time
# X-Stop-After: sendsigs
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: enhanced syslogd
# Description: Rsyslog is an enhanced multi-threaded syslogd.
# It is quite compatible to stock sysklogd and can be
# used as a drop-in replacement.
### END INIT INFO
[...]
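
The ordering information lives in the LSB comment block at the top of each init script. In the two headers quoted above, corosync declares Default-Start: S while rsyslog declares 2 3 4 5, which is consistent with corosync being started first. The following is a small sketch for pulling out single header fields so the scripts can be compared side by side; lsb_field is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# lsb_field FILE FIELD: print the value of one field from the
# "### BEGIN INIT INFO" ... "### END INIT INFO" block of an init script.
# Hypothetical helper, for illustration only.
lsb_field() {
    sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1" |
        sed -n "s/^# $2:[[:space:]]*//p"
}

# Compare the start dependencies and runlevels of the two scripts,
# skipping any script that is not installed on this machine.
for script in /etc/init.d/corosync /etc/init.d/rsyslog; do
    [ -r "$script" ] || continue
    printf '%s\n' "$script"
    printf '  Required-Start: %s\n' "$(lsb_field "$script" Required-Start)"
    printf '  Default-Start:  %s\n' "$(lsb_field "$script" Default-Start)"
done
```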
It seems that corosync must be started after syslogd, and perhaps even after ha_logd
(part of the cluster-glue package):
/etc/init.d/logd:
[...]
### BEGIN INIT INFO
# Description: ha_logd is a non-blocking logging daemon.
# It can log messages either to a file or through syslog
# daemon.
# Short-Description: ha_logd logging daemon
# Provides: ha_logd
# Required-Start: $network $syslog $remote_fs
# Required-Stop: $network $syslog $remote_fs
# X-Start-Before: heartbeat openais
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
### END INIT INFO
Here corosync seems to be missing from the 'X-Start-Before' line.
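
One way the ordering described above might be expressed in the LSB headers is sketched below. It is untested and not a confirmed fix. Adding corosync to X-Start-Before in logd is the change suggested above; the $syslog dependency and the move of Default-Start from S to 2 3 4 5 are assumptions, made because rsyslog is only started in runlevels 2 to 5 and so presumably cannot be ordered before a script that starts in S.

```shell
# Sketch only: possible header changes, not a tested or confirmed fix.

# /etc/init.d/corosync
# Assumptions: depend on $syslog, and start in runlevels 2-5 (like rsyslog)
# instead of S.
### BEGIN INIT INFO
# Provides:          corosync
# Required-Start:    $network $remote_fs $syslog
# Required-Stop:     $network $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: corosync cluster framework
### END INIT INFO

# /etc/init.d/logd (only the changed line; the rest of the header stays as quoted)
# X-Start-Before:    heartbeat openais corosync
```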
Greetings,
Frank
-- System Information:
Debian Release: squeeze/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages corosync depends on:
ii adduser 3.112 add and remove users and groups
ii libc6 2.11.2-2 Embedded GNU C Library: Shared lib
ii libcorosync4 1.2.1-1 Standards-based cluster framework
ii lsb-base 3.2-23.1 Linux Standard Base 3.2 init scrip
corosync recommends no packages.
corosync suggests no packages.
-- Configuration Files:
/etc/corosync/corosync.conf changed:
totem {
    version: 2
    # How long before declaring a token lost (ms)
    token: 3000
    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 10
    # How long to wait for join messages in the membership protocol (ms)
    join: 60
    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 3600
    # Turn off the virtual synchrony filter
    vsftype: none
    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20
    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes
    # Disable encryption
    secauth: off
    # How many threads to use for encryption/decryption
    threads: 0
    # Optionally assign a fixed node id (integer)
    # nodeid: 1234
    # This specifies the mode of redundant ring, which may be none, active, or passive.
    rrp_mode: none
    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 127.0.0.1
        mcastaddr: 226.94.1.1
        mcastport: 5405
    }
}
amf {
    mode: disabled
}
service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 0
    name: pacemaker
}
aisexec {
    user: root
    group: root
}
logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}
/etc/default/corosync changed:
START=yes
-- no debconf information
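
The consensus value of 3600 in the totem section above is 1.2 times the token value of 3000. The corosync.conf(5) manual page states that consensus must be at least 1.2 * token; assuming that rule, the smallest acceptable value for a given token can be worked out as in this sketch. min_consensus is a hypothetical helper name.

```shell
#!/bin/sh
# min_consensus TOKEN_MS: print the smallest integer consensus value (ms)
# that satisfies consensus >= 1.2 * token.
# Hypothetical helper, for illustration only.
min_consensus() {
    # Integer arithmetic: multiply by 12, then divide by 10 rounding up.
    echo $(( ($1 * 12 + 9) / 10 ))
}

min_consensus 3000   # prints 3600, the value used in the configuration above
```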