[Debian-ha-maintainers] Bug#722339: crmd: number of connections to pengine socket increasing, exhausting max_open_files after some time

Christian Eichelmann christian.eichelmann at 1und1.de
Tue Sep 10 11:51:40 UTC 2013


Package: pacemaker
Version: 1.1.7-1
Severity: important
Tags: upstream

Dear Maintainer,

We are using pacemaker/corosync to build our clusters. In addition,
we use puppet to build our systems.
Since switching from squeeze to wheezy, we have found a serious problem
in the crmd process.
The problem can be triggered like this:

* Requirement: a working pacemaker/corosync cluster

1. Log in to all nodes of the cluster
2. Run 'watch -n1 "lsof -p `pidof crmd` | grep socket | wc -l"' on all
nodes to see the number of open sockets of the crmd process
3. Choose one node of the cluster and save the configuration with 'crm
configure save /tmp/config.bak'
4. Update the configuration with the saved file: 'crm configure load
update /tmp/config.bak'
5. The number of open sockets on at least one node should now increase
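
For anyone who wants to script the reproduction, here is a minimal
sketch of the steps above. It counts sockets by reading the symlinks in
/proc/<pid>/fd instead of calling lsof, which only works on Linux and
needs root (or the user crmd runs as). The function names count_sockets
and reload_and_count are hypothetical helpers of mine, not part of
pacemaker; the crm and pidof commands are the ones from the steps.

```shell
#!/bin/sh
# Count the socket file descriptors held by a process (Linux /proc only).
# Usage: count_sockets PID
count_sockets() {
    pid=$1
    n=0
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink; sockets resolve to "socket:[inode]".
        case $(readlink "$fd" 2>/dev/null) in
            socket:*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
}

# Save the configuration once, then reload the unchanged file COUNT times
# (steps 3 and 4), printing the crmd socket count after each reload.
# Usage: reload_and_count COUNT
reload_and_count() {
    crm configure save /tmp/config.bak || return 1
    i=0
    while [ "$i" -lt "$1" ]; do
        crm configure load update /tmp/config.bak || return 1
        echo "reload $((i + 1)): $(count_sockets "$(pidof crmd)") sockets"
        i=$((i + 1))
    done
}
```

Calling 'reload_and_count 20' on one node while the watch command from
step 2 runs on the others should show whether the count grows with each
reload.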

The problem is that puppet triggers the last command on every run (in
our case every 10 minutes). The number of sockets keeps increasing
until max_open_files is reached (usually 1024). After that, the cluster
behaves unexpectedly. In our case, it lost all of its resources until
the next puppet run. I tested the above with a squeeze installation and
the problem did not appear. We found the same problem on RedHat systems
using the same pacemaker version (1.1.7).
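
To see how close a running crmd is to that limit, the open descriptor
count can be compared with the soft limit the kernel reports in
/proc/<pid>/limits. This is only a sketch under the assumption of a
Linux /proc filesystem; fd_usage is a hypothetical helper name, and it
needs the same privileges as lsof on the crmd process.

```shell
#!/bin/sh
# Print "OPEN LIMIT" for a process: the number of open file descriptors
# and the soft "Max open files" limit from /proc/PID/limits.
# Usage: fd_usage PID
fd_usage() {
    pid=$1
    # One directory entry per open descriptor.
    open=$(( $(ls "/proc/$pid/fd" 2>/dev/null | wc -l) ))
    # Line format: "Max open files  <soft>  <hard>  files"; field 4 is
    # the soft limit.
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits" 2>/dev/null)
    echo "$open $limit"
}
```

For example, 'fd_usage "$(pidof crmd)"' prints the two numbers side by
side; once the first reaches the second, crmd can no longer open new
descriptors.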

As a workaround, we disabled the pacemaker module in puppet. But I think
this is a critical problem, since the pacemaker cluster, which is
supposed to provide high availability, can itself cause serious downtime.

Regards,
Christian

-- System Information:
Debian Release: 7.1
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages pacemaker depends on:
ii  adduser           3.113+nmu3
ii  corosync          1.4.2-3
ii  libbz2-1.0        1.0.6-4
ii  libc6             2.13-38
ii  libcfg4           1.4.2-3
ii  libcib1           1.1.7-1
ii  libconfdb4        1.4.2-3
ii  libcoroipcc4      1.4.2-3
ii  libcpg4           1.4.2-3
ii  libcrmcluster1    1.1.7-1
ii  libcrmcommon2     1.1.7-1
ii  libesmtp6         1.0.6-1+b1
ii  libglib2.0-0      2.33.12+really2.32.4-5
ii  libgnutls26       2.12.20-7
ii  liblrm2           1.0.9+hg2665-1
ii  libltdl7          2.4.2-1.1
ii  libncurses5       5.9-10
ii  libpam0g          1.1.3-7.1
ii  libpe-rules2      1.1.7-1
ii  libpe-status3     1.1.7-1
ii  libpengine3       1.1.7-1
ii  libpils2          1.0.9+hg2665-1
ii  libplumb2         1.0.9+hg2665-1
ii  libsnmp15         5.4.3~dfsg-2.7
ii  libssl1.0.0       1.0.1e-2
ii  libstonithd1      1.1.7-1
ii  libtinfo5         5.9-10
ii  libtransitioner1  1.1.7-1
ii  libuuid1          2.20.1-5.3
ii  libxml2           2.8.0+dfsg1-7+nmu1
ii  libxslt1.1        1.1.26-14.1
ii  python            2.7.3-4
ii  python2.7         2.7.3-6
ii  resource-agents   1:3.9.2-5+deb7u1

pacemaker recommends no packages.

pacemaker suggests no packages.

-- no debconf information


