[Pkg-samba-maint] Bug#801690: 'smbstatus -b' leads to broken ctdb cluster

Adi Kriegisch adi at kriegisch.at
Tue Oct 13 13:44:03 UTC 2015


Package: ctdb
Version: 2.5.4+debian0-4

Dear maintainers,

I recently upgraded a Samba cluster from Wheezy (with kernel, ctdb, samba
and glusterfs from backports) to Jessie. The cluster itself is much older
and has basically always worked. Since the upgrade to Jessie, 'smbstatus -b'
(almost always) hangs the whole cluster. I need to interrupt the call
with Ctrl+C (or run it with 'timeout 2') to avoid a complete cluster lockup,
in which the other cluster nodes get banned and ctdbd on the node where I ran
smbstatus sits at 100% load without being able to recover.
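
For scripted checks, the 'timeout 2' workaround can be wrapped in a small
helper. This is only a sketch: the name run_with_deadline is hypothetical,
and unlike coreutils 'timeout' (which sends SIGTERM by default), Python's
subprocess.run() kills the child outright when the deadline passes.

```python
import subprocess
import sys


def run_with_deadline(cmd, seconds):
    """Run cmd and return its stdout, or None if the deadline was exceeded.

    subprocess.run() kills the child and reaps it when the timeout
    expires, so the caller never blocks much longer than `seconds`.
    """
    try:
        done = subprocess.run(
            cmd, capture_output=True, text=True, timeout=seconds
        )
    except subprocess.TimeoutExpired:
        return None
    return done.stdout
```

Calling run_with_deadline(["smbstatus", "-b"], 2) would then return None
instead of blocking when the command hangs. Whether a hard kill affects ctdbd
differently from Ctrl+C has not been tested here.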

The cluster consists of three nodes sharing three cluster IPs. The
only service ctdb manages is Samba. The lock file is located on a mirrored
glusterfs volume.

Running and then interrupting the hanging smbstatus leads to the following
log messages in /var/log/ctdb/log.ctdb:
  | 2015/10/13 15:09:24.923002 [19378]: Starting traverse on DB
  |                  smbXsrv_session_global.tdb (id 2592646)
  | 2015/10/13 15:09:25.505302 [19378]: server/ctdb_traverse.c:644 Traverse
  |                  cancelled by client disconnect for database:0x6b06a26d
  | 2015/10/13 15:09:25.505492 [19378]: Could not find idr:2592646
  | [...]
  | 2015/10/13 15:09:25.507553 [19378]: Could not find idr:2592646

'ctdb getdbmap' lists that database, but also lists a second entry for
smbXsrv_session_global.tdb:
  | dbid:0x521b7544 name:smbXsrv_version_global.tdb path:/var/lib/ctdb/smbXsrv_version_global.tdb.0
  | dbid:0x6b06a26d name:smbXsrv_session_global.tdb path:/var/lib/ctdb/smbXsrv_session_global.tdb.0
(I have no idea if that has always been the case or if that happened after
the upgrade).
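
A quick way to check whether a database name really appears more than once
in 'ctdb getdbmap' output is to group the dbid lines by name. The sketch
below assumes the line format shown in the two lines quoted above; the
function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as: dbid:0x6b06a26d name:foo.tdb path:/var/lib/ctdb/foo.tdb.0
DBMAP_LINE = re.compile(r"dbid:(0x[0-9a-fA-F]+)\s+name:(\S+)\s+path:(\S+)")


def parse_dbmap(text):
    """Return a list of (dbid, name, path) tuples found in the text."""
    return [
        (int(m.group(1), 16), m.group(2), m.group(3))
        for m in DBMAP_LINE.finditer(text)
    ]


def names_listed_twice(entries):
    """Map each name that occurs more than once to its list of dbids."""
    by_name = defaultdict(list)
    for dbid, name, _path in entries:
        by_name[name].append(dbid)
    return {name: ids for name, ids in by_name.items() if len(ids) > 1}


SAMPLE = """\
dbid:0x521b7544 name:smbXsrv_version_global.tdb path:/var/lib/ctdb/smbXsrv_version_global.tdb.0
dbid:0x6b06a26d name:smbXsrv_session_global.tdb path:/var/lib/ctdb/smbXsrv_session_global.tdb.0
"""

entries = parse_dbmap(SAMPLE)
duplicates = names_listed_twice(entries)
```

For the two quoted lines, duplicates comes out empty: one name contains
"version" and the other "session". The second dbid, 0x6b06a26d, is the one
named in the cancelled-traverse log message.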

Calling 'smbstatus --locks' and 'smbstatus --shares' works just fine.
Running strace on ctdbd shows a massive number of these calls:
  | write(58,"\240\4\0\0BDTC\1\0\0\0\215U\336\25\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
  |                          1184) = -1 EAGAIN (Resource temporarily unavailable)
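
The start of the buffer in that write() call can be decoded to see what
ctdbd is trying to send. The sketch below assumes the packet begins with
eight little-endian 32-bit fields (length, magic, version, generation,
operation, destnode, srcnode, reqid). That layout is an assumption about
ctdb's request header and should be checked against the ctdb source. The
helper names are hypothetical.

```python
import re
import struct
from collections import namedtuple

_OCTAL = re.compile(r"[0-7]{1,3}")
_SIMPLE = {"n": 0x0A, "t": 0x09, "r": 0x0D, "f": 0x0C, "v": 0x0B,
           "\\": 0x5C, '"': 0x22}


def strace_unescape(s):
    """Turn a strace-style escaped string (octal escapes) into bytes."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s[i] != "\\":
            out.append(ord(s[i]))
            i += 1
            continue
        m = _OCTAL.match(s, i + 1)
        if m:
            out.append(int(m.group(), 8))   # \240, \4, \0, ...
            i = m.end()
        else:
            out.append(_SIMPLE[s[i + 1]])   # \n, \t, \\, ...
            i += 2
    return bytes(out)


# Assumed field layout; not confirmed by the report itself.
Header = namedtuple(
    "Header",
    "length magic version generation operation destnode srcnode reqid",
)


def decode_header(buf):
    """Unpack the first 32 bytes as eight little-endian uint32 values."""
    return Header(*struct.unpack_from("<8I", buf))


# The 32 bytes strace printed, split per assumed field for readability.
DUMP = (
    r"\240\4\0\0"     # length
    r"BDTC"           # magic
    r"\1\0\0\0"       # version
    r"\215U\336\25"   # generation
    r"\5\0\0\0"       # operation
    r"\0\0\0\0"       # destnode
    r"\0\0\0\0"       # srcnode
    r"\0\0\0\0"       # reqid
)

hdr = decode_header(strace_unescape(DUMP))
```

Under that assumption, hdr.length is 1184, which matches the byte count
passed to write(), and hdr.magic is 0x43544442, i.e. "CTDB" stored in
little-endian order. The operation field decodes to 5.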

Running 'ctdb_diagnostics' is only possible shortly after the cluster is
started (i.e. while 'smbstatus -b' still works) and yields the following
messages:
  | ERROR[1]: /etc/krb5.conf is missing on node 0
  | ERROR[2]: File /etc/hosts is different on node 1
  | ERROR[3]: File /etc/hosts is different on node 2
  | ERROR[4]: File /etc/samba/smb.conf is different on node 1
  | ERROR[5]: File /etc/samba/smb.conf is different on node 2
  | ERROR[6]: File /etc/fstab is different on node 1
  | ERROR[7]: File /etc/fstab is different on node 2
  | ERROR[8]: /etc/multipath.conf is missing on node 0
  | ERROR[9]: /etc/pam.d/system-auth is missing on node 0
  | ERROR[10]: /etc/default/nfs is missing on node 0
  | ERROR[11]: /etc/exports is missing on node 0
  | ERROR[12]: /etc/vsftpd/vsftpd.conf is missing on node 0
  | ERROR[13]: Optional file /etc/ctdb/static-routes is not present on node 0
'/etc/hosts' differs only in some newlines and comments, while 'smb.conf'
only has different log levels on the nodes. The rest of the messages do
not affect ctdb, as it only manages Samba.
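
To separate the messages that matter for a Samba-only setup from the rest,
the ERROR lines can be filtered by file path. This is a sketch only: the
WATCHED set is an illustrative assumption based on the remark above, not a
list taken from ctdb, and the function names are hypothetical.

```python
import re

ERROR_LINE = re.compile(r"ERROR\[(\d+)\]:\s*(.*)")
PATH = re.compile(r"/\S+")

# Assumption: only these files are of interest when ctdb manages Samba alone.
WATCHED = {"/etc/hosts", "/etc/samba/smb.conf"}


def parse_errors(text):
    """Return (number, path, message) for every ERROR[n] line with a path."""
    found = []
    for line in text.splitlines():
        m = ERROR_LINE.search(line)
        if not m:
            continue
        p = PATH.search(m.group(2))
        if p:
            found.append((int(m.group(1)), p.group(), m.group(2)))
    return found


def relevant(errors, watched=WATCHED):
    """Keep only the errors whose path is in the watched set."""
    return [e for e in errors if e[1] in watched]


SAMPLE = """\
ERROR[1]: /etc/krb5.conf is missing on node 0
ERROR[2]: File /etc/hosts is different on node 1
ERROR[4]: File /etc/samba/smb.conf is different on node 1
ERROR[8]: /etc/multipath.conf is missing on node 0
"""

kept = relevant(parse_errors(SAMPLE))
```

With the four sample lines, kept holds only the /etc/hosts and
/etc/samba/smb.conf entries (ERROR[2] and ERROR[4]).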

Feel free to ask if you need any more information.

-- Adi

