[Pkg-samba-maint] Bug#801690: 'smbstatus -b' leads to broken ctdb cluster
math.parent at gmail.com
Mon Nov 2 06:24:29 UTC 2015
2015-10-13 15:44 GMT+02:00 Adi Kriegisch <adi at kriegisch.at>:
> Package: ctdb
> Version: 2.5.4+debian0-4
> Dear maintainers,
Sorry for my late reply.
> I recently upgraded a samba cluster from Wheezy (with Kernel, ctdb, samba
> and glusterfs from backports) to Jessie. The cluster itself is way older
> and basically always worked. Since the upgrade to Jessie 'smbstatus -b'
> (almost always) just hangs the whole cluster; I need to interrupt the call
> with ctrl+c (or run with 'timeout 2') to avoid a complete cluster lockup
> leading to the other cluster nodes being banned and the node I run smbstatus
> on to have ctdbd run at 100% load but not being able to recover.
How do you recover then? By KILL-ing ctdbd?
> The cluster itself consists of three nodes sharing three cluster ips. The
> only service ctdb manages is Samba. The lock file is located on a mirrored
> glusterfs volume.
> running and interrupting the hanging smbstatus leads to the following log
> messages in /var/log/ctdb/log.ctdb:
> | 2015/10/13 15:09:24.923002 : Starting traverse on DB
> | smbXsrv_session_global.tdb (id 2592646)
> | 2015/10/13 15:09:25.505302 : server/ctdb_traverse.c:644 Traverse
> | cancelled by client disconnect for database:0x6b06a26d
> | 2015/10/13 15:09:25.505492 : Could not find idr:2592646
> | [...]
> | 2015/10/13 15:09:25.507553 : Could not find idr:2592646
> 'ctdb getdbmap' lists that database, but also lists a second entry for
> | dbid:0x521b7544 name:smbXsrv_version_global.tdb path:/var/lib/ctdb/smbXsrv_version_global.tdb.0
> | dbid:0x6b06a26d name:smbXsrv_session_global.tdb path:/var/lib/ctdb/smbXsrv_session_global.tdb.0
> (I have no idea if that has always been the case or if that happened after
> the upgrade).
> Calling 'smbstatus --locks' and 'smbstatus --shares' works just fine.
Have you checked which of --processes or --notify hangs? Does it also
hang with "-b --fast"?
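A rough sketch for narrowing this down, assuming GNU coreutils 'timeout' is
available (it exits with status 124 when the time limit is hit). The helper
name probe and the LIMIT variable are my own choices, not anything shipped
with samba or ctdb; the 2 second limit is the one you already use:

```shell
#!/bin/sh
# Run a command under a time limit and classify the result.
# LIMIT and probe() are illustrative names, not part of samba/ctdb.
LIMIT=${LIMIT:-2}

probe() {
    timeout "$LIMIT" "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0)   echo "ok" ;;
        124) echo "hang" ;;            # timeout(1) had to stop the command
        *)   echo "error($rc)" ;;
    esac
}

# Only touch the cluster when smbstatus is actually installed.
if command -v smbstatus >/dev/null 2>&1; then
    for opts in "--processes" "--shares" "--locks" "--notify" "-b" "-b --fast"; do
        # $opts is intentionally unquoted so "-b --fast" splits into two arguments
        printf '%-12s %s\n' "$opts" "$(probe smbstatus $opts)"
    done
fi
```

Any line reported as "hang" is an invocation that did not finish within the
limit; plain "-b" is only in the list as the known-bad baseline.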
> 'strace'ing ctdbd leads to a massive amount of these messages:
> | write(58,"\240\4\0\0BDTC\1\0\0\0\215U\336\25\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> | 1184) = -1 EAGAIN (Resource temporarily unavailable)
fd 58 is probably the ctdb socket. Can you confirm?
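One way to check, sketched under the assumption of a Linux /proc filesystem;
fd_target and header_words are helper names I made up. A socket shows up as
'socket:[inode]', and that inode can then be looked up in /proc/net/unix.
The second helper reads the first 20 bytes of the strace buffer quoted above
as 32-bit words. On a little-endian host the first word is 0x4a0 = 1184, the
same as the write length, and the second is 0x43544442, i.e. the ASCII bytes
"CTDB", which looks like a ctdb packet header to me:

```shell
#!/bin/sh
# Print what a file descriptor of a process points to (Linux /proc only).
# fd_target is an illustrative helper, not a ctdb tool.
fd_target() {
    readlink "/proc/$1/fd/$2"
}

# Decode the start of the strace write() buffer as 32-bit hex words.
# Assumes a little-endian host such as x86/amd64.
header_words() {
    printf '\240\4\0\0BDTC\1\0\0\0\215U\336\25\5\0\0\0' \
        | od -An -tx4 | tr -s ' \n' ' '
}

# Usage (as root, with PID being the ctdbd process that strace was attached to):
#   fd_target PID 58
#   grep INODE /proc/net/unix     # INODE taken from the socket:[...] output
header_words; echo
```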
To get more useful information, can you install gdb, ctdb-dbg and samba-dbg,
and send the stack trace of ctdbd at the write?
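Roughly, once the packages are installed (apt-get install gdb ctdb-dbg
samba-dbg), something like the following should produce the trace. It is
only a sketch: ctdbd_backtrace is a name I made up, it needs root, and since
ctdbd is spinning on that write, attaching at an arbitrary moment should
land in or near it:

```shell
#!/bin/sh
# Attach gdb to a running process, dump all thread backtraces, then detach.
# ctdbd_backtrace is an illustrative helper name, not a ctdb tool.
ctdbd_backtrace() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: ctdbd_backtrace PID" >&2
            return 2
            ;;
    esac
    # -batch makes gdb exit (and detach) after running the -ex command
    gdb -p "$1" -batch -ex 'thread apply all bt full'
}

# Usage (PID being the ctdbd process that strace showed looping):
#   ctdbd_backtrace PID > ctdbd-bt.txt 2>&1
```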
> Running 'ctdb_diagnostics' is only possible shortly after the cluster is
> started (ie. while smbstatus -b works) and yields the following messages:
> | ERROR: /etc/krb5.conf is missing on node 0
> | ERROR: File /etc/hosts is different on node 1
> | ERROR: File /etc/hosts is different on node 2
> | ERROR: File /etc/samba/smb.conf is different on node 1
> | ERROR: File /etc/samba/smb.conf is different on node 2
> | ERROR: File /etc/fstab is different on node 1
> | ERROR: File /etc/fstab is different on node 2
> | ERROR: /etc/multipath.conf is missing on node 0
> | ERROR: /etc/pam.d/system-auth is missing on node 0
> | ERROR: /etc/default/nfs is missing on node 0
> | ERROR: /etc/exports is missing on node 0
> | ERROR: /etc/vsftpd/vsftpd.conf is missing on node 0
> | ERROR: Optional file /etc/ctdb/static-routes is not present on node 0
> '/etc/hosts' differs in some newlines and comments while 'smb.conf' only
> has some different log levels on the nodes. The rest of the messages does
> not affect ctdb as it only manages samba.
Yes. Nothing relevant here.
> Feel free to ask if you need any more information.