[Pkg-samba-maint] Bug#801690: 'smbstatus -b' leads to broken ctdb cluster

Wed Nov 4 09:14:46 UTC 2015

Hi!

Thanks for getting back to me! :)

> > I recently upgraded a samba cluster from Wheezy (with Kernel, ctdb, samba
> > and glusterfs from backports) to Jessie. The cluster itself is way older
> > and basically always worked. Since the upgrade to Jessie 'smbstatus -b'
> > (almost always) just hangs the whole cluster; I need to interrupt the call
> > with ctrl+c (or run with 'timeout 2') to avoid a complete cluster lockup
> > leading to the other cluster nodes being banned and the node I run smbstatus
> > on to have ctdbd run at 100% load but not being able to recover.
> 
> How do you recover then? KILL-ing ctdbd?
Killing the loaded node is the easiest; manual unbanning of the other nodes
is still required. Combinations of enabling and disabling nodes may fix the
situation too.
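
The manual unbanning step can be sketched as a small helper. This is only an
illustration, assuming the 'unban' and 'status' subcommands and the '-n' node
selector of the ctdb command-line tool (check ctdb(1) for your version); the
function name unban_nodes and the CTDB override variable are hypothetical.

```shell
#!/bin/sh
# CTDB can be overridden (assumption: useful for testing the sequence
# without touching a live cluster); it defaults to the real ctdb tool.
CTDB=${CTDB:-ctdb}

# unban_nodes PNN...
# Lifts the ban on each listed node number, then prints the cluster status
# so the result can be checked by eye.
unban_nodes() {
    for pnn in "$@"; do
        "$CTDB" -n "$pnn" unban || return 1
    done
    "$CTDB" status
}

# Example (node numbers are placeholders):
#   unban_nodes 1 2
```

Run it on a healthy node after the loaded ctdbd has been killed; if nodes stay
banned, disabling and re-enabling them by hand remains the fallback.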

> > Calling 'smbstatus --locks' and 'smbstatus --shares' works just fine.
> 
> Have you checked which of --processes or --notify hangs? Does it hang
> with "-b --fast"?
Ah, I missed that: '--brief --fast' works just fine. So apparently it is the
validation of the status data (which --fast skips) that hangs...
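
For anyone reproducing this, the comparison can be scripted so that a hang
cannot lock up the cluster. A minimal sketch, assuming GNU coreutils 'timeout'
is installed; the helper name try_cmd is hypothetical and the 2 second limit
simply mirrors the 'timeout 2' workaround quoted above.

```shell
#!/bin/sh
# try_cmd LIMIT CMD [ARGS...]
# Runs CMD under coreutils 'timeout' and reports whether it finished.
# 'timeout' exits with status 124 when the time limit was reached.
try_cmd() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "TIMEOUT: $*"
    else
        echo "exit $rc: $*"
    fi
    return "$rc"
}

# Example (run on a cluster node):
#   try_cmd 2 smbstatus --brief --fast
#   try_cmd 2 smbstatus --brief
```

The command output is discarded on purpose; only the difference between
"finished" and "had to be interrupted" matters for narrowing down the hang.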

> > 'strace'ing ctdbd leads to a massive amount of these messages:
> >   | write(58,"\240\4\0\0BDTC\1\0\0\0\215U\336\25\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> >   |                          1184) = -1 EAGAIN (Resource temporarily unavailable)
> 
> fd 58 is probably the ctdb socket. Can you confirm?
Right.

> To get more useful info, can you install gdb, ctdb-dbg and samba-dbg
> and send the stack trace of ctdbd at the write?
Ok, I will report back the stack traces in a few days (I'm afraid I can
only do these during the weekend).
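
A possible way to capture the requested trace non-interactively, as a sketch
only: it assumes gdb and the debug packages named above are installed, and
uses gdb's standard '-p', '-batch' and '-ex' options. The function name
dump_backtrace and the GDB override variable are hypothetical.

```shell
#!/bin/sh
# GDB can be overridden (assumption: handy for testing the invocation);
# it defaults to the real debugger.
GDB=${GDB:-gdb}

# dump_backtrace PID
# Attaches to the process, prints a full backtrace of every thread,
# then detaches and exits without waiting for input.
dump_backtrace() {
    pid=$1
    "$GDB" -p "$pid" -batch -ex "thread apply all bt full"
}

# Example: ctdbd may show up as more than one process, so pass the PID of
# the one spinning at 100% CPU (PID below is a placeholder):
#   dump_backtrace 1234 > ctdbd-backtrace.txt
```

Attaching stops the process briefly, so on a node that is already stuck this
should not make matters worse, but it is best done while the hang is ongoing.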

All the best,
	Adi