[Pkg-openldap-devel] Bug#827135: slapd won't stop (shutdown) on multi-core system under stress

Sat Jul 29 02:58:30 UTC 2017

Control: tag -1 moreinfo unreproducible

Hi Zvika,

My apologies for taking so long to get back to you on this.

On Sun, Jun 12, 2016 at 07:37:36PM +0000, Zvika Ferentz wrote:
>How to reproduce it:
>-------------------------------
>I guess there are a few ways to reproduce it. I managed to reproduce it easily
>with two terminals, one generating "ldapsearch" stress and the other
>restarting slapd:
>- Open two terminals.
>- On terminal #1 I'm just manually running "slapd restart" commands:
>   # /etc/init.d/slapd status ; /etc/init.d/slapd restart
>- On terminal #2 I'm running infinite loops of simple "ldapsearch" (100
>concurrent processes, each looping over ldapsearch). Terminal #2 is meant to
>simulate many concurrent read operations. See "More Information" below for the
>exact scripts that I used.
>
>Incorrect behavior:
>-------------------------
>The "slapd restart" works a few times, and then the "stop" operation fails.
>The stop continues to fail even if I stop all "stress" and terminate all
>ldapsearch processes/connections (CPU is 99% idle!)
>
>Expected Behavior:
>-------------------------
>All slapd stop/restart operations complete successfully
>
>
>More Information (optional - my exact scripts):
>----------------------------------------------------------------
>On terminal #2 I used a very simple script to generate a "read only" stress:
># cat > ldaploop.sh << EOF
>#!/bin/sh
>while true ; do  ldapsearch -x -Z ; done
>EOF
>
># cat > manyloops.sh << "EOF"
>#!/bin/sh
>for i in `seq 1 100` ; do ( ./ldaploop.sh &) ; done
>EOF
>
>As previously mentioned, I ran "manyloops.sh" to generate 100 running
>processes where each one simply runs "ldapsearch" (locally).

Thanks a lot for the detailed steps to reproduce. I got access to a VM 
with 16 CPUs where I could try this. It doesn't have a wheezy chroot any 
longer, but I tried the jessie version (2.4.40+dfsg-1+deb8u3).

I'm afraid I have not been able to trigger any hangs, even using your 
exact scripts and after restarting slapd many times.

I'm testing with the following, very simple, config:

include /etc/ldap/schema/core.schema
include /etc/ldap/schema/cosine.schema
include /etc/ldap/schema/nis.schema
include /etc/ldap/schema/inetorgperson.schema

tlscertificatefile ssl-cert-snakeoil.pem
tlscertificatekeyfile ssl-cert-snakeoil.key

moduleload back_mdb

database mdb
suffix dc=example,dc=com
directory db
index objectClass eq

and a database of 1000 entries. I tried both the hdb and mdb backends.
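
In case it helps with comparing setups, here is a rough sketch of how a test database of that size could be generated. This is not necessarily how the database above was built: the function name, the ou=People container, and the uid=userN naming are all made up for illustration, and only the dc=example,dc=com suffix comes from the config above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): print LDIF for the
# suffix entry, one ou=People container, and N inetOrgPerson entries.
# All entry names here are illustrative assumptions.
gen_test_ldif() {
    count=$1
    # Suffix entry matching "suffix dc=example,dc=com" in the config
    printf 'dn: dc=example,dc=com\nobjectClass: dcObject\nobjectClass: organization\ndc: example\no: Example\n\n'
    # Container for the generated person entries
    printf 'dn: ou=People,dc=example,dc=com\nobjectClass: organizationalUnit\nou: People\n\n'
    i=1
    while [ "$i" -le "$count" ]; do
        printf 'dn: uid=user%d,ou=People,dc=example,dc=com\n' "$i"
        printf 'objectClass: inetOrgPerson\n'
        # cn and sn are required by inetOrgPerson (via person)
        printf 'uid: user%d\ncn: Test User %d\nsn: User%d\n\n' "$i" "$i" "$i"
        i=$((i + 1))
    done
}
```

Something like "gen_test_ldif 1000 > test.ldif" followed by "slapadd -f slapd.conf -l test.ldif" (with slapd stopped) should then populate the directory; note this yields the requested person entries plus the two container entries.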

Do you still encounter this bug on jessie or stretch? Is there anything 
more in your configuration, beyond the simple config I posted, that 
might be relevant?

If you can still reproduce the bug, it would be great if you could 
install slapd-dbg and libldap-2.4-2-dbg, cause slapd to hang, and then 
capture a backtrace with gdb while it's stuck:

gdb -p $(pidof slapd)
thread apply all bt
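
If an interactive gdb session is awkward to copy from, the same backtrace can probably be captured non-interactively into a file. The following is only a sketch: the function name, the output file, and the GDB override variable are illustrative, while -p, -batch and -ex are standard gdb options.

```shell
#!/bin/sh
# Sketch: write backtraces of all threads of a running process to a file.
# GDB can be overridden to point at a different gdb binary (illustrative).
GDB=${GDB:-gdb}

dump_backtrace() {
    pid=$1
    out=$2
    # -batch makes gdb run the -ex commands and then exit, detaching
    # from the process; stderr is kept so attach errors end up in the file.
    "$GDB" -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1
}
```

Run as root while slapd is stuck, for example "dump_backtrace $(pidof slapd) slapd-backtrace.txt", and attach the resulting file to the bug.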

thanks,
Ryan