[Pkg-openldap-devel] Bug#877166: slapd: out-of-sync issue with n-way delta-syncrepl mmr

Sven Mäder maeder at phys.ethz.ch
Fri Sep 29 11:01:54 UTC 2017


Package: slapd
Version: 2.4.44+dfsg-5+deb9u1
Severity: important

Dear Maintainer,

I noticed an issue related to delta-syncrepl in a 3-way multi-mastersetup.
The issue seems to be caused by a bug in upstream. A patch isavailable
upstream for version 2.4.45.The version2.4.45+dfsg-1~bpo9+1 is also
affected.

Upstream ITS:
ITS#8444: Out-of-sync issue with memberOf overlay, Delta-syncreplMMR and
>2 nodes
https://www.openldap.org/its/index.cgi/Software%20Bugs?id=8444;selectid=8444

Upstream patch:
ftp://ftp.openldap.org/incoming/Ondrej-Kuznik-20170825-ITS8444.patch
 
I tested a build of openldap-2.4.45+dfsg and openldap-2.4.44+dfsg with the
patchapplied. The issue is resolved in both versions.

Please include the upstream patch in stretch and stretch-backports.


Description:

The issueis related to the cn=config attribute:

    olcSpCheckpoint <ops> <minutes>

On the first ldap_modify operation after a checkpoint has occured, the
contextCSN values do not get updated in the accesslog database (provider),
while the contextCSN values in the corresponding primary database (consumer)
are updated successfully. This leaves slapd in an inconsistent state, where
consumer and provider are out of sync. No data is lost, but it is applied
multiple times, depending on the number of masters, due to a replication
loop caused by the outdated contextCSN values in the accesslog database.
If slapd is restartet in this out-of-sync state, the replication is broken.


Log output after the ldap_modify operation:

sid=000:

conn=656113 op=1 MOD
dn="cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch"
conn=656113 op=1 MOD attr=heartbeatTimestamp
slap_queue_csn: queueing 0x7f2fcd418840
20170929065340.101419Z#000000#000#000000
slap_graduate_commit_csn: removing 0x7f2fcd418840
20170929065340.101419Z#000000#000#000000
conn=656113 op=1 RESULT tag=103 err=0 text=
syncprov_sendresp: to=002, cookie=rid=004
syncprov_sendresp: to=001, cookie=rid=004
conn=656113 op=2 UNBIND
conn=656113 fd=28 closed
do_syncrep2: rid=006 cookie=rid=006,sid=002
slap_queue_csn: queueing 0x7f2fc5417c60
20170929065340.101419Z#000000#000#000000
slap_graduate_commit_csn: removing 0x7f2fc5417c60
20170929065340.101419Z#000000#000#000000
syncrepl_message_to_op: rid=006 be_modify
cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0)
do_syncrep2: rid=005 cookie=rid=005,sid=001
slap_queue_csn: queueing 0x7f2fe0012210
20170929065340.101419Z#000000#000#000000
slap_graduate_commit_csn: removing 0x7f2fe0012210
20170929065340.101419Z#000000#000#000000
syncrepl_message_to_op: rid=005 be_modify
cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0)

sid=001:

do_syncrep2: rid=004 cookie=rid=004
slap_queue_csn: queueing 0x7f1f6415de10
20170929065340.101419Z#000000#000#000000
slap_graduate_commit_csn: removing 0x7f1f6415de10
20170929065340.101419Z#000000#000#000000
do_syncrep2: rid=006 cookie=rid=006,sid=002
slap_queue_csn: queueing 0x7f1f58157630
20170929065340.101419Z#000000#000#000000
slap_graduate_commit_csn: removing 0x7f1f58157630
20170929065340.101419Z#000000#000#000000
syncrepl_message_to_op: rid=006 be_modify
cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0)
syncprov_sendresp: to=002, cookie=rid=005,sid=001
syncrepl_message_to_op: rid=004 be_modify
cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0)
syncprov_sendresp: cookie=rid=005,sid=001

sid=002:

do_syncrep2: rid=004 cookie=rid=004
slap_queue_csn: queueing 0x7f87e8138190
20170929065340.101419Z#000000#000#000000
slap_graduate_commit_csn: removing 0x7f87e8138190
20170929065340.101419Z#000000#000#000000
syncrepl_message_to_op: rid=004 be_modify
cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0)
syncprov_sendresp: cookie=rid=006,sid=002
syncprov_sendresp: to=001, cookie=rid=006,sid=002
do_syncrep2: rid=005 cookie=rid=005,sid=001
slap_queue_csn: queueing 0x7f8800124620
20170929065340.101419Z#000000#000#000000
slap_graduate_commit_csn: removing 0x7f8800124620
20170929065340.101419Z#000000#000#000000
syncrepl_message_to_op: rid=005 be_modify
cn=phd-test-aa1,ou=hb,ou=syncrepl,dc=phys,dc=ethz,dc=ch (0)


Log output after slapd restart:

sid=000:

conn=1028 op=2 BIND dn="uid=ldap/phd-test-aa2.ethz.ch,cn=gssapi,cn=auth"
mech=GSSAPI sasl_ssf=56 ssf=56
conn=1028 op=2 RESULT tag=97 err=0 text=
conn=1028 op=3 SRCH base="cn=deltalog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
conn=1028 op=3 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN
reqNewSuperior entryCSN
conn=1028 op=3 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer
state is newer than provider!
...
conn=1026 op=2 BIND dn="uid=ldap/phd-test-aa3.ethz.ch,cn=gssapi,cn=auth"
mech=GSSAPI sasl_ssf=56 ssf=56
conn=1026 op=2 RESULT tag=97 err=0 text=
conn=1026 op=3 SRCH base="cn=deltalog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
conn=1026 op=3 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN
reqNewSuperior entryCSN
conn=1026 op=3 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer
state is newer than provider!

sid=001:

do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT
do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to
perform
do_syncrep2: rid=004 (53) Server is unwilling to perform

sid=002:

do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT
do_syncrep2: rid=004 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to
perform
do_syncrep2: rid=004 (53) Server is unwilling to perform


Upstream patch:

>From 1d960a78959fd460f894380b44bd5e866763a3af Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ond=C5=99ej=20Kuzn=C3=ADk?= <ondra at mistotebe.net>
Date: Fri, 25 Aug 2017 16:25:23 +0100
Subject: [PATCH] ITS#8444 Do not clear the pending operation when
 checkpointing

When a checkpoint happens, if we remove the CSN from the pending list,
accesslog won't pass it onto the accesslog DB. But in a delta-mmr
scenario, an accesslog entry without a CSN faces a race where it might
be applied twice - that usually fails and causes a full refresh, other
times it can cause a silent desync - both are undesirable.
---
 servers/slapd/overlays/syncprov.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/servers/slapd/overlays/syncprov.c
b/servers/slapd/overlays/syncprov.c
index e2f79f03a..2f0f5e8fe 100644
--- a/servers/slapd/overlays/syncprov.c
+++ b/servers/slapd/overlays/syncprov.c
@@ -1492,6 +1492,7 @@ syncprov_checkpoint( Operation *op, slap_overinst
*on )
    opm.o_bd->bd_info = on->on_info->oi_orig;
    opm.o_managedsait = SLAP_CONTROL_NONCRITICAL;
    opm.o_no_schema_check = 1;
+   opm.o_opid = -1;
    opm.o_bd->be_modify( &opm, &rsm );

    if ( rsm.sr_err == LDAP_NO_SUCH_OBJECT &&
--
2.14.1

--
Sven Mäder <maeder at phys.ethz.ch>



More information about the Pkg-openldap-devel mailing list