King0770-Notes-ldap-fragmentation

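Have you ever noticed LDAP fragmentation on your LDAP replica nodes? In the samples below, the free-page total reported by mdb_stat grows by roughly 740 MB in under ten minutes.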


[zimbra@ldap-replica002 ~]$ date;mdb_stat -a -e -f /opt/zimbra/data/ldap/mdb/db | grep "Free pages" | awk '{print $3 * 4096/1024/1024 " MB"}'
Tue Nov 20 12:12:34 MST 2018
3419.11 MB

[zimbra@ldap-replica002 ~]$ date;mdb_stat -a -e -f /opt/zimbra/data/ldap/mdb/db | grep "Free pages" | awk '{print $3 * 4096/1024/1024 " MB"}'
Tue Nov 20 12:14:16 MST 2018
3554.54 MB

[zimbra@ldap-replica002 ~]$ date;mdb_stat -a -e -f /opt/zimbra/data/ldap/mdb/db | grep "Free pages" | awk '{print $3 * 4096/1024/1024 " MB"}'
Tue Nov 20 12:15:14 MST 2018
3627.49 MB

[zimbra@ldap-replica002 ~]$ date;mdb_stat -a -e -f /opt/zimbra/data/ldap/mdb/db | grep "Free pages" | awk '{print $3 * 4096/1024/1024 " MB"}'
Tue Nov 20 12:16:19 MST 2018
3721.03 MB

[zimbra@ldap-replica002 ~]$ date;mdb_stat -a -e -f /opt/zimbra/data/ldap/mdb/db | grep "Free pages" | awk '{print $3 * 4096/1024/1024 " MB"}'
Tue Nov 20 12:19:13 MST 2018
3932.64 MB

[zimbra@ldap-replica002 ~]$ date;mdb_stat -a -e -f /opt/zimbra/data/ldap/mdb/db | grep "Free pages" | awk '{print $3 * 4096/1024/1024 " MB"}'
Tue Nov 20 12:20:24 MST 2018
4031.6 MB

[zimbra@ldap-replica002 ~]$ date;mdb_stat -a -e -f /opt/zimbra/data/ldap/mdb/db | grep "Free pages" | awk '{print $3 * 4096/1024/1024 " MB"}'
Tue Nov 20 12:21:58 MST 2018
4160.29 MB
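
To avoid retyping the command, the sampling can be wrapped in a small loop. This is a minimal sketch and not part of the original notes: the function names free_mb and watch_free and the 60-second default interval are assumptions, while the database path and the 4096-byte page size are the ones used in the commands above.

```shell
#!/bin/bash
# Hypothetical helpers; the names and the default interval are illustrative.
MDB_DIR=/opt/zimbra/data/ldap/mdb/db

# Read mdb_stat output on stdin and print the free-page total in MB,
# assuming 4096-byte pages as in the one-liner above.
free_mb() {
    awk '/Free pages/ {printf "%.2f MB\n", $3 * 4096 / 1024 / 1024}'
}

# Print a timestamp and the free-page total every $1 seconds (default 60)
# until interrupted with Ctrl-C.
watch_free() {
    local interval="${1:-60}"
    while true; do
        date
        mdb_stat -a -e -f "$MDB_DIR" | free_mb
        sleep "$interval"
    done
}
```

Source the file as the zimbra user and run watch_free 60; a total that climbs steadily from sample to sample is the pattern shown above.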

If so, do the following as the zimbra user:

1) Make sure your LDAP environment is configured to fail-over LDAP read traffic to another Replica or the Master   <<== you must do this FIRST

2) zmcontrol stop

3) mv /opt/zimbra/data/ldap/mdb /opt/zimbra/data/ldap/mdb.BIG

4) cd /opt/zimbra/data/ldap

5) mkdir -p mdb/db

6) mdb_copy /opt/zimbra/data/ldap/mdb.BIG/db /opt/zimbra/data/ldap/mdb/db

7) zmcontrol start

*Note: run this on the replicas, not on the master.*
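
Steps 2 through 7 can be sketched as a single function. This is an illustration under stated assumptions, not a tested procedure: the run helper, the DRY_RUN variable, and the function name compact_replica_mdb are hypothetical, and the sketch assumes the old database is moved to /opt/zimbra/data/ldap/mdb.BIG, the path that step 6 reads from. Step 1 (failing read traffic over to another node) still has to be done by hand first.

```shell
#!/bin/bash
# Sketch of steps 2-7 for an LDAP replica, to be run as the zimbra user.
# DRY_RUN is a hypothetical switch: when set to 1, commands are printed
# instead of executed, so the sequence can be reviewed before running it.
LDAP_DATA=/opt/zimbra/data/ldap

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

compact_replica_mdb() {
    # Stop on the first failure so a failed copy never leads to a restart
    # against an empty database directory.
    run zmcontrol stop || return 1
    run mv "$LDAP_DATA/mdb" "$LDAP_DATA/mdb.BIG" || return 1
    run mkdir -p "$LDAP_DATA/mdb/db" || return 1
    run mdb_copy "$LDAP_DATA/mdb.BIG/db" "$LDAP_DATA/mdb/db" || return 1
    run zmcontrol start
}
```

Setting DRY_RUN=1 before calling compact_replica_mdb prints the five commands without touching anything. mdb_copy also has a -c option (compact while copying) in LMDB versions that support it; the steps above do not use it, so the sketch does not either.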

What was the cause of the fragmentation?

Most likely there was an event, and that event will be in the access log. Export your access log from the ldap-master, and inspect it.

/opt/zimbra/libexec/zmslapcat -a /tmp/

A way to detect LDAP updates that take longer than 5 seconds

Set the minimum log level required

zmlocalconfig -e ldap_common_loglevel="stats none"

Now find updates that are taking longer than a certain period of time; let's say 5 seconds (5000 ms) or more.

tail -f /var/log/zimbra.log | egrep 'duration=([5-9][0-9]{3}|[0-9]{5,})\.' | grep -o 'conn=[^ ]* op=[^ ]*' > /tmp/5secplus.log
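
Changing the threshold means rewriting the regular expression. A minimal sketch that compares the duration numerically instead is shown here; the function name slow_ops is hypothetical, and it makes the same assumptions about the log lines as the command above: a duration=<milliseconds> field and conn=... op=... tokens on the same line.

```shell
#!/bin/bash
# Hypothetical helper: read log lines on stdin and print "conn=... op=..."
# for every line whose duration is at least $1 milliseconds (default 5000).
slow_ops() {
    awk -v min="${1:-5000}" '
        match($0, /duration=[0-9.]+/) {
            # Skip the 9 characters of "duration=" to get the number.
            d = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (d >= min && match($0, /conn=[^ ]* op=[^ ]*/))
                print substr($0, RSTART, RLENGTH)
        }'
}
```

For example, slow_ops 5000 < /var/log/zimbra.log > /tmp/5secplus.log builds the same list from the log that has already been written.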

Run that for a while during a period of slowness, or through a generally busy part of the day (8:30-9:00 am, for example), then set the log level back to "none sync" (49152), because the log grows quickly at the stats level.

zmlocalconfig -e ldap_common_loglevel="none sync"

Now find the attributes associated with the long durations (5 seconds or more). These are the updates causing syncrepl slowness; for example, you will likely see CSRF token data or a gentime attribute.

fgrep -f /tmp/5secplus.log /var/log/zimbra.log |grep -o 'MOD attr=.*' | sort | uniq -c

Much Thanks to Karl Buchner & John Holder for the mdb_copy syntax!

More articles written by me: https://wiki.zimbra.com/wiki/King0770-Notes
