[pkg-bacula-devel] Bug#737821: bacula-console-qt: .bvfs_update is carried out VERY long on jobs with several million files and a deep enclosure

Alexander Gordienko alexgo at vtflex.ru
Thu Feb 6 08:02:01 UTC 2014


Package: bacula-console-qt
Version: 5.2.6+dfsg-9
Severity: important

Dear Maintainer,

For recovery of files by means of bRestore, it is necessary to execute at first .bvfs_update.

When trying to build the PathHierarchy with .bvfs_update, the update for big jobs take forever... 

Steps to Reproduce: 
- have a big dataset (> 1 million files AND a deep directory hierarchy)
- .bvfs_update job=N
- mysql used 100% CPU very mach time
- at this time no run jobs on the schedule

Additional Information: 
I updated bacula from version 5.0 to 5.2 (debian 6 -> 7). In the new bat version the "Browse Cataloged Files" function became deprecated. It is offered to use bRestore. My bat hanged at bRestore choice. At this time mysql uses 100% of Cpu

I enable slow query log on mysql. 

The analysis of slow query is lower

TOP 1 from "#mysqldumpslow mysql-slow.log"

Count: 249  Time=243.29s (60579s)  Lock=0.00s (0s)  Rows=0.0 (0), bacula[bacula]@localhost
  INSERT INTO PathVisibility (PathId, JobId)  SELECT a.PathId,N FROM ( SELECT DISTINCT h.PPathId AS PathId FROM PathHierarchy AS h JOIN  PathVisibility AS p ON (h.PathId=p.PathId) WHERE p.JobId=N) AS a LEFT JOIN (SELECT PathId FROM PathVisibility WHERE JobId=N) AS b ON (a.PathId = b.PathId) WHERE b.PathId IS NULL

I carried out the analysis and optimization of this query.

Original query:
	INSERT INTO PathVisibility (PathId, JobId) 
	SELECT a.PathId,N 
	FROM ( 
		SELECT DISTINCT h.PPathId AS PathId 
		FROM PathHierarchy AS h 
		JOIN  PathVisibility AS p 
			ON (h.PathId=p.PathId) 
		WHERE p.JobId=N) AS a 
	LEFT JOIN (
			SELECT PathId 
			FROM PathVisibility 
			WHERE JobId=N) AS b 
		ON (a.PathId = b.PathId) 
	WHERE b.PathId IS NULL

Optimized query:
	INSERT INTO PathVisibility (PathId, JobId) 
	SELECT a.PathId,N 
	FROM ( 
		SELECT DISTINCT h.PPathId AS PathId 
		FROM PathHierarchy AS h 
		JOIN  PathVisibility AS p 
			ON (h.PathId=p.PathId) 
		WHERE p.JobId=N) AS a 
	LEFT JOIN PathVisibility AS b 
		ON (b.JobId=N and a.PathId = b.PathId) 
	WHERE b.PathId IS NULL

Results of query are identical.

In original query in join on the right there is a temporary table (subquery "SELECT PathId FROM PathVisibility WHERE JobId=N"). The temporary table has no index across the field of PathId. For each record of the left table there is a full search of the right table.

The table PathVisibility has a compound index on the fields JobId and PathId. In the optimized query join is carried out with use of this index.

After introduction of editings and recompilation bacula of .bvfs_update it was executed in 17 minutes. With old inquiry in 16-17 hours it didn't end and was still very far from end.

Count: 19052  Time=0.01s (152s)  Lock=0.00s (1s)  Rows=0.0 (0), bacula[bacula]@localhost
  INSERT INTO PathVisibility (PathId, JobId)  SELECT a.PathId,N FROM ( SELECT DISTINCT h.PPathId AS PathId FROM PathHierarchy AS h JOIN  PathVisibility AS p ON (h.PathId=p.PathId) WHERE p.JobId=N) AS a LEFT JOIN PathVisibility AS b ON (JobId=N and a.PathId = b.PathId) WHERE b.PathId IS NULL

The similar situation was stated in http://sourceforge.net/mailarchive/forum.php?thread_name=3ccaa3a6fb746ee27273ef0b1db9045b%40bugs.bacula.org&forum_name=bacula-bugs

I have a base in mysql. So detailed check is necessary for other bases. I asked the friend to check on postgresql for a lot of work. After 1,5 hours of processing of one big job were compelled to stop postgresql.

Patch:

diff -Naur bacula-5.2.6+dfsg/src/cats/bvfs.c_orig bacula-5.2.6+dfsg/src/cats/bvfs.c
--- bacula-5.2.6+dfsg/src/cats/bvfs.c_orig      2012-06-02 13:26:27.000000000 +0400
+++ bacula-5.2.6+dfsg/src/cats/bvfs.c   2014-02-06 10:42:34.798440879 +0400
@@ -404,10 +404,9 @@
      "SELECT DISTINCT h.PPathId AS PathId "
        "FROM PathHierarchy AS h "
        "JOIN  PathVisibility AS p ON (h.PathId=p.PathId) "
-      "WHERE p.JobId=%s) AS a LEFT JOIN "
-       "(SELECT PathId "
-          "FROM PathVisibility "
-         "WHERE JobId=%s) AS b ON (a.PathId = b.PathId) "
+      "WHERE p.JobId=%s) AS a "
+     "LEFT JOIN PathVisibility AS b "
+       "ON (b.JobId=%s and a.PathId = b.PathId) "
    "WHERE b.PathId IS NULL",  jobid, jobid, jobid);
    }



-- System Information:
Debian Release: 7.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/dash

Versions of packages bacula-console-qt depends on:
ii  bacula-common  5.2.6+dfsg-9
ii  libc6          2.13-38
ii  libcap2        1:2.22-1.2
ii  libgcc1        1:4.7.2-5
ii  libqtcore4     4:4.8.2+dfsg-11
ii  libqtgui4      4:4.8.2+dfsg-11
ii  libssl1.0.0    1.0.1e-2+deb7u3
ii  libstdc++6     4.7.2-5
ii  libwrap0       7.6.q-24
ii  zlib1g         1:1.2.7.dfsg-13

bacula-console-qt recommends no packages.

bacula-console-qt suggests no packages.

-- no debconf information



More information about the pkg-bacula-devel mailing list