[pkg-bacula-devel] Bug#737821: bacula-console-qt: .bvfs_update is carried out VERY long on jobs with several million files and a deep enclosure
Alexander Gordienko
alexgo at vtflex.ru
Thu Feb 6 08:02:01 UTC 2014
Package: bacula-console-qt
Version: 5.2.6+dfsg-9
Severity: important
Dear Maintainer,
For recovery of files by means of bRestore, it is necessary to execute at first .bvfs_update.
When trying to build the PathHierarchy with .bvfs_update, the update for big jobs take forever...
Steps to Reproduce:
- have a big dataset (> 1 million files AND a deep directory hierarchy)
- .bvfs_update job=N
- mysql used 100% CPU very mach time
- at this time no run jobs on the schedule
Additional Information:
I updated bacula from version 5.0 to 5.2 (debian 6 -> 7). In the new bat version the "Browse Cataloged Files" function became deprecated. It is offered to use bRestore. My bat hanged at bRestore choice. At this time mysql uses 100% of Cpu
I enable slow query log on mysql.
The analysis of slow query is lower
TOP 1 from "#mysqldumpslow mysql-slow.log"
Count: 249 Time=243.29s (60579s) Lock=0.00s (0s) Rows=0.0 (0), bacula[bacula]@localhost
INSERT INTO PathVisibility (PathId, JobId) SELECT a.PathId,N FROM ( SELECT DISTINCT h.PPathId AS PathId FROM PathHierarchy AS h JOIN PathVisibility AS p ON (h.PathId=p.PathId) WHERE p.JobId=N) AS a LEFT JOIN (SELECT PathId FROM PathVisibility WHERE JobId=N) AS b ON (a.PathId = b.PathId) WHERE b.PathId IS NULL
I carried out the analysis and optimization of this query.
Original query:
INSERT INTO PathVisibility (PathId, JobId)
SELECT a.PathId,N
FROM (
SELECT DISTINCT h.PPathId AS PathId
FROM PathHierarchy AS h
JOIN PathVisibility AS p
ON (h.PathId=p.PathId)
WHERE p.JobId=N) AS a
LEFT JOIN (
SELECT PathId
FROM PathVisibility
WHERE JobId=N) AS b
ON (a.PathId = b.PathId)
WHERE b.PathId IS NULL
Optimized query:
INSERT INTO PathVisibility (PathId, JobId)
SELECT a.PathId,N
FROM (
SELECT DISTINCT h.PPathId AS PathId
FROM PathHierarchy AS h
JOIN PathVisibility AS p
ON (h.PathId=p.PathId)
WHERE p.JobId=N) AS a
LEFT JOIN PathVisibility AS b
ON (b.JobId=N and a.PathId = b.PathId)
WHERE b.PathId IS NULL
Results of query are identical.
In original query in join on the right there is a temporary table (subquery "SELECT PathId FROM PathVisibility WHERE JobId=N"). The temporary table has no index across the field of PathId. For each record of the left table there is a full search of the right table.
The table PathVisibility has a compound index on the fields JobId and PathId. In the optimized query join is carried out with use of this index.
After introduction of editings and recompilation bacula of .bvfs_update it was executed in 17 minutes. With old inquiry in 16-17 hours it didn't end and was still very far from end.
Count: 19052 Time=0.01s (152s) Lock=0.00s (1s) Rows=0.0 (0), bacula[bacula]@localhost
INSERT INTO PathVisibility (PathId, JobId) SELECT a.PathId,N FROM ( SELECT DISTINCT h.PPathId AS PathId FROM PathHierarchy AS h JOIN PathVisibility AS p ON (h.PathId=p.PathId) WHERE p.JobId=N) AS a LEFT JOIN PathVisibility AS b ON (JobId=N and a.PathId = b.PathId) WHERE b.PathId IS NULL
The similar situation was stated in http://sourceforge.net/mailarchive/forum.php?thread_name=3ccaa3a6fb746ee27273ef0b1db9045b%40bugs.bacula.org&forum_name=bacula-bugs
I have a base in mysql. So detailed check is necessary for other bases. I asked the friend to check on postgresql for a lot of work. After 1,5 hours of processing of one big job were compelled to stop postgresql.
Patch:
diff -Naur bacula-5.2.6+dfsg/src/cats/bvfs.c_orig bacula-5.2.6+dfsg/src/cats/bvfs.c
--- bacula-5.2.6+dfsg/src/cats/bvfs.c_orig 2012-06-02 13:26:27.000000000 +0400
+++ bacula-5.2.6+dfsg/src/cats/bvfs.c 2014-02-06 10:42:34.798440879 +0400
@@ -404,10 +404,9 @@
"SELECT DISTINCT h.PPathId AS PathId "
"FROM PathHierarchy AS h "
"JOIN PathVisibility AS p ON (h.PathId=p.PathId) "
- "WHERE p.JobId=%s) AS a LEFT JOIN "
- "(SELECT PathId "
- "FROM PathVisibility "
- "WHERE JobId=%s) AS b ON (a.PathId = b.PathId) "
+ "WHERE p.JobId=%s) AS a "
+ "LEFT JOIN PathVisibility AS b "
+ "ON (b.JobId=%s and a.PathId = b.PathId) "
"WHERE b.PathId IS NULL", jobid, jobid, jobid);
}
-- System Information:
Debian Release: 7.3
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 3.2.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/dash
Versions of packages bacula-console-qt depends on:
ii bacula-common 5.2.6+dfsg-9
ii libc6 2.13-38
ii libcap2 1:2.22-1.2
ii libgcc1 1:4.7.2-5
ii libqtcore4 4:4.8.2+dfsg-11
ii libqtgui4 4:4.8.2+dfsg-11
ii libssl1.0.0 1.0.1e-2+deb7u3
ii libstdc++6 4.7.2-5
ii libwrap0 7.6.q-24
ii zlib1g 1:1.2.7.dfsg-13
bacula-console-qt recommends no packages.
bacula-console-qt suggests no packages.
-- no debconf information
More information about the pkg-bacula-devel
mailing list