[Bug 968021] Re: boinc stops on error after a few days (md5_file: Too many open files) in stderrdae.txt
Clint Byrum
clint at fewbar.com
Wed Jul 18 17:09:22 UTC 2012
** Description changed:
+ == SRU Justification ==
+
+ Impact: as originally reported (on oneiric), a long-running boinc system eventually fails to compute any further work units, because every computed work unit leaks one file descriptor in the main boinc daemon (regardless of which projects are subscribed). The faster work units are processed, and the lower the system's file descriptor limit, the sooner the bug occurs (usually within one week of uninterrupted uptime, but possibly only after months on slower systems).
+ This bug affects all versions from Oneiric's boinc 6.12.33+dfsg-1.1ubuntu0.1 up to, but not including, 7.0.27, which is fine.
+ (Strictly speaking, 7.0.23, 7.0.24 and 7.0.25 no longer leak but have a different issue, computation errors; 7.0.26 was not tested.)
+
+ Test case: easy, but very long. Run boinc for at least one complete work
+ unit (depending on the project, a unit can take from 5 minutes to many
+ hours), then run "lsof" on the boinc daemon and check the end of the
+ listing. After more units have been processed, the list reported by
+ "lsof" should not be longer than before. Computation of each unit must
+ succeed.
+
+ Regression Potential: I do not know what the fix changes, so I cannot
+ assess its impact. However, boinc must be able to run unattended for
+ months without such problems and without a reboot, especially on an LTS
+ release.
+
+ == Original Description ==
+
There seems to be a file descriptor leak in the boinc process (client
side).
After a few days of loading the system without problems, it suddenly
stops working.
Relaunching it usually works (but having to actively manage a system
running boinc is not a decent solution).
The following command gives a clue:
$ sudo lsof -p `pidof boinc`
The number of open file descriptors keeps increasing as boinc tasks
are completed (this is more visible when the projects have tasks that
are fast for the hardware, such as sudoku or milkyway/nvidia).
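The growth described above can also be tracked as a single number, which is easier to compare between work units than a full listing. The following is only a minimal sketch, assuming a Linux system where /proc/<pid>/fd holds one entry per open descriptor; the function name count_open_fds is hypothetical and not part of boinc.

```shell
# Print the number of descriptors a process currently holds open.
# Assumption: Linux, where /proc/<pid>/fd has one entry per open descriptor.
# Reading another user's process typically requires root.
count_open_fds() {
    ls "/proc/$1/fd" | wc -l
}
```

From a root shell, count_open_fds "$(pidof boinc)" can be noted after each completed work unit; a number that only ever rises would be consistent with the leak.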
Many entries in the lsof output look like this:
boinc 15348 boinc 623r DIR 8,1 4096 29492116 /var/lib/boinc-client/slots/12
boinc 15348 boinc 624r DIR 8,1 4096 29492173 /var/lib/boinc-client/slots/13
boinc 15348 boinc 625r DIR 8,1 4096 29492116 /var/lib/boinc-client/slots/12
boinc 15348 boinc 626r DIR 8,1 4096 29492084 /var/lib/boinc-client/slots/8
boinc 15348 boinc 627r DIR 8,1 4096 29492085 /var/lib/boinc-client/slots/9
boinc 15348 boinc 628r DIR 8,1 4096 29492116 /var/lib/boinc-client/slots/12
boinc 15348 boinc 629r DIR 8,1 4096 29492173 /var/lib/boinc-client/slots/13
boinc 15348 boinc 630r DIR 8,1 4096 29492116 /var/lib/boinc-client/slots/12
boinc 15348 boinc 632r DIR 8,1 4096 29492018 /var/lib/boinc-client/slots/2
boinc 15348 boinc 633r DIR 8,1 4096 29492040 /var/lib/boinc-client/slots/4
boinc 15348 boinc 634r DIR 8,1 4096 29492018 /var/lib/boinc-client/slots/2
boinc 15348 boinc 635r DIR 8,1 4096 29492062 /var/lib/boinc-client/slots/6
boinc 15348 boinc 636r DIR 8,1 4096 29492116 /var/lib/boinc-client/slots/12
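The listing above shows the same slot directories held open several times. A quick way to summarise that is to tally the directory entries per path. This is a rough sketch, assuming the default lsof column layout seen above (TYPE in the fifth column, NAME last) and paths without spaces; the function name count_fds_by_path is made up for illustration.

```shell
# Tally open directory descriptors per path from lsof output on stdin.
# Assumptions: default lsof columns (TYPE is field 5, NAME is the last
# field) and no whitespace in the paths.
count_fds_by_path() {
    awk '$5 == "DIR" { n[$NF]++ } END { for (p in n) print n[p], p }' | sort -rn
}
```

It could be fed with sudo lsof -p "$(pidof boinc)" | count_fds_by_path; a slot directory whose count keeps climbing between runs would point at the leaked descriptors.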
ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: boinc 6.12.33+dfsg-1.1ubuntu0.1
ProcVersionSignature: Ubuntu 3.0.0-17.30-generic 3.0.22
Uname: Linux 3.0.0-17-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 1.23-0ubuntu4
Architecture: amd64
Date: Thu Mar 29 08:59:09 2012
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111012)
PackageArchitecture: all
SourcePackage: boinc
UpgradeStatus: No upgrade log present (probably fresh install)
--
You received this bug notification because you are a member of Debian
BOINC Maintainers, which is subscribed to boinc in Ubuntu.
https://bugs.launchpad.net/bugs/968021
Title:
boinc stops on error after a few days (md5_file: Too many open files)
in stderrdae.txt
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/boinc/+bug/968021/+subscriptions