[Neurodebian-users] condor_qsub sentinel jobs release dependencies early in some cases

Chase,Philip B pbc at ufl.edu
Mon Nov 5 15:38:28 UTC 2012


I ran a large test last night on the reconfigured cluster and the problem persists.  I have a patch that mitigates the problem, but it feels kludgy to me.

I submitted a bug report that has all the details:  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=+692388

Philip

Philip B. Chase
Assistant Director
Clinical and Translational Science IT
University of Florida
pbc at ufl.edu
352-294-5164

From: Michael Hanke <mih at debian.org>
Date: Sunday, November 4, 2012 11:55 AM
To: Philip Chase <pbc at ufl.edu>
Cc: neurodebian-users at lists.alioth.debian.org
Subject: Re: [Neurodebian-users] condor_qsub sentinel jobs release dependencies early in some cases

Hi,

On Fri, Nov 02, 2012 at 01:10:42PM +0000, Chase,Philip B wrote:
I have a hint as to what might be causing problems with my sentinel jobs.  Late yesterday I saw this:
   $ condor_q
   -- Failed to fetch ads from: <127.0.0.1:50116> : name.domain
Moments later all the sentinel jobs released their dependents.
The condor docs suggest my use of BIND_ALL_INTERFACES = TRUE is not enough and that I must explicitly set NETWORK_INTERFACE to the non-loopback address.  The pool is reconfiguring right now.  I'll retest.
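
One way to spot this condition in a log is to check whether the address in the error message is a loopback address. The following is only a minimal sketch: the helper name advertised_loopback is hypothetical, and it assumes the address appears in the IPv4 form <ip:port> seen in the message above.

```python
import ipaddress
import re

# Assumed shape of an IPv4 address string as printed above: "<ip:port>".
_ADDRESS_IPV4 = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def advertised_loopback(text):
    """Return True if the first <ip:port> address in text is a loopback address.

    Hypothetical helper for illustration; not part of condor or condor_qsub.
    """
    match = _ADDRESS_IPV4.search(text)
    if match is None:
        raise ValueError("no <ip:port> address found")
    # ip_address() raises ValueError for malformed octets such as 999.
    return ipaddress.ip_address(match.group(1)).is_loopback


message = "-- Failed to fetch ads from: <127.0.0.1:50116> : name.domain"
print(advertised_loopback(message))
```

For the message quoted above the check reports a loopback address, which is consistent with the daemons binding to 127.0.0.1 rather than to the address set via NETWORK_INTERFACE.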

Thanks for reporting your experience. Please let us know how it goes. I
can confirm that condor is pretty picky about the network configuration.
Without a non-loopback network device it is tough to get it to run
properly.

It may be possible to improve the sentinel job configuration to not
release the waiting jobs in this case. However, I do not have the time
to look into this right now. I'd be glad if you could file a bug report
with a summary of your findings so I do not forget about this issue, and
can take a look as soon as I find the time.
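
The idea of not releasing waiting jobs when the queue cannot be queried could look roughly like the sketch below. It is not the actual condor_qsub logic: the function name may_release is hypothetical, and it assumes the sentinel has the exit status of its queue query plus output that lists one job ID per line as the first token.

```python
def may_release(returncode, output, waited_ids):
    """Decide whether a sentinel may release its dependent jobs.

    Hypothetical helper: returncode and output are assumed to come from the
    sentinel's queue query; waited_ids are the job IDs it is waiting on.
    """
    # A failed query says nothing about the queue, so keep waiting.
    if returncode != 0:
        return False
    # Also guard against the error text seen in this thread, in case the
    # query reports the failure without a non-zero exit status.
    if "Failed to fetch ads" in output:
        return False
    # Assumed output format: one job ID per line, as the first token.
    listed = {line.split()[0] for line in output.splitlines() if line.strip()}
    # Release only when none of the awaited jobs is still listed.
    return not any(job_id in listed for job_id in waited_ids)


print(may_release(0, "13.0\n", ["12.0"]))
print(may_release(1, "", ["12.0"]))
```

The point of the design is that an empty or unreadable answer from the scheduler is treated as "unknown" rather than as "all jobs finished", so a transient communication failure delays the release instead of triggering it.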

Thanks,


Michael



--
Michael Hanke
http://mih.voxindeserto.de
