[Neurodebian-users] condor_qsub sentinel jobs release dependencies early in some cases

Michael Hanke mih at debian.org
Sun Nov 4 15:55:25 UTC 2012


Hi,

On Fri, Nov 02, 2012 at 01:10:42PM +0000, Chase,Philip B wrote:
> I have a hint as to what might be causing problems with my sentinel jobs.  Late yesterday I saw this:
> 
>   $ condor_q
> 
>   -- Failed to fetch ads from: <127.0.0.1:50116> : name.domain
> 
> Moments later all the sentinel jobs released their dependents.
> 
> The condor docs suggest my use of BIND_ALL_INTERFACES = TRUE is not enough and that I must explicitly set NETWORK_INTERFACE to the non-loopback address.  The pool is reconfiguring right now.  I'll retest.

Thanks for reporting your experience. Please let us know how it goes. I
can confirm that condor is pretty picky about the network configuration.
Without a non-loopback network device it is tough to get it to run
properly.
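
A quick way to spot this symptom in tool output is to look for loopback
addresses in the "<host:port>" form that condor_q printed above. The
following is only a minimal sketch: the helper name loopback_addresses is
made up for illustration, and it handles IPv4 addresses only.

```python
import ipaddress
import re

# Matches the "<host:port>" address form seen in the condor_q error above.
ADDRESS_RE = re.compile(r"<(\d{1,3}(?:\.\d{1,3}){3}):(\d+)>")


def loopback_addresses(text):
    """Return the loopback IPv4 addresses that appear as <host:port> in text.

    Hypothetical helper for illustration; not part of Condor or condor_qsub.
    """
    found = []
    for host, _port in ADDRESS_RE.findall(text):
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            # Looks like an address but is not a valid one (e.g. 999.1.1.1).
            continue
        if address.is_loopback:
            found.append(host)
    return found


message = "-- Failed to fetch ads from: <127.0.0.1:50116> : name.domain"
print(loopback_addresses(message))
```

If this reports an address for a daemon that other machines need to reach,
the NETWORK_INTERFACE setting is the first thing to check.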

It may be possible to improve the sentinel job configuration to not
release the waiting jobs in this case. However, I do not have the time
to look into this right now. I'd be glad if you could file a bug report
with a summary of your findings so I do not forget about this issue, and
can take a look as soon as I find the time.
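
To make the idea concrete, here is a rough, untested sketch of the kind of
check I have in mind. It is not how condor_qsub currently works. The function
name may_release and its arguments are hypothetical, and it assumes the
sentinel gets the exit code, stdout and stderr of a condor_q call that prints
one cluster ID per line. Whether condor_q exits non-zero when it fails to
fetch ads is also an assumption, which is why the error text is checked too.

```python
def may_release(returncode, stdout, stderr, waiting_for):
    """Decide whether a sentinel may release its dependents.

    Hypothetical helper: release only if the queue query clearly succeeded
    AND none of the cluster IDs in waiting_for are still queued. A failed
    query means "unknown", so the sentinel should keep waiting.
    """
    # Assumption: a failed query yields a non-zero exit code.
    if returncode != 0:
        return False
    # Also guard on the error text reported in this thread.
    if "Failed to fetch ads" in stdout or "Failed to fetch ads" in stderr:
        return False
    # Collect the cluster IDs that are still in the queue.
    queued = {int(token) for token in stdout.split() if token.isdigit()}
    return queued.isdisjoint(waiting_for)


# An empty listing after a failed query must not look like "all jobs done".
error = "-- Failed to fetch ads from: <127.0.0.1:50116> : name.domain"
print(may_release(0, "", error, {12}))
```

The point is simply that an empty job listing and a failed query need to be
told apart before any dependent job is released.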

Thanks,


Michael



-- 
Michael Hanke
http://mih.voxindeserto.de
