[Neurodebian-users] condor and fsl: again

Michael Hanke mih at debian.org
Tue May 15 18:29:02 UTC 2012


Hi,

On Tue, May 15, 2012 at 03:19:25PM +0200, Stefan Kreisel wrote:
> have installed a fresh Debian (debian-6.0.5-amd64) system + fsl and
> condor from the neurodebian repository. Ran fsl-feeds (i.e.
> fsl-selftest) without problems - though running parallelized FDT or
> FEAT actually leads to longer processing times in comparison to
> setting FSLPARALLEL=0 (I think this is something Bertram Walter
> pointed out some time ago
> (http://web.archiveorange.com/archive/v/FQvMCazdbKoptd7PgAqu) ->

The longer processing time is due to the fact that the selftest doesn't
actually run parallel jobs -- you are observing the scheduling overhead.
For actual analyses the picture differs.

> there has been, however, no solution posted on the list...
> Here's the real problem: tbss_reg_2 -n invokes condor, but
> terminates (within a minute or so) without having done calculations.
> See the excerpts below.
> 
> stefan@gehirne:~$ condor_q
> 
> -- Submitter:  : <127.0.0.1:39795> : gehirne.fritz.box
>  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
>   44.140 stefan          5/15 08:06   0+00:00:00 R  0   0.1  sh /home/stefan/st
>   44.141 stefan          5/15 08:06   0+00:00:00 R  0   0.1  sh /home/stefan/st
>   44.142 stefan          5/15 08:06   0+00:00:00 R  0   0.1  sh /home/stefan/st
>   44.143 stefan          5/15 08:06   0+00:00:00 R  0   0.1  sh /home/stefan/st
> [...]
> 
> 
> stefan@gehirne:~$ condor_status
> 
> Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime
> 
> slot1@gehirne.frit LINUX      X86_64 Claimed   Busy     0.000  2001  0+00:00:01
> slot2@gehirne.frit LINUX      X86_64 Claimed   Busy     0.000  2001  0+00:00:02
> slot3@gehirne.frit LINUX      X86_64 Claimed   Busy     0.000  2001  0+00:00:01
> slot4@gehirne.frit LINUX      X86_64 Claimed   Busy     0.510  2001  0+00:00:01
>                      Total Owner Claimed Unclaimed Matched Preempting Backfill
> 
>         X86_64/LINUX     4     0       4         0       0          0        0
> 
>                Total     4     0       4         0       0          0        0

This all looks good. Could you post the output of

  condor_q -l 44.140

so we can see what it is trying to run? From the cluster ID it seems
that this is an actual TBSS analysis, not the FEEDS test, right? Getting
some more details might shed some light.
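
The long listing is fairly verbose, so it may help to cut it down to the
attributes that say what the job runs and where. The following is only a
rough sketch: the helper name job_paths is made up for illustration, and
which of these attributes actually appear (Cmd, Args or Arguments, Iwd, In,
Out, Err, UserLog) depends on the Condor version and on how the job was
submitted.

```shell
# Hypothetical helper (not part of Condor or FSL): keep only the job ClassAd
# attributes that describe the command, its arguments, and its working files.
# Expects `condor_q -l` style "Name = value" lines on stdin.
job_paths() {
  grep -E '^(Cmd|Args|Arguments|Iwd|In|Out|Err|UserLog) *='
}

# Query the queue only if condor_q is actually installed on this machine.
if command -v condor_q >/dev/null 2>&1; then
  condor_q -l 44.140 | job_paths || true
fi
```

Comparing Iwd and Cmd against the directory the analysis was started from
should show whether the job runs where the script expects it to.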

> All the file operations are done - the log files in the
> /FA/tbss_logs directory all say "sed: can't read .commands: No such
> file or directory"
> Contrary to the instructions page (http://neuro.debian.net/blog/2012/2012-03-09_parallelize_fsl_with_condor.html?highlight=fsl%20condor)
> I notice that I'm not the owner of the slots (no idea if this is of
> any relevance...).

Ahhhhhhhhhh! I think you just solved Bertram's problem!! Replying in that
thread.

Thanks,

Michael

-- 
Michael Hanke
http://mih.voxindeserto.de


