[Neurodebian-users] condor and fsl: again

Stefan Kreisel kreisel.stefan at web.de
Thu May 31 11:20:53 UTC 2012


Hi Michael,
good you're back from holiday ;) Here's the thread history.
Cheers
Stefan


-> MESSAGE 1.
Dear list,
I have installed a fresh Debian (debian-6.0.5-amd64) system plus fsl and
condor from the neurodebian repository. fsl-feeds (i.e. fsl-selftest) ran
without problems, though running parallelized FDT or FEAT actually leads
to longer processing times compared to setting FSLPARALLEL=0. I think this
is something Bertram Walter pointed out some time ago
(http://web.archiveorange.com/archive/v/FQvMCazdbKoptd7PgAqu); there has
been, however, no solution posted on the list...
Here's the real problem: tbss_2_reg -n invokes condor, but terminates
(within a minute or so) without having done any calculations. See the
excerpts below.


stefan@gehirne:~$ condor_q

-- Submitter: : <127.0.0.1:39795> : gehirne.fritz.box
 ID      OWNER   SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
 44.140  stefan  5/15 08:06   0+00:00:00  R   0    0.1   sh /home/stefan/st
 44.141  stefan  5/15 08:06   0+00:00:00  R   0    0.1   sh /home/stefan/st
 44.142  stefan  5/15 08:06   0+00:00:00  R   0    0.1   sh /home/stefan/st
 44.143  stefan  5/15 08:06   0+00:00:00  R   0    0.1   sh /home/stefan/st
[...]


stefan@gehirne:~$ condor_status

Name                OpSys  Arch    State    Activity  LoadAv  Mem   ActvtyTime

slot1@gehirne.frit  LINUX  X86_64  Claimed  Busy      0.000   2001  0+00:00:01
slot2@gehirne.frit  LINUX  X86_64  Claimed  Busy      0.000   2001  0+00:00:02
slot3@gehirne.frit  LINUX  X86_64  Claimed  Busy      0.000   2001  0+00:00:01
slot4@gehirne.frit  LINUX  X86_64  Claimed  Busy      0.510   2001  0+00:00:01

              Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill

X86_64/LINUX      4      0        4          0        0           0         0

       Total      4      0        4          0        0           0         0


All the file operations are done; the log files in the /FA/tbss_logs
directory all say "sed: can't read .commands: No such file or directory".
Contrary to what is shown on the instructions page
(http://neuro.debian.net/blog/2012/2012-03-09_parallelize_fsl_with_condor.html?highlight=fsl%20condor),
I notice that I'm not the owner of the slots (no idea if this is of any
relevance...).

Any ideas?

Cheers

Stefan



-> MESSAGE 2.
Sorry for the html-gibberish in the previous mail - here's another
attempt...



 > Ahhhhhhhhhh! I think you just solved Bertram's problem!! Replying in that
 > thread.

Well glad I could help...



 > This looks all good. Could you post the output of
 >
 > condor_q -l 44.140
 >
 > so we can see what it is trying to run. From the cluster ID it seems
 > that this is an actual TBSS analysis, not the FEEDS test, right? Getting
 > some more details might shed some light.

OK ->

stefan@gehirne:~$ condor_q -l 45.399

-- Submitter: gehirne.fritz.box : <127.0.0.1:46801> : gehirne.fritz.box
BufferSize = 524288
NiceUser = false
CoreSize = 0
CumulativeSlotTime = 0
OnExitHold = false
GlobalJobId = "gehirne.fritz.box#45.399#1337110950"
RequestCpus = 1
Err = "tbss_logs/tbss_2_reg.e45.400"
BufferBlockSize = 32768
ImageSize = 125
CurrentTime = time()
WantCheckpoint = false
CommittedTime = 0
TargetType = "Machine"
WhenToTransferOutput = "ON_EXIT"
ServerTime = 1337110979
Cmd = "/bin/sh"
JobUniverse = 5
ExitBySignal = false
TransferIn = false
Iwd = "/home/stefan/studies/tbss_-n/FA"
NumRestarts = 0
CommittedSuspensionTime = 0
Owner = "stefan"
NumSystemHolds = 0
CumulativeSuspensionTime = 0
Environment =
"PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/lib/fsl/4.1
XDG_SESSION_COOKIE=98f228034f053963585a8d1100000008-1337108340.39690-977355352
LD_LIBRARY_PATH=/usr/lib/fsl/4.1 FSLTCLSH=/usr/bin/tclsh LANG=en_US.utf8
COLORTERM=gnome-terminal FSLREMOTECALL= WINDOWPATH=7
GNOME_KEYRING_CONTROL=/tmp/keyring-TA1iOj
XAUTHORITY=/var/run/gdm3/auth-for-stefan-Zhyelq/database
FSLOUTPUTTYPE=NIFTI_GZ SSH_AUTH_SOCK=/tmp/keyring-TA1iOj/ssh
FSLMULTIFILEQUIT=TRUE
XDG_DATA_DIRS=/usr/share/gnome:/usr/share/gdm/:/usr/local/share/:/usr/share/
FSLBROWSER=/etc/alternatives/x-www-browser
SESSION_MANAGER=local/gehirne:@/tmp/.ICE-unix/2025,unix/gehirne:/tmp/.ICE-unix/2025
GDM_LANG=en_US.utf8 FSLDIR=/usr/share/fsl/4.1
ORBIT_SOCKETDIR=/tmp/orbit-stefan FSLPARALLEL=condor SGE_TASK_LAST=400
SHELL=/bin/bash GNOME_DESKTOP_SESSION_ID=this-is-deprecated
PWD=/home/stefan/studies/tbss_-n/FA _=/usr/lib/fsl/4.1/tbss_2_reg
USERNAME=stefan SGE_TASK_STEPSIZE= WINDOWID=16777219
FSLWISH=/usr/bin/wish USER=stefan
GTK_RC_FILES=/etc/gtk/gtkrc:/home/stefan/.gtkrc-1.2-gnome2
DESKTOP_SESSION=default FSLMACHINELIST= FSLLOCKDIR= SHLVL=1
POSSUMDIR=/usr/share/fsl/4.1 POSIXLY_CORRECT=1 GDMSESSION=default
SGE_TASK_ID=400 HOME=/home/stefan
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-Ovra5yhakV,guid=41b5e4185d4605edca2138e900000021
TERM=xterm OLDPWD=/home/stefan/studies/tbss_-n SSH_AGENT_PID=2062
GDM_KEYBOARD_LAYOUT=de' 'nodeadkeys LOGNAME=stefan SGE_TASK_FIRST=1
DISPLAY=:0.0"
RequestDisk = DiskUsage
Requirements = ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX"
) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory
) && ( TARGET.HasFileTransfer )
MinHosts = 1
JobNotification = 0
NumCkpts = 0
LastSuspensionTime = 0
NumJobStarts = 0
WantRemoteSyscalls = false
JobLeaseDuration = 1200
ImageSize_RAW = 102
JobPrio = 0
RootDir = "/"
CurrentHosts = 0
StreamOut = false
WantRemoteIO = true
DiskUsage_RAW = 102
OnExitRemove = true
DiskUsage = 125
In = "/dev/null"
PeriodicRemove = false
RemoteUserCpu = 0.0
LocalUserCpu = 0.0
ExecutableSize = 125
RemoteSysCpu = 0.0
LocalSysCpu = 0.0
ClusterId = 45
CompletionDate = 0
RemoteWallClockTime = 0.0
Rank = 0.0
LeaveJobInQueue = false
MyType = "Job"
CondorVersion = "$CondorVersion: 7.8.0 May 09 2012
Debian-7.8.0~dfsg.1-1~nd60+1 $"
NumCkpts_RAW = 0
StreamErr = false
ProcId = 399
PeriodicHold = false
User = "stefan@gehirne.fritz.box"
LastJobStatus = 0
Out = "tbss_logs/tbss_2_reg.o45.400"
JobStatus = 1
ExecutableSize_RAW = 102
PeriodicRelease = false
AutoClusterAttrs =
"JobUniverse,LastCheckpointPlatform,NumCkpts,RemoteGroup,SubmitterGroup,DiskUsage,ImageSize,RequestDisk,RequestMemory,Requirements,NiceUser,ConcurrencyLimits"
Args = "/home/stefan/studies/tbss_-n/FA/tbss_2_reg_qsub.HRdir6mUJNkG1"
RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(
ImageSize + 1023 ) / 1024)
MaxHosts = 1
TotalSuspensions = 0
CommittedSlotTime = 0
NotifyUser = "stefan@localhost"
CondorPlatform = "$CondorPlatform: X86_64-Debian_6.0 $"
AutoClusterId = 1
ShouldTransferFiles = "YES"
ExitStatus = 0
QDate = 1337110949
EnteredCurrentStatus = 1337110950



Interestingly: if I run tbss_2_reg -n a second time in the same terminal
(on a fresh copy of the original data), I get the following error without
anything happening.

stefan@gehirne:~/studies/tbss_-n$ tbss_2_reg -n
wc: .commands: No such file or directory

I wonder what the word count command is doing here?
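
A guess, as a minimal sketch and not the actual FSL code: a wrapper that
submits a task file as an array job would typically count the lines of
that file with wc -l to learn how many tasks to queue, and each task would
then pick its own line with sed using its task ID (the job environment
above carries SGE_TASK_ID). The temporary directories and the echo
commands below are made up for illustration; only the file name .commands
and the SGE_TASK_ID variable come from the output above. If the task file
is missing from the working directory, both lookups fail, which would be
consistent with the sed and wc messages.

```shell
#!/bin/sh
# Illustration only: mimics a "one command per line" task-file pattern.
# This is NOT the FSL submission code; directories and commands are hypothetical.

workdir=$(mktemp -d)    # stands in for the analysis directory
emptydir=$(mktemp -d)   # a directory where the task file does not exist

# Write a task file with one command per line.
printf '%s\n' 'echo task-one' 'echo task-two' 'echo task-three' > "$workdir/.commands"

# Step 1: the number of lines is the number of array tasks.
ntasks=$(cd "$workdir" && wc -l < .commands | tr -d ' ')

# Step 2: each task selects the line matching its task ID and runs it.
SGE_TASK_ID=2
task_cmd=$(cd "$workdir" && sed -n "${SGE_TASK_ID}p" .commands)
task_out=$(sh -c "$task_cmd")
echo "tasks=$ntasks  task $SGE_TASK_ID runs: $task_cmd -> $task_out"

# The same lookups from a directory without the task file fail.
if (cd "$emptydir" && sed -n "${SGE_TASK_ID}p" .commands) 2>/dev/null; then
    sed_ok=yes
else
    sed_ok=no
fi
if (cd "$emptydir" && wc -l .commands) >/dev/null 2>&1; then
    wc_ok=yes
else
    wc_ok=no
fi
echo "without task file: sed_ok=$sed_ok wc_ok=$wc_ok"

rm -rf "$workdir" "$emptydir"
```

If something like this is going on, it might be worth checking whether
.commands exists in the directory the job actually runs in (Iwd above).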



Cheers

Stefan



-> MESSAGE 3.
Hi all.
I reinstalled Debian and also tried the process on the neurodebian virtual
machine; no luck, neither route leads to success. Any help out there?
Cheers
Stefan

