[Neurodebian-users] condor and fsl: again
Stefan Kreisel
kreisel.stefan at web.de
Thu May 31 11:20:53 UTC 2012
Hi Michael,
good you're back from holiday ;) Here's the thread history.
Cheers
Stefan
-> MESSAGE 1.
Dear list,
I have installed a fresh Debian (debian-6.0.5-amd64) system plus fsl and
condor from the neurodebian repository. fsl-feeds (i.e. fsl-selftest)
ran without problems, though running parallelized FDT or FEAT actually
leads to longer processing times than setting FSLPARALLEL=0. I think this
is something Bertram Walter pointed out some time ago
(http://web.archiveorange.com/archive/v/FQvMCazdbKoptd7PgAqu); however,
no solution has been posted on the list.
Here's the real problem: tbss_2_reg -n invokes condor, but terminates
(within a minute or so) without having done any calculations. See the
excerpts below.
stefan@gehirne:~$ condor_q
-- Submitter: : <127.0.0.1:39795> : gehirne.fritz.box
 ID      OWNER    SUBMITTED     RUN_TIME ST PRI SIZE CMD
 44.140  stefan   5/15 08:06  0+00:00:00 R  0   0.1  sh /home/stefan/st
 44.141  stefan   5/15 08:06  0+00:00:00 R  0   0.1  sh /home/stefan/st
 44.142  stefan   5/15 08:06  0+00:00:00 R  0   0.1  sh /home/stefan/st
 44.143  stefan   5/15 08:06  0+00:00:00 R  0   0.1  sh /home/stefan/st
[...]
stefan@gehirne:~$ condor_status
Name               OpSys  Arch   State   Activity LoadAv Mem  ActvtyTime
slot1@gehirne.frit LINUX  X86_64 Claimed Busy     0.000  2001 0+00:00:01
slot2@gehirne.frit LINUX  X86_64 Claimed Busy     0.000  2001 0+00:00:02
slot3@gehirne.frit LINUX  X86_64 Claimed Busy     0.000  2001 0+00:00:01
slot4@gehirne.frit LINUX  X86_64 Claimed Busy     0.510  2001 0+00:00:01
             Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX     4     0       4         0       0          0        0
       Total     4     0       4         0       0          0        0
All the file operations are done - the log files in the /FA/tbss_logs
directory all say "sed: can't read .commands: No such file or directory"
Contrary to the instructions page
(http://neuro.debian.net/blog/2012/2012-03-09_parallelize_fsl_with_condor.html?highlight=fsl%20condor)
I notice that I'm not the owner of the slots (no idea if this is of any
relevance...).
Any ideas?
Cheers
Stefan
-> MESSAGE 2.
Sorry for the html-gibberish in the previous mail - here's another
attempt...
> Ahhhhhhhhhh! I think you just solved Bertram's problem!! Replying in that
> thread.
Well glad I could help...
> This looks all good. Could you post the output of
>
> condor_q -l 44.140
>
> so we can see what it is trying to run. From the cluster ID it seems
> that this is an actual TBSS analysis, not the FEEDS test, right? Getting
> some more details might shed some light.
OK ->
stefan@gehirne:~$ condor_q -l 45.399
-- Submitter: gehirne.fritz.box : <127.0.0.1:46801> : gehirne.fritz.box
BufferSize = 524288
NiceUser = false
CoreSize = 0
CumulativeSlotTime = 0
OnExitHold = false
GlobalJobId = "gehirne.fritz.box#45.399#1337110950"
RequestCpus = 1
Err = "tbss_logs/tbss_2_reg.e45.400"
BufferBlockSize = 32768
ImageSize = 125
CurrentTime = time()
WantCheckpoint = false
CommittedTime = 0
TargetType = "Machine"
WhenToTransferOutput = "ON_EXIT"
ServerTime = 1337110979
Cmd = "/bin/sh"
JobUniverse = 5
ExitBySignal = false
TransferIn = false
Iwd = "/home/stefan/studies/tbss_-n/FA"
NumRestarts = 0
CommittedSuspensionTime = 0
Owner = "stefan"
NumSystemHolds = 0
CumulativeSuspensionTime = 0
Environment =
"PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/usr/lib/fsl/4.1
XDG_SESSION_COOKIE=98f228034f053963585a8d1100000008-1337108340.39690-977355352
LD_LIBRARY_PATH=/usr/lib/fsl/4.1 FSLTCLSH=/usr/bin/tclsh LANG=en_US.utf8
COLORTERM=gnome-terminal FSLREMOTECALL= WINDOWPATH=7
GNOME_KEYRING_CONTROL=/tmp/keyring-TA1iOj
XAUTHORITY=/var/run/gdm3/auth-for-stefan-Zhyelq/database
FSLOUTPUTTYPE=NIFTI_GZ SSH_AUTH_SOCK=/tmp/keyring-TA1iOj/ssh
FSLMULTIFILEQUIT=TRUE
XDG_DATA_DIRS=/usr/share/gnome:/usr/share/gdm/:/usr/local/share/:/usr/share/
FSLBROWSER=/etc/alternatives/x-www-browser
SESSION_MANAGER=local/gehirne:@/tmp/.ICE-unix/2025,unix/gehirne:/tmp/.ICE-unix/2025
GDM_LANG=en_US.utf8 FSLDIR=/usr/share/fsl/4.1
ORBIT_SOCKETDIR=/tmp/orbit-stefan FSLPARALLEL=condor SGE_TASK_LAST=400
SHELL=/bin/bash GNOME_DESKTOP_SESSION_ID=this-is-deprecated
PWD=/home/stefan/studies/tbss_-n/FA _=/usr/lib/fsl/4.1/tbss_2_reg
USERNAME=stefan SGE_TASK_STEPSIZE= WINDOWID=16777219
FSLWISH=/usr/bin/wish USER=stefan
GTK_RC_FILES=/etc/gtk/gtkrc:/home/stefan/.gtkrc-1.2-gnome2
DESKTOP_SESSION=default FSLMACHINELIST= FSLLOCKDIR= SHLVL=1
POSSUMDIR=/usr/share/fsl/4.1 POSIXLY_CORRECT=1 GDMSESSION=default
SGE_TASK_ID=400 HOME=/home/stefan
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-Ovra5yhakV,guid=41b5e4185d4605edca2138e900000021
TERM=xterm OLDPWD=/home/stefan/studies/tbss_-n SSH_AGENT_PID=2062
GDM_KEYBOARD_LAYOUT=de' 'nodeadkeys LOGNAME=stefan SGE_TASK_FIRST=1
DISPLAY=:0.0"
RequestDisk = DiskUsage
Requirements = ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX"
) && ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory
) && ( TARGET.HasFileTransfer )
MinHosts = 1
JobNotification = 0
NumCkpts = 0
LastSuspensionTime = 0
NumJobStarts = 0
WantRemoteSyscalls = false
JobLeaseDuration = 1200
ImageSize_RAW = 102
JobPrio = 0
RootDir = "/"
CurrentHosts = 0
StreamOut = false
WantRemoteIO = true
DiskUsage_RAW = 102
OnExitRemove = true
DiskUsage = 125
In = "/dev/null"
PeriodicRemove = false
RemoteUserCpu = 0.0
LocalUserCpu = 0.0
ExecutableSize = 125
RemoteSysCpu = 0.0
LocalSysCpu = 0.0
ClusterId = 45
CompletionDate = 0
RemoteWallClockTime = 0.0
Rank = 0.0
LeaveJobInQueue = false
MyType = "Job"
CondorVersion = "$CondorVersion: 7.8.0 May 09 2012
Debian-7.8.0~dfsg.1-1~nd60+1 $"
NumCkpts_RAW = 0
StreamErr = false
ProcId = 399
PeriodicHold = false
User = "stefan@gehirne.fritz.box"
LastJobStatus = 0
Out = "tbss_logs/tbss_2_reg.o45.400"
JobStatus = 1
ExecutableSize_RAW = 102
PeriodicRelease = false
AutoClusterAttrs =
"JobUniverse,LastCheckpointPlatform,NumCkpts,RemoteGroup,SubmitterGroup,DiskUsage,ImageSize,RequestDisk,RequestMemory,Requirements,NiceUser,ConcurrencyLimits"
Args = "/home/stefan/studies/tbss_-n/FA/tbss_2_reg_qsub.HRdir6mUJNkG1"
RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(
ImageSize + 1023 ) / 1024)
MaxHosts = 1
TotalSuspensions = 0
CommittedSlotTime = 0
NotifyUser = "stefan@localhost"
CondorPlatform = "$CondorPlatform: X86_64-Debian_6.0 $"
AutoClusterId = 1
ShouldTransferFiles = "YES"
ExitStatus = 0
QDate = 1337110949
EnteredCurrentStatus = 1337110950
Interestingly, if I run tbss_2_reg -n a second time in the same terminal
(on a fresh copy of the original data), I get the following error and
nothing else happens:
stefan@gehirne:~/studies/tbss_-n$ tbss_2_reg -n
wc: .commands: No such file or directory
I wonder what the word count command is doing here?
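One possible reading, offered only as a guess: both the "wc" and the "sed"
messages would be consistent with a task-array wrapper that counts the
lines of a command file with wc and then picks one line per job with sed
(the job environment above carries SGE_TASK_ID and SGE_TASK_LAST). The
sketch below illustrates that mechanism under this assumption. The
function names count_tasks and nth_task and the placeholder commands are
hypothetical and are not taken from fsl_sub or tbss_2_reg.

```shell
#!/bin/sh
# Hypothetical sketch of a task-file mechanism; NOT the actual fsl_sub code.

# Count the tasks: one command per line in the task file.
count_tasks() {
    wc -l "$1" | awk '{print $1}'
}

# Print the command for task N (1-based), the way a task ID would select it.
nth_task() {
    sed -n "${2}p" "$1"
}

# Demo with a throwaway task file; the commands are placeholders.
tmpdir=$(mktemp -d)
printf 'echo task one\necho task two\necho task three\n' > "$tmpdir/.commands"

count_tasks "$tmpdir/.commands"   # prints the number of tasks
nth_task "$tmpdir/.commands" 2    # prints the command line for task 2

# A relative path such as ".commands" only resolves from the directory
# that holds the file; from anywhere else sed (and wc) cannot read it.
(cd / && nth_task .commands 2) || echo "task file not found from /"

rm -rf "$tmpdir"
```

If the real wrapper works along these lines, a job that starts in a
directory other than the one holding .commands would fail in just this way,
so comparing the job's working directory with the location of the file
might be a useful first check.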
Cheers
Stefan
-> MESSAGE 3.
Hi all.
I reinstalled Debian and also tried the process on the neurodebian virtual
machine. No luck: neither route leads to success. Any help out there?
Cheers
Stefan