Bug#805623: systemd: weird behaviour using 'Restart=on-watchdog'

Arturo Borrero Gonzalez arturo.borrero.glez at gmail.com
Fri Nov 20 11:39:31 GMT 2015


Package: systemd
Version: 227-2
Severity: normal

Dear systemd maintainers,

thanks for your work with this package, it's really appreciated.

I've noticed some inconsistend behaviour when a service is using
Restart=on-watchdog.

TL;DR: systemd restart the daemon but no more than 3 times.

The behaviour described below is unexpected and I was unable to find
any documentation about it.

====== 8< ======
[Unit]
Description=Conntrack Daemon

[Service]
Type=notify
NotifyAccess=all
EnvironmentFile=-/usr/share/conntrackd/environmentfile
EnvironmentFile=-/etc/default/conntrackd
ExecStartPre=-/bin/rm -f /var/lock/conntrackd.lock
ExecStart=/usr/sbin/conntrackd -d -C "$CONFIG" $OPTIONS
ProtectSystem=full
ProtectHome=true
WatchdogSec=10
Restart=on-watchdog

[Install]
WantedBy=multi-user.target
====== 8< ======

This is how I trigger this situation.

Get more verbose information:
 % sudo systemd-analyze set-log-level debug

I start the service by:
 % sudo systemctl start conntrackd.service

At some point, I will manually stop the daemon:
 % sudo kill -SIGSTOP <pid>

nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Trying to
enqueue job conntrackd.service/start/replace
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Installed new
job conntrackd.service/start as 2065
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Enqueued job
conntrackd.service/start as 2065
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: About to
execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 7847
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Changed failed
-> start-pre
nov 20 12:15:12 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:15:12 cfdev02 systemd[7847]: conntrackd.service: Executing:
/bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Child 7847
belongs to conntrackd.service
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Control
process exited, code=exited status=0
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Got final
SIGCHLD for state start-pre.
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: About to
execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Forked
/usr/sbin/conntrackd as 7850
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Changed
start-pre -> start
nov 20 12:15:12 cfdev02 systemd[7850]: conntrackd.service: Executing:
/usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: disabling internal cache
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: disabling external cache
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: using kernel-space
event filtering
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: netlink event socket
buffer size has been set to 8000000 bytes
nov 20 12:15:12 cfdev02 conntrack-tools[7850]: initialization completed
nov 20 12:15:12 cfdev02 conntrack-tools[7875]: -- starting in daemon mode --
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7850 (MAINPID=7875)
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Supervising
process 7875 which is not our child. We'll most likely not notice when
it exits.
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7875 (READY=1)
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:15:12 cfdev02 systemd[1]: conntrackd.service: Job
conntrackd.service/start finished, result=done
nov 20 12:15:12 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:15:17 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7875 (WATCHDOG=1)
nov 20 12:15:19 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7875 (WATCHDOG=1)
nov 20 12:15:24 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7875 (WATCHDOG=1)

[ 1) send SIGSTOP to the daemon ]

nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Watchdog
timeout (limit 10s)!
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed
running -> failed
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Unit entered
failed state.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Failed with
result 'watchdog'.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed failed
-> auto-restart
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: cgroup is empty
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Service
hold-off time over, scheduling restart.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Trying to
enqueue job conntrackd.service/restart/fail
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Installed new
job conntrackd.service/restart as 2172
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Enqueued job
conntrackd.service/restart as 2172
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Scheduled restart job.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed
auto-restart -> dead
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Job
conntrackd.service/restart finished, result=done
nov 20 12:15:34 cfdev02 systemd[1]: Stopped Conntrack Daemon.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Converting job
conntrackd.service/restart -> conntrackd.service/start
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: About to
execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 7931
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed dead
-> start-pre
nov 20 12:15:34 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:15:34 cfdev02 systemd[7931]: conntrackd.service: Executing:
/bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Child 7931
belongs to conntrackd.service
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Control
process exited, code=exited status=0
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Got final
SIGCHLD for state start-pre.
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: About to
execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Forked
/usr/sbin/conntrackd as 7933
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed
start-pre -> start
nov 20 12:15:34 cfdev02 systemd[7933]: conntrackd.service: Executing:
/usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:15:34 cfdev02 conntrack-tools[7933]: disabling internal cache
nov 20 12:15:34 cfdev02 conntrack-tools[7933]: disabling external cache
nov 20 12:15:34 cfdev02 conntrack-tools[7933]: using kernel-space
event filtering
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7933 (MAINPID=7959)
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Supervising
process 7959 which is not our child. We'll most likely not notice when
it exits.
nov 20 12:15:34 cfdev02 conntrack-tools[7959]: -- starting in daemon mode --
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7959 (READY=1)
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:15:34 cfdev02 systemd[1]: conntrackd.service: Job
conntrackd.service/start finished, result=done
nov 20 12:15:34 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:15:38 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7959 (WATCHDOG=1)
nov 20 12:15:40 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7959 (WATCHDOG=1)

[ 2) send SIGSTOP to the daemon ]

nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Watchdog
timeout (limit 10s)!
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed
running -> failed
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Unit entered
failed state.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Failed with
result 'watchdog'.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed failed
-> auto-restart
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: cgroup is empty
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Service
hold-off time over, scheduling restart.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Trying to
enqueue job conntrackd.service/restart/fail
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Installed new
job conntrackd.service/restart as 2279
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Enqueued job
conntrackd.service/restart as 2279
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Scheduled restart job.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed
auto-restart -> dead
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Job
conntrackd.service/restart finished, result=done
nov 20 12:15:50 cfdev02 systemd[1]: Stopped Conntrack Daemon.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Converting job
conntrackd.service/restart -> conntrackd.service/start
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: About to
execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 7989
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed dead
-> start-pre
nov 20 12:15:50 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:15:50 cfdev02 systemd[7989]: conntrackd.service: Executing:
/bin/rm -f /var/lock/conntrackd.lock
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Child 7989
belongs to conntrackd.service
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Control
process exited, code=exited status=0
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Got final
SIGCHLD for state start-pre.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: About to
execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Forked
/usr/sbin/conntrackd as 7991
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed
start-pre -> start
nov 20 12:15:50 cfdev02 systemd[7991]: conntrackd.service: Executing:
/usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: disabling internal cache
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: disabling external cache
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: using kernel-space
event filtering
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: netlink event socket
buffer size has been set to 8000000 bytes
nov 20 12:15:50 cfdev02 conntrack-tools[7991]: initialization completed
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 7991 (MAINPID=8017)
nov 20 12:15:50 cfdev02 conntrack-tools[8017]: -- starting in daemon mode --
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Supervising
process 8017 which is not our child. We'll most likely not notice when
it exits.
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 8017 (READY=1)
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:15:50 cfdev02 systemd[1]: conntrackd.service: Job
conntrackd.service/start finished, result=done
nov 20 12:15:50 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:15:55 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 8017 (WATCHDOG=1)
nov 20 12:16:00 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 8017 (WATCHDOG=1)

[ 3) send SIGSTOP ]

nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Watchdog
timeout (limit 10s)!
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Changed
running -> failed
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Unit entered
failed state.
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Failed with
result 'watchdog'.
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: Changed failed
-> auto-restart
nov 20 12:16:10 cfdev02 systemd[1]: conntrackd.service: cgroup is empty
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Service
hold-off time over, scheduling restart.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Trying to
enqueue job conntrackd.service/restart/fail
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Installed new
job conntrackd.service/restart as 2386
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Enqueued job
conntrackd.service/restart as 2386
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Scheduled restart job.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed
auto-restart -> dead
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Job
conntrackd.service/restart finished, result=done
nov 20 12:16:11 cfdev02 systemd[1]: Stopped Conntrack Daemon.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Converting job
conntrackd.service/restart -> conntrackd.service/start
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: About to
execute: /bin/rm -f /var/lock/conntrackd.lock
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Forked /bin/rm as 8047
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed dead
-> start-pre
nov 20 12:16:11 cfdev02 systemd[1]: Starting Conntrack Daemon...
nov 20 12:16:11 cfdev02 systemd[8047]: conntrackd.service: Executing:
/bin/rm -f /var/lock/conntrackd.lock
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Child 8047
belongs to conntrackd.service
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Control
process exited, code=exited status=0
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Got final
SIGCHLD for state start-pre.
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: About to
execute: /usr/sbin/conntrackd -d -C $CONFIG $OPTIONS
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Forked
/usr/sbin/conntrackd as 8049
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed
start-pre -> start
nov 20 12:16:11 cfdev02 systemd[8049]: conntrackd.service: Executing:
/usr/sbin/conntrackd -d -C /etc/conntrackd/conntrackd.conf
nov 20 12:16:11 cfdev02 conntrack-tools[8049]: disabling internal cache
nov 20 12:16:11 cfdev02 conntrack-tools[8049]: disabling external cache
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 8049 (MAINPID=8075)
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 8075 (READY=1)
nov 20 12:16:11 cfdev02 conntrack-tools[8075]: -- starting in daemon mode --
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Changed start -> running
nov 20 12:16:11 cfdev02 systemd[1]: conntrackd.service: Job
conntrackd.service/start finished, result=done
nov 20 12:16:11 cfdev02 systemd[1]: Started Conntrack Daemon.
nov 20 12:16:16 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 8075 (WATCHDOG=1)
nov 20 12:16:21 cfdev02 systemd[1]: conntrackd.service: Got
notification message from PID 8075 (WATCHDOG=1)

[ 4) send SIGSTOP ]

nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Watchdog
timeout (limit 10s)!
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Changed
running -> stop-sigabrt
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Child 8075
belongs to conntrackd.service
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Main process
exited, code=killed, status=6/ABRT
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Changed
stop-sigabrt -> failed
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Unit entered
failed state.
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: Failed with
result 'signal'.
nov 20 12:16:31 cfdev02 systemd[1]: conntrackd.service: cgroup is empty

-- Package-specific info:

-- System Information:
Debian Release: stretch/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 4.2.0-1-amd64 (SMP w/8 CPU cores)
Locale: LANG=es_ES.UTF-8, LC_CTYPE=es_ES.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii  adduser         3.113+nmu3
ii  libacl1         2.2.52-2
ii  libapparmor1    2.10-2+b1
ii  libaudit1       1:2.4.4-4
ii  libblkid1       2.27.1-1
ii  libc6           2.19-22
ii  libcap2         1:2.24-12
ii  libcap2-bin     1:2.24-12
ii  libcryptsetup4  2:1.6.6-5
ii  libgcrypt20     1.6.4-3
ii  libkmod2        21-1
ii  liblzma5        5.1.1alpha+20120614-2.1
ii  libmount1       2.27.1-1
ii  libpam0g        1.1.8-3.1
ii  libseccomp2     2.2.3-2
ii  libselinux1     2.3-2+b1
ii  libsystemd0     227-2
ii  mount           2.27.1-1
ii  sysv-rc         2.88dsf-59.2
ii  udev            227-2
ii  util-linux      2.27.1-1

Versions of packages systemd recommends:
ii  dbus            1.10.2-1
pn  libpam-systemd  <none>

Versions of packages systemd suggests:
pn  systemd-container  <none>
pn  systemd-ui         <none>

-- no debconf information

-- 
Arturo Borrero González



More information about the Pkg-systemd-maintainers mailing list