[pkg-go] Bug#1110249: podman corrupted its internal state or something?

Reinhard Tartler siretart at tauware.de
Thu Sep 25 01:25:33 BST 2025


Control: tag -1 upstream moreinfo

Hi Ian,

I wish I had seen this earlier and been able to respond more swiftly.
It also seems that this has happened at least twice. Can you quantify
approximately how often it happens?

Ian Jackson <ijackson at chiark.greenend.org.uk> writes:

> Package: podman
> Version: 4.3.1+ds1-8+deb12u1+b1
>
> The tag2upload system is using podman on tag2upload-builder-01 to
> isolate one source package build from the previous one.  Recently the
> service failed.  I investigated and found that podman was saying this:
>
> ERRO[0000] invalid internal status, try resetting the pause process with "podman system migrate": could not find any running process: no such process 
>
> I don't know how such an invalid internal state could have occurred.  The
> machine is a DSA-managed host and the account is `tag2upload-builder`,
> a service account.

I haven't seen that myself so far. Unless podman is running as uid 0,
you are using rootless mode, which makes sense for a DSA-managed host.



> The containers are used exclusively (at least in normal operation) via
> autopkgtest-virt-podman.  They are created with a custom script, but
> that runs weekly, but on Sundays, so doesn't seem implicated.  I
> haven't been doing any admin work that might be related and I think
> Sean hasn't either.

I'm also curious how tag2upload interacts with podman. Any chance you
can point me to the relevant source code of tag2upload? I understand
from the above that there is no direct interaction with podman, only an
indirect one through autopkgtest-virt-podman. I'm familiar with that
tool in the context of package testing, which doesn't appear to match
the purpose of the tag2upload infrastructure. What am I missing here?


> The cause doesn't seem to have been a podman update either.
> /var/log/dpkg.log* on the affected system shows no change to podman
> since 2025-05-17.
>
> I believe our containers are running in a "rootless" mode.  We can
> probably provide more information about the configuration if you can
> tell us how to obtain it.

To diagnose, please provide the output of the command `podman system info`.

The canonical way to reset is `podman system reset`.

See: https://manpages.debian.org/trixie/podman/podman-system-reset.1.en.html
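A hedged sketch of what to collect (these are real podman subcommands,
but the storage path is the rootless default and may differ on your host):

```shell
# Collect diagnostics on the affected host (rootless defaults assumed).
podman system info > podman-info.txt                # full host/storage configuration
podman info --format '{{.Host.OCIRuntime.Name}}'    # which OCI runtime is in use
du -sh ~/.local/share/containers/storage            # rootless storage location and size
```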

Looking at the source code, it seems this error is coming from here:

https://sources.debian.org/src/podman/5.6.1+ds2-2/pkg/domain/infra/abi/system_linux.go?hl=102#L93-L102

I'm surprised to see podman recommending `podman system migrate` for
recovery. libpod/podman has changed its on-disk format for container
storage state a couple of times across major upstream versions, and
`podman system migrate` migrates from the old format to the new
one. The fact that the message includes the error "no such process" is
what really puzzles me: it indicates a problem identifying or starting
the "pause" process (think of it as a 'root' container that keeps
namespaces etc. associated across containers within the same Pod, see
https://www.ianlewis.org/en/almighty-pause-container for a better
explanation).
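One thing that might be worth checking next time, as an assumption on
my part: rootless podman records the pause process pid in a file under
$XDG_RUNTIME_DIR, and "no such process" smells like a stale pid
file. A hypothetical check (the path below is the usual default, but
it may vary between podman versions):

```shell
# If the recorded pid no longer exists, podman would fail much like this.
PAUSE_PIDFILE="$XDG_RUNTIME_DIR/libpod/tmp/pause.pid"   # default location, may vary
if [ -f "$PAUSE_PIDFILE" ]; then
    pid=$(cat "$PAUSE_PIDFILE")
    if kill -0 "$pid" 2>/dev/null; then
        echo "pause process $pid is alive"
    else
        echo "stale pid file: process $pid is gone"
    fi
else
    echo "no pause pid file found"
fi
```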

>
> Without really knowing anything about what was wrong, but wanting to
> restore service, I ran the command suggested:
>
> tag2upload-builder at tag2upload-builder-01:~$ podman container ls
> ERRO[0000] invalid internal status, try resetting the pause process with "podman system migrate": could not find any running process: no such process 
> tag2upload-builder at tag2upload-builder-01:~$ podman system migrate
> stopped 714e0e504f2ccc42e7e31222ca20ccbe1fa68bd985b9d980a80f0240e8ae8c8b
> stopped dc6da82beaa9ff6c0f70d8854c18332a7a4875228f8d6771dc00aa36569148b4
> tag2upload-builder at tag2upload-builder-01:~$ podman container ls
> CONTAINER ID  IMAGE       COMMAND     CREATED     STATUS      PORTS       NAMES
> tag2upload-builder at tag2upload-builder-01:~$ podman image ls
> REPOSITORY                          TAG         IMAGE ID      CREATED       SIZE
> localhost/autopkgtest/debian        bookworm    9e23b4f3cdfe  43 hours ago  985 MB
> localhost/autopkgtest/amd64/debian  bookworm    9e23b4f3cdfe  43 hours ago  985 MB
> <none>                              <none>      54738866ce8a  8 days ago    985 MB
> <none>                              <none>      d9dc91a8bf2c  8 weeks ago   981 MB
> tag2upload-builder at tag2upload-builder-01:~$ 
>
> Now the service is working again.  But, I don't think it should have
> broken down.

I agree, it really shouldn't. I see that there are apparently two
containers being stopped. Any idea what those containers are and what
they are supposed to do? Are they supposed to still be running?

If you don't rely on long-lived containers running across your
autopkgtest-virt-podman invocations, maybe you can invoke `podman
system reset -f` before each autopkgtest-virt-podman invocation. Note
that this will clear out all local caches, containers and images; as
such, I'd expect it to also terminate any running
pods/containers. Depending on how you use autopkgtest-virt-podman,
this may or may not be an acceptable workaround for increasing system
stability.
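For illustration only (the image name is taken from your `podman image
ls` output above, and the autopkgtest arguments are placeholders I
haven't verified against your setup):

```shell
# Start each build from a clean slate; -f skips the confirmation prompt.
# This deletes ALL rootless containers, images and caches for this user.
podman system reset -f

# Re-create the image the builder needs, then run the build as before, e.g.:
#   autopkgtest <source> -- podman localhost/autopkgtest/debian:bookworm
```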

>
> I don't have much more information, I'm afraid.  I do have a ps listing of
> the system in the failed state.  If it were to happen again, and we
> knew what other information to collect, we could do that.

What puzzles me most is that `podman system migrate` appears to solve
this issue. That indicates to me that there is something in
~/.local/share/containers/storage that's triggering it.

I wonder if you could backup that directory when the system is in a
failed state, recover it using `podman system migrate` or `podman system
reset`, and try to reproduce the issue after restoring the tarball. If
you can, then that tarball would be very useful for further diagnostics.
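Sketched out, with the rootless default storage path as an assumption:

```shell
STORAGE=~/.local/share/containers/storage

# 1. While the system is still in the failed state, snapshot the storage.
tar -C "$(dirname "$STORAGE")" -czf ~/storage-failed-state.tar.gz storage

# 2. Recover service.
podman system migrate        # or: podman system reset -f

# 3. Later, to try to reproduce: clear current state, restore the snapshot.
podman system reset -f
tar -C "$(dirname "$STORAGE")" -xzf ~/storage-failed-state.tar.gz
podman container ls          # does the error reappear?
```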

Best of luck!
-rt


