[pkg-lxc-devel] Bug#835529: lxc: default debootstrap minimal install hangs waiting for dhcp
Dryden Personalis
bugs at xenhideout.nl
Sun Oct 30 18:06:42 UTC 2016
Evgeni Golov wrote on 30-10-2016 16:53:
> Given that we
> 1) ship a config that works just fine by default (but does not have
> networking at all)
> 2) provide an easy way to enable DHCP on a bridge
> do you think this report can be closed, or do you see any more room
> for improvement here?
Thank you for responding. I was indeed mistaken about the default
config.
There is a page on the wiki that mentions lxc-net and the option you
refer to.
However, there is scarcely any documentation on it.
lxc-net seems rather "oblique": you don't know what it is or what it
does.
At some point I did check out https://github.com/CameronNemo/lxc-net, but
that was much later. It says it has been obsoleted by its inclusion in
LXC. However, when you look at the sources in that repo, there is no
indication of any DHCP.
The scripts in that repo are also rather... minimal. They are not the
complete set of masquerading rules you'd need for a truly functioning
system.
I certainly cannot find any documentation on lxc-net itself.
So I guess the improvement would be that your point (2) actually reaches
people...
All the documentation you can find tells you to set the network
configuration in the container's LXC config. But then you end up with
that hanging system.
To avoid that you have to use lxc-net, but that is much more oblique and
harder to find, so you won't. So the average newcomer will run into the
problem I described. No one is going to run their container without
networking: the first thing you do is set up networking and then see if
you can connect to it.
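For illustration, the kind of network section those guides have you add to the container's config on the host looks roughly like this. It is a sketch that assumes the LXC 1.0/2.0 key names (later releases renamed them to lxc.net.0.*), a host bridge called lxcbr0, and example addresses in 10.0.3.0/24. With only this in place, a Debian container whose /etc/network/interfaces still has the template's "iface eth0 inet dhcp" will wait for a DHCP server that nobody is running.

```
# Container config on the host, e.g. /var/lib/lxc/<name>/config
# (example values; <name> is a placeholder for the container name)
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.ipv4 = 10.0.3.100/24
lxc.network.ipv4.gateway = 10.0.3.1
```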
What I eventually did was set up my own bridge networking. It took a
lot of time. I wrote a wiki article about it (not about the time, but
instructions on how to do it ;-)). I added some firewall rules to get
full loopback functionality and so on. So I'm still not using lxc-net.
So where did my network troubleshooting go wrong? systemd put me on the
wrong track by saying that there is no timeout. That's one thing that
can be improved. You see, I *did* attach to the console, otherwise I
would never even have seen those messages.
I was just impatient enough to reboot the LXC container, or move on to
my next attempt, before the DHCP script had ever finished, because
systemd told me it *wouldn't ever finish*.
If the systemd unit for that (which is not related to LXC) were given an
explicit timeout of, say, 20 seconds, I wouldn't have gotten into that
mess. This miscommunication makes you spend more time on it than you
otherwise would.
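A minimal sketch of such an explicit timeout, written as a systemd drop-in inside the container. It assumes the DHCP step runs as part of networking.service (on Jessie that unit is generated from the /etc/init.d/networking script). The file name timeout.conf is arbitrary, and 20s is just the value suggested above. Run "systemctl daemon-reload" afterwards, or reboot the container.

```
# /etc/systemd/system/networking.service.d/timeout.conf (inside the container)
[Service]
# Abort the start job after 20 seconds instead of waiting with no timeout
TimeoutStartSec=20s
```

When the timeout expires, systemd should mark the unit as failed and let the boot continue to the login prompt. The container then simply comes up without a DHCP lease, which is at least an honest signal of what went wrong.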
systemd communicates something that it does not actually know, and that
is really the biggest issue here for people first running into this. How
on earth are you supposed to know that systemd is lying to ya? But then
again, fixing that would only help you realize that you need to change
your setting (in the container).
The issue then remains that there are two classes of people:
- those with DHCP on the network, who rely on the setting being dhcp
and who consequently do not set a fixed IP address
- those with no DHCP on the network, who do set a fixed IP address
(in the container config).
That seems like a clear separation, and it is visible in the config:
whether a fixed IP is set could be treated as the defining characteristic.
I don't know how LXC can change that, but I only see 3 solutions:
- don't set it to DHCP, which you say will offend the other half of the
people. I guess for a general home computer that is logical, but you'd
be flabbergasted if your network-less (DHCP-less) computer system hung
for 15 seconds or more the first time you booted it, right?
Any computer not on a DHCP network now hangs while booting? That's not
good, is it? You can only solve that in one of two ways:
- create a shorter timeout for the DHCP step (or don't wait for
networking to come up before you give a login prompt).
- or allow systemd to communicate more clearly that it is not gonna
wait forever for ya.
But really the strange thing from a user's point of view is that you
configure the network in LXC and then *it doesn't work*, because it is
not evident that the inner container is going to use DHCP by default.
But LXC doesn't determine the inner system. It could be anything, right,
not just Debian, doing its networking in whatever way. So it is up to
the (Debian) LXC people to make sure it works with Debian; it's not like
LXC can handle that by itself.
So the only solution comes down to LXC providing that DHCP server by
default, instead of waiting for the user to opt into lxc-net for that.
That is actually what you expect as a user: you expect LXC to handle the
DHCP side when you configure the networking in the container config.
So I would assume the answer lies in having LXC start that DHCP server
when you configure a fixed IP address. Maybe that is not perfect, but it
follows the model of what needs to happen anyway:
* you define a static address ---> the inner container config must be
set to manual/static, OR DHCP must exist.
* you don't define a static address ---> nothing needs to happen, because
either DHCP will work or you expect to already have it yourself.
So the biggest problem at this point is that the container's network
config, as set in its external configuration file on the host, is
completely disjoint from any lxc-net business as far as the
configuration model goes.
lxc-net apparently evolved as a standalone thing, and apparently it
still is one.
But people do not want to use lxc-net if they can't see what it is going
to do for them. I don't know... I have never come across those scripts
on my own machine (a VPS).
So all I can suggest are the 3 solutions I mentioned, perhaps? :P
1. lxc-net must be better documented, so that people do not set up
networking without it (though some may still not want to use it).
2. The DHCP "server" must be started immediately and automatically when
a static IP address has been configured (possibly with a separate
configuration flag to control that), and it should not depend on
another (external) configuration file like /etc/default/lxc (which
doesn't even exist).
And the third solution was to change the Debian network config inside
the container to manual, so that it doesn't override the static IP set
for the container externally (in the container's config on the host).
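That third option would amount to something like the following inside the container. It is a sketch that assumes the address and gateway come from the host-side config (lxc.network.ipv4 and lxc.network.ipv4.gateway in LXC 1.0/2.0). With the "manual" method, ifupdown does not start a DHCP client for eth0 and leaves the address LXC assigned alone.

```
# /etc/network/interfaces inside the container
auto lo
iface lo inet loopback

# "manual": do not run a DHCP client; LXC already configured eth0
auto eth0
iface eth0 inet manual
```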
So if you say the 3rd option is unavailable (and I guess it should be),
that leaves:
* clear documentation that setting the network in the container's LXC
config file is not enough.
* automatically starting DHCP when a static address is configured.
* making lxc-net more available, more accessible, and more transparent.
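To make lxc-net a little less oblique: on the 2.0 packages it creates a bridge (lxcbr0 by default), runs dnsmasq on it as the DHCP server, and adds a masquerading rule for the container subnet. The following is a minimal sketch of enabling it. It assumes an LXC 2.0 package that ships the lxc-net service and reads /etc/default/lxc (on some installations the settings live in /etc/default/lxc-net instead). Apart from USE_LXC_BRIDGE, the values are the script's usual defaults, repeated here only to make them visible.

```
# /etc/default/lxc (assumed location; sourced as shell by lxc-net)
USE_LXC_BRIDGE="true"
LXC_BRIDGE="lxcbr0"
LXC_ADDR="10.0.3.1"
LXC_NETMASK="255.255.255.0"
LXC_NETWORK="10.0.3.0/24"
LXC_DHCP_RANGE="10.0.3.2,10.0.3.254"
LXC_DHCP_MAX="253"
```

After restarting the service (systemctl restart lxc-net) and pointing the container's lxc.network.link at lxcbr0, a container left on "iface eth0 inet dhcp" should get a lease instead of hanging.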
I *saw* the reference on the Debian Wiki:
https://wiki.debian.org/LXC/SimpleBridge
However, the documentation is *so minimal* and the lxc-net service
*doesn't exist*.
But hold on, you were mentioning 2.0? The Debian version in Jessie is
1.0.6-6+deb8u2.
The one in Stretch is 2.0.5-1... and the LXC project says that 1.1 has
reached end of life (but 1.0 hasn't), yet it seems they urge everyone to
upgrade anyway?
My bug was against Jessie; I'm not sure I mentioned that. Does that mean
everyone on Jessie is going to stay stuck with this behaviour?
I guess all we can do then is make the documentation on the Wiki more
explicit? I will try to improve it, if I have time, to describe this
'anomaly', or rather the current status quo ;-).
Thanks for responding, bye.
More information about the Pkg-lxc-devel
mailing list