[pkg-lxc-devel] Bug#835529: lxc: default debootstrap minimal install hangs waiting for dhcp

Dryden Personalis bugs at xenhideout.nl
Sun Oct 30 18:06:42 UTC 2016


Evgeni Golov wrote on 30-10-2016 16:53:

> Given that we
> 1) ship a config that works just fine by default (but does not have
> networking at all)
> 2) provide an easy way to enable DHCP on a bridge
> do you think this report can be closed, or do you see any more room
> for improvement here?

Thank you for responding. I was indeed mistaken about the default 
config.

There is a page on the wiki that mentions lxc-net and the option you 
mention.

However, there is scarcely any documentation on it.

lxc-net seems rather "oblique", you don't know what it is or what it 
does.

At some point I did check out https://github.com/CameronNemo/lxc-net but 
that was way after. It says it has been obsoleted by inclusion in LXC.

However when you look at the sources at that repo there is no indication 
of any DHCP.

The scripts are (in that repo) also rather ... minimal. It is not the 
complete set of masquerading rules you'd need for a truly functioning 
system.

I certainly cannot find any documentation on lxc-net directly.

So I guess the improvement would be that your point (2) actually reaches 
people...

All the documentation you can find instructs you to set the network 
configuration in your lxc config for the container. But then you have 
that hanging system.

In order to avoid that you have to use that lxc-net, but that is much 
more oblique and harder to find, so you won't do that. So the average 
newcomer will run into that problem I described. No one is going to run 
their container without networking. The first thing you do is to set up 
networking and then see if you can connect to it.

What I did eventually was to create my own bridge networking. It took a 
lot of time. I wrote a wiki article about it (not the time, but the 
instructions on how to do it ;-)). I put in some firewall rules to get 
the full loopback functionality and so on. So I'm still not using 
lxc-net.

So at what point did I troubleshoot the network? systemd put me on the 
wrong track by saying that there is no timeout. That's one thing that 
can be improved. See, I *did* attach to the console or I would never 
even have seen those messages.

I was just impatient enough to reboot the LXC container or proceed with 
my next attempt, prior to the dhcp script ever having finished, because 
systemd told me it *wouldn't ever finish*.

If you were to give the systemd unit file for that (which is not related 
to LXC) a timeout value (explicitly) of say 20 seconds I wouldn't have 
gotten into that mess. This miscommunication causes you to spend more 
time on it than you otherwise would have.
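
To make that concrete, something like the following drop-in inside the 
container is what I have in mind. This is only a sketch: I am assuming 
the unit that raises the interfaces is called networking.service, the 
file name is arbitrary, and 20 seconds is just the value from above.

```
# /etc/systemd/system/networking.service.d/timeout.conf
# (unit name and file name are assumptions; adjust to the unit that hangs)
[Service]
# Give up on the start job after 20 seconds instead of "no limit"
TimeoutStartSec=20
```

If the container is already running, a "systemctl daemon-reload" inside 
it would be needed before the new timeout applies.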

systemd communicates something that it does not actually know, and that 
is really the biggest issue here for people first running into this. How 
on earth are you supposed to know that systemd is lying to ya? But then 
again, fixing that only helps you realize that you need to change your 
setting (in the container).

The issue then remains that there are two classes of people:

- those with DHCP in the network who depend on the setting to be dhcp 
and who subsequently do not set a fixed IP address
- those with no DHCP in the network and who do set a fixed IP address 
(in the container config).

It seems a clear separation of people due to the config, something that 
could be treated as a defining characteristic.

I don't know how LXC can change that, but I only see 3 solutions. The 
first:

- don't set it to DHCP, which you say will offend the other half of the 
people. I guess for a general home computer DHCP is the logical default, 
but you'd be flabbergasted if your DHCP-less computer or network hung 
for 15+ seconds while booting the first time, right?

Any computer not on a dhcp network now hangs while booting? That's not 
good, is it? You can only solve that in one of two ways:

- create a shorter timeout for the dhcp thing (or don't wait for 
networking to come on before you give a login prompt).
- or, allow systemd to communicate more clearly that it is not gonna 
wait forever for ya.

But really the strange thing from a user point of view is that you 
configure the network in LXC and then *it doesn't work* because it is 
not evident that the inner container is going to use DHCP by default.

But LXC doesn't determine the inner system. It could be anything, right, 
not just Debian. It could be anything that does its networking in 
whatever way, so it is up to the (Debian) LXC people to determine that 
it should work with Debian; it's not like LXC can handle that itself.

So the only solution comes down to providing that DHCP server by default 
(as LXC) instead of waiting for the user to select to use lxc-net for 
that.

That is actually what you expect as a user. You expect LXC to do the 
DHCP thing when you configure the networking inside (the container 
config).

So I would assume that the answer would need to lie in having LXC start 
that DHCP server when you configure a fixed IP address and maybe that is 
not perfect but it follows the model of what needs to happen anyway:

* you define a static address ---> inner container config must be set to 
manual/static OR dhcp must exist.

* you don't define a static address ---> nothing needs to happen because 
DHCP will work or you expect yourself to already have it

So the biggest problem at this point is that the container's network 
config, as set in the external configuration file (for the container), 
is completely disconnected from any lxc-net business as far as the 
configuration model goes.

LXC-NET apparently evolved as a standalone thing and apparently it 
still is this way.

But people do not want to use lxc-net if they can't see what it is going 
to do for them. I don't know ... I have never come across the scripts on 
my computer (VPS).

So I can only suggest this thing and these are the 3 solutions I 
mentioned, perhaps? :P.

1. lxc-net must be better documented so that people do not set up 
networking without it (but some may still not want to use it)
2. the dhcp "server" must be started instantly and automatically when a 
static IP address has been configured (and there could be another 
configuration flag to control that) and it should not be dependent on 
another (external) configuration file like /etc/default/lxc (which 
doesn't even exist).

And the third solution was to change the debian config to manual 
configuration so that it doesn't override the static IP setting of the 
container externally (the config of the container on the host).
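
As an illustration of that third option (a sketch with made-up values: 
the bridge name br0, the interface eth0 and the 10.0.3.x addresses are 
all assumptions, not taken from any real setup):

```
# On the host, in the container's config file (LXC 1.0/2.0 key names):
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
lxc.network.ipv4 = 10.0.3.10/24
lxc.network.ipv4.gateway = 10.0.3.1

# Inside the container, in /etc/network/interfaces:
# "manual" instead of "dhcp", so ifup leaves the address set by LXC alone
auto eth0
iface eth0 inet manual
```

With "dhcp" in that last stanza the container waits for a lease at boot 
even though the host config already gave it an address.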

So if you say the 3rd option is unavailable (and it should be, I guess) 
that leaves:

* clear documentation that setting the network in LXC config file (for 
the container) is not enough.
* automatically starting DHCP on static config
* make lxc-net more available, more accessible, and more transparent.

I *saw* the reference on the Debian Wiki: 
https://wiki.debian.org/LXC/SimpleBridge

However, the documentation is *so minimal* and the lxc-net service 
*doesn't exist*.

But hold on, you were mentioning 2.0? The Debian version in Jessie is 
1.0.6-6+deb8u2.

The one in Stretch is 2.0.5-1... and LXC mentions that 1.1 has reached 
end of life (but 1.0 hasn't), yet it seems they urge everyone to upgrade 
anyway?
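
As far as I can tell, on a 2.0 system the option from your point (2) 
comes down to roughly the following. Treat this as a sketch: I am 
assuming the defaults file is /etc/default/lxc-net and that the bridge 
gets the upstream default name lxcbr0, and none of it applies to the 
1.0 package, where the service is missing.

```
# /etc/default/lxc-net (on the host; assumed location for the 2.0 package)
USE_LXC_BRIDGE="true"

# In the container's config file: attach to the bridge lxc-net creates.
# No static address here, the DHCP server on the bridge hands one out.
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
```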

My bug was against Jessie; I'm not sure I mentioned that. Does that mean 
everyone on Jessie is going to stay stuck with this behaviour?

I guess all we can do then is make the documentation more explicit on 
the Wiki? I will try to improve it, if I have time, with a note on this 
'anomaly' or this current status quo ;-).

Thanks for responding, bye.


