Bug#934584: IPMasquerade=yes uses iptables (not nftables)
Michael Biebl
biebl at debian.org
Mon Aug 12 12:26:38 BST 2019
On 12.08.19 at 10:52, Trent W. Buck wrote:
> Package: systemd
> Version: 241-5
> Severity: normal
> File: /lib/systemd/network/80-container-ve.network
>
> Debian 10 defaults to nftables:
>
> https://www.debian.org/releases/stable/amd64/release-notes/ch-whats-new.en.html#nftables
>
> ...but systemd doesn't for IPMasquerade=, see below.
>
>
>
> AFAICT the default behaviour of "machinectl start my-new-container" is
> to create a veth interface between the host and the container, with a
> private /28 IPv4 address range shared between them, masquerading
> (i.e. source NAT), and no IPv6 RA(?!). This is governed by
>
> /lib/systemd/network/80-container-ve.network
>
> AFAICT this is set up using *legacy* iptables, not using nftables.
> Here is a system with sshguard installed (uses native nftables), and two systemd containers running:
>
> bash5# iptables-save
> # Table `sshguard' is incompatible, use 'nft' tool.
> # Warning: iptables-legacy tables present, use iptables-legacy-save to see them
>
> bash5# iptables-legacy-save
> # Generated by iptables-save v1.8.3 on Mon Aug 12 18:01:37 2019
> *nat
> :PREROUTING ACCEPT [7781:686652]
> :INPUT ACCEPT [7749:685024]
> :OUTPUT ACCEPT [2108:206384]
> :POSTROUTING ACCEPT [2108:206384]
> -A POSTROUTING -s 10.0.0.16/28 -j MASQUERADE
> -A POSTROUTING -s 10.0.0.0/28 -j MASQUERADE
> COMMIT
> # Completed on Mon Aug 12 18:01:37 2019
>
> bash5# nft list ruleset
> table ip sshguard {
> set attackers {
> type ipv4_addr
> flags interval
> }
>
> chain blacklist {
> type filter hook input priority filter - 10; policy accept;
> ip saddr @attackers drop
> }
> }
> table ip6 sshguard {
> set attackers {
> type ipv6_addr
> flags interval
> }
>
> chain blacklist {
> type filter hook input priority filter - 10; policy accept;
> ip6 saddr @attackers drop
> }
> }
>
>
> I *think* the nftables people said mixing legacy (xtables) and
> nftables on the same system is Unsupported™ and Bad Things™ will happen,
> but I can't find a citation right now.
>
> I can do "busybox ping example.com" within the container, which
> implies systemd's MASQUERADE rules *are* working.
>
> This test nftables ruleset also appears to be working
> (while systemd's legacy rules are present):
>
> bash5# nft 'table a { chain b { type nat hook postrouting priority srcnat; }; chain c { type nat hook prerouting priority dstnat; tcp dport 80 counter log; }; }'
> bash5# nft list ruleset
> [...]
> table ip a {
> chain b {
> type nat hook postrouting priority srcnat; policy accept;
> }
>
> chain c {
> type nat hook prerouting priority dstnat; policy accept;
> tcp dport 80 counter packets 1 bytes 60 log
> }
> }
>
> ...so, I may be freaking out about nothing.
>
> At a minimum, the legacy rules created by systemd are "invisible" to
> an admin looking directly at "nft list ruleset", which is the only
> place they will look if they expect the system to be nftables native.
> That violates the principle of least surprise.
>
> Is it possible to make systemd use nftables instead of iptables, when the system is so configured?
> I think this would have Just Happened automatically if systemd was actually running iptables(8) or iptables-restore(8), but
> I think it is instead talking direct to the kernel in src/shared/firewall-util.c:fw_add_masquerade() ?
>
>
> PS: I don't know how to do it directly in C, but from nft(8), your MASQUERADE rules would look like this (comments optional):
>
> #!/usr/sbin/nft --file
> table ip systemd-container-blah-blah {
> chain postrouting {
> type nat hook postrouting priority srcnat
> policy accept
> ip saddr 10.0.0.0/28 masquerade comment "for systemd-nspawn at my-new-container"
> ip saddr 10.0.0.16/28 masquerade comment "for systemd-nspawn at my-other-container"
> }
> chain prerouting {
> type nat hook prerouting priority dstnat
> policy accept
> continue comment "Apparently the postrouting chain won't work unless this prerouting chain also exists"
> }
> }
>
> The "modern" way to do it would be to put all the IP ranges into a
> named set, so that you just add/remove from the set, not from the
> rules. This is analogous to "iptables -m set --help" and ipset(8),
> which you could already have used for efficiency when you have
> hundreds of containers:
>
> #!/usr/sbin/nft --file
> table ip systemd-container-blah-blah {
> set systemd-container-masquerade-ranges { type ipv4_addr; flags interval; }
> chain postrouting {
> type nat hook postrouting priority srcnat
> policy accept
> ip saddr @systemd-container-masquerade-ranges masquerade comment "for systemd-nspawn@"
> }
> chain prerouting {
> type nat hook prerouting priority dstnat
> policy accept
> continue comment "Apparently the postrouting chain won't work unless this prerouting chain also exists"
> }
> }
>
> Then when a container comes up, do
>
> nft 'add element ip systemd-container-blah-blah systemd-container-masquerade-ranges { 10.0.0.0/24 }'
>
> In fact, this is exactly what sshguard is doing for its filter blacklist.
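
The set-based approach quoted above also needs the reverse step when a
container goes away. A minimal sketch of both directions, as a dry run
that only prints the nft commands: the table and set names are the
placeholder names from the quoted example (nothing systemd creates
today), and masq_cmd is a hypothetical helper, not an existing tool.

```shell
#!/bin/sh
# Placeholder names copied from the quoted example; assumptions, not real systemd objects.
TABLE=systemd-container-blah-blah
SET=systemd-container-masquerade-ranges

# Hypothetical helper: print (do not run) the nft command for a container event.
# $1 = "up" or "down", $2 = the container's IPv4 range in CIDR form.
masq_cmd() {
    case "$1" in
        up)   verb=add ;;
        down) verb=delete ;;
        *)    echo "usage: masq_cmd up|down CIDR" >&2; return 2 ;;
    esac
    printf "nft '%s element ip %s %s { %s }'\n" "$verb" "$TABLE" "$SET" "$2"
}

masq_cmd up 10.0.0.0/28
masq_cmd down 10.0.0.0/28
```

Printing instead of executing keeps the sketch harmless on a live
firewall; the output can be reviewed and then piped to sh as root.
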

src/shared/firewall-util.* uses libiptc (which in turn uses iptables).

To the best of my knowledge, mixing nftables and iptables is supported;
otherwise we'd have huge problems in buster (e.g. firewalld was
explicitly switched back to iptables because quite a few components are
not yet nft-ready, such as libvirt and container managers like docker).

That said, I've CCed Arturo; maybe he can chime in here.

To me this sounds more like a wishlist bug to get systemd ported from
libiptc to libnftables, and that should be filed and addressed upstream.

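For anyone wanting to check which side of this split the iptables
command itself sits on: iptables 1.8 reports its backend in its version
string, e.g. "iptables v1.8.3 (nf_tables)" or "iptables v1.8.3
(legacy)". A rough sketch follows; backend_of is a hypothetical helper
name, and the script says nothing about what libiptc users such as
systemd do, only about the command-line tool.

```shell
#!/bin/sh
# Hypothetical helper: classify an `iptables -V` version string by backend.
backend_of() {
    case "$1" in
        *"(nf_tables)"*) echo nft ;;
        *"(legacy)"*)    echo legacy ;;
        *)               echo unknown ;;
    esac
}

# Only query the real binary if it is installed and on PATH.
if command -v iptables >/dev/null 2>&1; then
    echo "iptables backend: $(backend_of "$(iptables -V 2>/dev/null)")"
fi
```

Rules added through the legacy interface still have to be listed with
iptables-legacy-save, as in the report above.
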
Michael
--
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?