[Pkg-xen-devel] Are squeeze Xen dom0 kernels subject to the same IPv6 GSO problem?

Andy Smith andy at strugglers.net
Thu Dec 1 17:41:27 UTC 2011


Hi,

I have three squeeze servers running:

ii  linux-image-2.6.32-5-xen-amd64          2.6.32-38 Linux 2.6.32 for 64-bit PCs, Xen dom0 support
ii  xen-hypervisor-4.0-amd64                4.0.1-4 The Xen Hypervisor on AMD64

All three servers have Intel gigabit NICs, but one server uses the
e1000e driver and the other two use the igb driver.

They've been in production for around 6 months now and, somewhat
embarrassingly, we've only just discovered a problem with IPv6
performance on the two servers with the igb driver.

The problem manifests itself as awful TCP performance to a Xen domU,
on the order of 15-30KB/sec. Doing the same transfer from the dom0
itself does not show the issue, and the expected tens of MB/sec is
achieved.

Here's an example tcpdump of when the problem is occurring:

# tcpdump -vpni bond0 'host 2a00:801:0:11::2'
[...]
23:59:00.672905 IP6 (hlim 55, next-header TCP (6) payload length: 4316) 2a00:801:0:11::2.80 > 2001:db8:1f1:f240::2.35241: Flags [P.], cksum 0x62d3 (incorrect -> 0x1c84), seq 15709:19993, ack 127, win 9, options [nop,nop,TS val 1771553020 ecr 1086205224], length 4284
23:59:00.672987 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 1240) 2001:db8:0:1f1::8 > 2a00:801:0:11::2: [icmp6 sum ok] ICMP6, packet too big, length 1240, mtu 1500
23:59:00.673161 IP6 (hlim 63, next-header TCP (6) payload length: 32) 2001:db8:1f1:f240::2.35241 > 2a00:801:0:11::2.80: Flags [.], cksum 0x24e4 (correct), ack 17137, win 716, options [nop,nop,TS val 1086205237 ecr 1771553020], length 0
23:59:00.725659 IP6 (hlim 55, next-header TCP (6) payload length: 1460) 2a00:801:0:11::2.80 > 2001:db8:1f1:f240::2.35241: Flags [.], cksum 0x16de (correct), seq 19993:21421, ack 127, win 9, options [nop,nop,TS val 1771553033 ecr 1086205237], length 1428
23:59:00.725940 IP6 (hlim 63, next-header TCP (6) payload length: 44) 2001:db8:1f1:f240::2.35241 > 2a00:801:0:11::2.80: Flags [.], cksum 0x25f5 (correct), ack 17137, win 716, options [nop,nop,TS val 1086205250 ecr 1771553020,nop,nop,sack 1 {19993:21421}], length 0
[...]
23:59:01.188463 IP6 (hlim 63, next-header TCP (6) payload length: 32) 2001:db8:1f1:f240::2.35241 > 2a00:801:0:11::2.80: Flags [.], cksum 0x0105 (correct), ack 25705, win 1073, options [nop,nop,TS val 1086205366 ecr 1771553149], length 0
23:59:01.240946 IP6 (hlim 55, next-header TCP (6) payload length: 2888) 2a00:801:0:11::2.80 > 2001:db8:1f1:f240::2.35241: Flags [P.], cksum 0x5d3f (incorrect -> 0xf9ef), seq 25705:28561, ack 127, win 9, options [nop,nop,TS val 1771553162 ecr 1086205366], length 2856
23:59:01.241040 IP6 (hlim 64, next-header ICMPv6 (58) payload length: 1240) 2001:db8:0:1f1::8 > 2a00:801:0:11::2: [icmp6 sum ok] ICMP6, packet too big, length 1240, mtu 1500

2a00:801:0:11::2 is speedtest.tele2.net which helpfully hosts files
like http://speedtest.tele2.net/100MB.zip for testing purposes. The
above is the result of me using wget to download that file from a
domU on this server. The domU is at 2001:db8:1f1:f240::2 and the
dom0 is at 2001:db8:0:1f1::8.

What I'm noticing are the occasional oversized packets with incorrect
checksums, at 23:59:00.672905 (TCP length 4284) and 23:59:01.240946
(TCP length 2856), each immediately followed by an "ICMPv6 packet too
big" message from the dom0. These do not occur on the server with the
e1000e driver, where all the packets top out at a TCP length of 1428.
They always occur on the two servers with the igb driver, where the
poor throughput is observed.

I'm wondering if I am hitting something like this:

http://amailbox.org/mailarchive/linux-kvm/2010/2/2/6257539/thread

I have played with disabling and enabling GSO and checksum offloading
on every interface I can, both in dom0 and domUs, and that makes no
difference.

Can anyone confirm that that is the issue here? I don't at present
have another machine with igb NICs around to test this.

Looking at linux-source-2.6.32 on squeeze, I see that it does not
have this patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8e1e8a4779cb23c1d9f51e9223795e07ec54d77a

although I notice that this commit also touches e1000e where I am
not currently having any problems.

Any ideas?

Cheers,
Andy