[Pkg-xen-devel] Bug#679533: Bug#679533: Traffic forwarding issue between Xen domU/dom0 and ovs

Hans van Kranenburg hans.van.kranenburg at mendix.com
Wed Aug 22 16:09:12 UTC 2012


On 07/25/2012 01:55 PM, Hans van Kranenburg wrote:
>
> I really hate the fact I haven't been able to reproduce the situation
> again...

So, it happened again yesterday. At the exact moment one of my 
colleagues started a newly set up domU on one of our servers, a virtual 
network interface of another (!) unrelated domU on the same dom0 began 
failing. The affected system is one of a set of VRRP routers; it has 
now been taken out of service and is hanging around in a broken state.

The difference from the previous time is that back then it happened 
while live migrating a domU, whereas now it hit a different domU than 
the one that was started. That is even more frightening, as it suggests 
it could happen to any network interface on any domU, not only the 
backup VRRP router.

The newly set up domU and the one that was affected do not share any 
vlan etc.

The dom0 has the same package versions installed as the system on which 
I first reported this bug.

ii  xen-linux-system-2.6.32-5-xen-amd64            2.6.32-45
ii  xen-hypervisor-4.0-amd64                       4.0.1-5.2
ii  openvswitch-switch                             1.4.0-1~mxbp60+1  Open vSwitch switch implementations
ii  openvswitch-datapath-module-2.6.32-5-xen-amd64 1.4.0-1~mxbp60+1  Open vSwitch Linux datapath kernel module

One of the three network interfaces inside the domU shows exactly the 
same behaviour I described before. No traffic gets from the outside/dom0 
into the domU, except when there is traffic from the inside to the 
outside. At that moment, old pings suddenly get ponged:

From 188.122.91.211 icmp_seq=7642 Destination Host Unreachable
From 188.122.91.211 icmp_seq=7643 Destination Host Unreachable
From 188.122.91.211 icmp_seq=7644 Destination Host Unreachable
From 188.122.91.211 icmp_seq=7645 Destination Host Unreachable
64 bytes from 188.122.91.194: icmp_req=7150 ttl=63 time=501360 ms
64 bytes from 188.122.91.194: icmp_req=7151 ttl=63 time=500351 ms
64 bytes from 188.122.91.194: icmp_req=7152 ttl=63 time=499343 ms
64 bytes from 188.122.91.194: icmp_req=7153 ttl=63 time=498335 ms
64 bytes from 188.122.91.194: icmp_req=7154 ttl=63 time=497327 ms
64 bytes from 188.122.91.194: icmp_req=7155 ttl=63 time=496319 ms
64 bytes from 188.122.91.194: icmp_req=7156 ttl=63 time=495311 ms
64 bytes from 188.122.91.194: icmp_req=7157 ttl=63 time=494303 ms
64 bytes from 188.122.91.194: icmp_req=7158 ttl=63 time=493295 ms
64 bytes from 188.122.91.194: icmp_req=7159 ttl=63 time=492287 ms
64 bytes from 188.122.91.194: icmp_req=7160 ttl=63 time=491279 ms
64 bytes from 188.122.91.194: icmp_req=7161 ttl=63 time=490271 ms
64 bytes from 188.122.91.194: icmp_req=7162 ttl=63 time=489263 ms
64 bytes from 188.122.91.194: icmp_req=7163 ttl=63 time=488255 ms
64 bytes from 188.122.91.194: icmp_req=7164 ttl=63 time=487247 ms
64 bytes from 188.122.91.194: icmp_req=7165 ttl=63 time=486239 ms
64 bytes from 188.122.91.194: icmp_req=7166 ttl=63 time=485231 ms
64 bytes from 188.122.91.194: icmp_req=7167 ttl=63 time=484223 ms
64 bytes from 188.122.91.194: icmp_req=7168 ttl=63 time=483215 ms
64 bytes from 188.122.91.194: icmp_req=7169 ttl=63 time=482207 ms
64 bytes from 188.122.91.194: icmp_req=7170 ttl=63 time=481199 ms
64 bytes from 188.122.91.194: icmp_req=7171 ttl=63 time=480191 ms
64 bytes from 188.122.91.194: icmp_req=7172 ttl=63 time=479183 ms
64 bytes from 188.122.91.194: icmp_req=7173 ttl=63 time=478175 ms
64 bytes from 188.122.91.194: icmp_req=7174 ttl=63 time=477167 ms
64 bytes from 188.122.91.194: icmp_req=7175 ttl=63 time=476159 ms
64 bytes from 188.122.91.194: icmp_req=7176 ttl=63 time=475151 ms
64 bytes from 188.122.91.194: icmp_req=7177 ttl=63 time=474143 ms
64 bytes from 188.122.91.194: icmp_req=7178 ttl=63 time=473135 ms
64 bytes from 188.122.91.194: icmp_req=7179 ttl=63 time=472127 ms
64 bytes from 188.122.91.194: icmp_req=7180 ttl=63 time=471119 ms
64 bytes from 188.122.91.194: icmp_req=7181 ttl=63 time=470111 ms
64 bytes from 188.122.91.194: icmp_req=7182 ttl=63 time=469103 ms
64 bytes from 188.122.91.194: icmp_req=7183 ttl=63 time=468095 ms
64 bytes from 188.122.91.194: icmp_req=7646 ttl=63 time=3705 ms
64 bytes from 188.122.91.194: icmp_req=7647 ttl=63 time=2697 ms
64 bytes from 188.122.91.194: icmp_req=7648 ttl=63 time=1689 ms
64 bytes from 188.122.91.194: icmp_req=7649 ttl=63 time=689 ms
64 bytes from 188.122.91.194: icmp_req=7650 ttl=63 time=698 ms
64 bytes from 188.122.91.194: icmp_req=7651 ttl=63 time=706 ms
64 bytes from 188.122.91.194: icmp_req=7652 ttl=63 time=4704 ms
64 bytes from 188.122.91.194: icmp_req=7653 ttl=63 time=3696 ms
64 bytes from 188.122.91.194: icmp_req=7654 ttl=63 time=2688 ms
64 bytes from 188.122.91.194: icmp_req=7655 ttl=63 time=1680 ms
64 bytes from 188.122.91.194: icmp_req=7656 ttl=63 time=672 ms
64 bytes from 188.122.91.194: icmp_req=7657 ttl=63 time=672 ms
From 188.122.91.211 icmp_seq=7687 Destination Host Unreachable
From 188.122.91.211 icmp_seq=7688 Destination Host Unreachable
From 188.122.91.211 icmp_seq=7689 Destination Host Unreachable

The funny thing is that 34 ICMP echo packets (7150-7183) apparently were 
still queued to enter the domU, while everything after them, up to 7646, 
got lost. Why 34? Does this have to do with an internal buffer somewhere?
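
To double-check the arithmetic behind that observation, here is a small 
sketch in Python. It is only an illustration: the names SAMPLE, 
parse_replies and burst_summary are made up for this mail, and the input 
is just the first two and last two reply lines of the burst quoted above. 
It counts how many sequence numbers the burst spans and how much the 
reported round-trip time drops per sequence number.

```python
import re

# First two and last two reply lines of the delayed burst shown above.
SAMPLE = """\
64 bytes from 188.122.91.194: icmp_req=7150 ttl=63 time=501360 ms
64 bytes from 188.122.91.194: icmp_req=7151 ttl=63 time=500351 ms
64 bytes from 188.122.91.194: icmp_req=7182 ttl=63 time=469103 ms
64 bytes from 188.122.91.194: icmp_req=7183 ttl=63 time=468095 ms
"""

REPLY = re.compile(r"icmp_req=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_replies(text):
    """Return a list of (sequence number, round-trip time in ms) tuples."""
    return [(int(m.group(1)), float(m.group(2))) for m in REPLY.finditer(text)]


def burst_summary(replies):
    """Return (number of sequence numbers spanned, average RTT drop per seq).

    Assumes the replies are sorted by sequence number and belong to one burst.
    """
    (first_seq, first_rtt), (last_seq, last_rtt) = replies[0], replies[-1]
    span = last_seq - first_seq + 1
    drop_per_seq = (first_rtt - last_rtt) / (last_seq - first_seq)
    return span, drop_per_seq


span, drop = burst_summary(parse_replies(SAMPLE))
print(span, round(drop))  # 34 1008
```

The round-trip time drops by about 1008 ms per sequence number, which is 
roughly the interval at which the requests were sent. If that reading is 
right, all 34 replies came back at practically the same moment, which 
would fit packets sitting in a queue and being released together.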

When generating traffic from the domU to the outside world over the 
interface...

beheer@bert.dmz.sdc.mendix.net:~ 1-$ ping -c 3 188.122.91.193
PING 188.122.91.193 (188.122.91.193) 56(84) bytes of data.
64 bytes from 188.122.91.193: icmp_req=1 ttl=64 time=3.69 ms
64 bytes from 188.122.91.193: icmp_req=2 ttl=64 time=0.218 ms
64 bytes from 188.122.91.193: icmp_req=3 ttl=64 time=0.187 ms

--- 188.122.91.193 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.187/1.366/3.695/1.646 ms

...traffic from the outside shows the same pattern. It's only let in 
when the 'door is opened' for outgoing packets.

64 bytes from 188.122.91.194: icmp_req=2120 ttl=63 time=5499 ms <-
64 bytes from 188.122.91.194: icmp_req=2121 ttl=63 time=4499 ms
64 bytes from 188.122.91.194: icmp_req=2122 ttl=63 time=3499 ms
64 bytes from 188.122.91.194: icmp_req=2123 ttl=63 time=2499 ms
64 bytes from 188.122.91.194: icmp_req=2124 ttl=63 time=1499 ms
64 bytes from 188.122.91.194: icmp_req=2125 ttl=63 time=499 ms  <-
64 bytes from 188.122.91.194: icmp_req=2126 ttl=63 time=499 ms
64 bytes from 188.122.91.194: icmp_req=2127 ttl=63 time=498 ms
From 188.122.91.211 icmp_seq=2148 Destination Host Unreachable  <-
From 188.122.91.211 icmp_seq=2149 Destination Host Unreachable
From 188.122.91.211 icmp_seq=2150 Destination Host Unreachable
From 188.122.91.211 icmp_seq=2151 Destination Host Unreachable

At the dom0:

# xm network-list bert.dmz.sdc.mendix.net
Idx BE     MAC Addr.     handle state evt-ch tx-/rx-ring-ref BE-path
0   0  00:16:3e:01:00:24    0     4      23    768  /769     /local/domain/0/backend/vif/49/0
1   0  00:16:3e:01:01:c2    1     4      25    1281 /1282    /local/domain/0/backend/vif/49/1
2   0  00:16:3e:01:02:fa    2     4      26    2304 /2305    /local/domain/0/backend/vif/49/2
3   0  00:16:3e:01:03:82    3     4      27    2306 /2307    /local/domain/0/backend/vif/49/3
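
For follow-up diagnostics in the dom0 it may help to map each row to its 
backend interface name. The sketch below is only an illustration: it 
assumes the usual vif<domid>.<devid> naming for backend interfaces in the 
dom0, and the names ROWS, ROW and parse_vifs are made up for this mail. 
The two input rows are copied from the listing above, each on one line.

```python
import re

# Two rows from the xm network-list output above, each on a single line.
ROWS = """\
0   0  00:16:3e:01:00:24    0     4      23    768  /769  /local/domain/0/backend/vif/49/0
2   0  00:16:3e:01:02:fa    2     4      26    2304 /2305 /local/domain/0/backend/vif/49/2
"""

ROW = re.compile(
    r"^(?P<idx>\d+)\s+(?P<be>\d+)\s+(?P<mac>[0-9a-f:]{17})\s+(?P<handle>\d+)\s+"
    r"(?P<state>\d+)\s+(?P<evtch>\d+)\s+(?P<tx>\d+)\s*/(?P<rx>\d+)\s+"
    r"(?P<path>\S*/vif/(?P<domid>\d+)/(?P<devid>\d+))$",
    re.MULTILINE,
)


def parse_vifs(text):
    """Parse network-list rows into dicts with a derived dom0 interface name."""
    vifs = []
    for m in ROW.finditer(text):
        vifs.append({
            "mac": m.group("mac"),
            "state": int(m.group("state")),
            "evtch": int(m.group("evtch")),
            # Assumption: backend interfaces are named vif<domid>.<devid>.
            "dom0_name": "vif{}.{}".format(m.group("domid"), m.group("devid")),
        })
    return vifs


vifs = parse_vifs(ROWS)
names = [v["dom0_name"] for v in vifs]
print(names)  # ['vif49.0', 'vif49.2']
```

As far as I know, state 4 is XenbusStateConnected, so all four vifs report 
themselves as connected even though one of them is not passing inbound 
traffic.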

I can keep this system in its current state for some time without much 
hassle; it's one of our office routers.

Any ideas for further diagnostics I can do right now?

-- 
Hans van Kranenburg - System / Network Engineer
+31 (0)10 2760434 | hans.van.kranenburg at mendix.com | www.mendix.com


