[Babel-users] A happy babel user: re6st
jp at nexedi.com
Wed Jan 23 11:30:13 UTC 2013
Very often, people complain on mailing lists. Today, I would like to say thank you.
Last summer, we implemented a wired mesh network system based on Babel that provides stable IPv6 connectivity to all nodes of a decentralized cloud operating system. It works great.
Thank you, Babel.
If you are in a hurry, here is the code: http://git.erp5.org/gitweb/re6stnet.git
What you can do with this code: provide reliable IPv6 to the world.
If you think re6st is useful, please feel free to add it to the list of Babel links.
+33 629 02 44 25
1- The problem to solve
A couple of years ago (http://bit.ly/SWVQlx), we implemented a cloud system called SlapOS (http://www.slapos.org). It relies on servers located in people's homes and now also in offices, data centers, or even your smartphone, tablet or TV. SlapOS is now used by some large corporations. One of its main applications is a disaster recovery cloud that can withstand force majeure events (e.g. war, terrorism, political instability, software bugs) of the kind that affect traditional clouds from time to time (http://iwgcr.org). It is also much cheaper and more environmentally friendly.
SlapOS relies on IPv6 to interconnect all nodes. Each node is usually allocated 100 or more global IPv6 addresses.
This is where our problem started: none of the IPv6 providers we tried could provide reliable connectivity. We tried providers in France, Germany, Japan and Norway. For example, in France, out of 200 IPv6 addresses provided by a Freebox (Free), 3 become unreachable from time to time for a few minutes or hours. OVH routers sometimes stop routing packets to Free for a couple of hours, but only over IPv6. Telia routers sometimes "eat" a few bytes during session initialization.
Overall, relying on the ISPs' native IPv6 led to a service availability of 99% or worse. We were searching for a solution.
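To put the 99% figure in perspective, here is a back-of-the-envelope calculation of the downtime it allows. It assumes the unavailability is measured over a full 365-day year:

```latex
(1 - 0.99) \times 365 \times 24\,\text{h} = 87.6\,\text{h} \approx 3.65\ \text{days of downtime per year}
```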
We had also experienced that, from time to time, IPv4 transit between ISPs can be cut for a while (a couple of hours), although less often. Our ideal solution should solve that as well.
2- The solution: re6st + babel
Step 1: create a wired mesh
We wrote a little daemon called re6st that picks 10 IPv4 neighbours at random and creates a tunnel to each of them. re6st can run behind a NAT: it obtains your router's public IPv4 address through UPnP. After some time, all nodes running re6st form a global mesh.
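As a rough illustration of why a handful of random peers per node is enough to obtain a global mesh, here is a toy Python simulation. It is a sketch under stated assumptions, not re6st's code. The function names build_random_mesh and is_connected are hypothetical. Tunnels are modelled as undirected edges, and NAT traversal, UPnP and peer discovery are ignored. Each of 200 nodes opens tunnels to 10 randomly chosen others, and a breadth-first search then checks that every node is reachable.

```python
import random
from collections import deque
from typing import Dict, Set


def build_random_mesh(num_nodes: int, peers_per_node: int = 10, seed: int = 0) -> Dict[int, Set[int]]:
    """Toy model (hypothetical, not re6st's code): each node opens tunnels
    to `peers_per_node` distinct, randomly chosen other nodes."""
    rng = random.Random(seed)
    mesh: Dict[int, Set[int]] = {node: set() for node in range(num_nodes)}
    for node in range(num_nodes):
        others = [n for n in range(num_nodes) if n != node]
        for peer in rng.sample(others, min(peers_per_node, len(others))):
            # A tunnel carries traffic both ways, so record it on both endpoints.
            mesh[node].add(peer)
            mesh[peer].add(node)
    return mesh


def is_connected(mesh: Dict[int, Set[int]]) -> bool:
    """Breadth-first search from one node; connected if every node is reached."""
    if not mesh:
        return True
    start = next(iter(mesh))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in mesh[node]:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return len(seen) == len(mesh)


mesh = build_random_mesh(200, peers_per_node=10)
print(is_connected(mesh))
print(min(len(peers) for peers in mesh.values()))
```

Every node ends up with at least 10 tunnels: the ones it opened itself, plus any opened towards it by others. With such a minimum degree, a disconnected random mesh is extremely unlikely.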
Step 2: start babel
Once the tunnels are created, Babel handles routing: it finds the best routes between all re6st nodes.
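Babel is a distance-vector protocol: each node announces the metrics of the routes it knows, and a neighbour adds the cost of the link towards that node. The toy Python sketch below shows only this core idea, as a synchronous Bellman-Ford iteration over a three-node mesh with hypothetical link costs. It is not Babel's algorithm. Babel also uses sequence numbers and a feasibility condition to avoid routing loops, and it runs asynchronously over real tunnels.

```python
import math
from typing import Dict, Optional, Tuple

# node -> {neighbour: cost of the tunnel to that neighbour} (hypothetical costs)
Graph = Dict[str, Dict[str, float]]
# node -> {destination: (metric, next hop)}
Tables = Dict[str, Dict[str, Tuple[float, Optional[str]]]]


def distance_vector(graph: Graph, max_rounds: int = 100) -> Tables:
    # Every node initially knows only a zero-metric route to itself.
    tables: Tables = {node: {node: (0.0, None)} for node in graph}
    for _ in range(max_rounds):
        changed = False
        # Snapshot: in each round, nodes hear what neighbours announced in the previous round.
        announced = {node: dict(table) for node, table in tables.items()}
        for node, links in graph.items():
            for neighbour, link_cost in links.items():
                for dest, (metric, _) in announced[neighbour].items():
                    candidate = link_cost + metric
                    if candidate < tables[node].get(dest, (math.inf, None))[0]:
                        tables[node][dest] = (candidate, neighbour)
                        changed = True
        if not changed:  # converged: no node learned a better route this round
            break
    return tables


mesh = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 5.0, "B": 1.0},
}
routes = distance_vector(mesh)
print(routes["A"]["C"])  # prints (2.0, 'B')
```

Here A reaches C through B with metric 2 rather than over the direct tunnel with cost 5. This naive version can count to infinity when a tunnel disappears, which is the kind of problem Babel's feasibility condition is designed to prevent.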
After a couple of months of using re6st + babel, we can say that it works quite well. SlapOS no longer experiences the connectivity problems of native IPv6, and we can safely host websites with SlapOS over re6st + babel.
4- Next steps
A report will be published.
5- Remaining problems to solve
The problems which remain to be solved are the following:
a- How can we prevent a Babel participant from acting against the other participants by announcing false routing information? Imagine, for example, that a malicious organization joins the re6st + babel network and starts capturing all routes in order to analyze traffic or even block it.
b- How can we create a hierarchical addressing system? The idea here is to group participants dynamically and assign each group a "big" IPv6 address range. A participant would reach another participant by first connecting randomly to one participant in a dynamic group, and then connecting to other participants in the same group. With this grouping approach, there is no need to build a hierarchical network with a backbone. It also solves the problem of scalability.
c- How can we implement more policies (e.g. latency)?
d- How could we implement accounting and billing in one way or another? (This is an open question, but quite an important one, for example to handle FTTH participants whose upload is limited to 3 GB/day, as in Japan.)
Most of the coding of re6st was done by Julien Muchembled (Nexedi), Ulysse Beaugnon (ENS) and Guillaume Bury (ENS).
We could have used other routing protocols (e.g. OLSR), but we felt that Babel's pluggable policy system was a key design difference that we could later use to customize it for the different needs of cloud applications (e.g. low latency). We would also feel ashamed to use a protocol that Babel's creator proved to be flawed.
We could have used tinc, but tinc creates a fully connected mesh. There is also a gap between what it claims to do and what it actually does. Lastly, as Juliusz C. pointed out to us, mixing tunneling and routing is a bad idea.
We could have used GRE instead of OpenVPN for tunnels, but GRE does not work behind an IPv4 NAT. Still, nothing prevents us from using GRE later.