[Babel-users] how to set the pacing rate

Dave Taht dave at taht.net
Mon Nov 12 20:49:03 GMT 2018


Juliusz Chroboczek <jch at irif.fr> writes:

>>>> +    rc = setsockopt(s, SOL_SOCKET, SO_MAX_PACING_RATE, &rate, sizeof(rate));
>
>>> It's only effective on TCP sockets, and only when using the FQ scheduler.
>
>> I am under the impression that since linux 4.12 it works on udp, and I
>> forget when it started working outside the fq scheduler...
>
> Ah.
>
> Still, I think that we should be able to do pacing in userspace.  At least
> in the no-churn case, we should be able to predict how many updates per
> unit of time we want to send, and spread them out across the update
> interval.

Yes. I still kind of like finally leveraging the announcement interval
in the route update message to also spread things out... you can
announce earlier than the interval on a metric change, but otherwise idle
along, with routes persisting for, say, half to a third of the max. (11 minutes?)
- that's an update interval roughly equivalent to modern BGP's.

One patch I was fiddling with was to limit passes through the resend
routines much like MAX_BUFFERED_UPDATES does, so the main babel loop
gets a chance to do other things, like read new packets.

What happens in "churn mode" reminds me a lot of classic dns/ntp
amplification attacks; keeping the ratio of packets in to packets out
low is a good idea.

So I quit after 64 send_multicast_multihop or send_update requests
and keep processing the loop forward from there, incrementally deferring GC...

Haven't benchmarked it yet... not happy with it... been busy doing other things.

Another patch basically adds something fq_codel-like to recv, so that
each call to bab_recv calls recv(args... MSG_DONTWAIT) up to 42 times or
until it gets nothing back, sorts the input by src/dst IP hash, and serves
things back to recv. Theoretically this keeps the RCV_BUF as drained as possible.

It does bulk drops from the fattest queue when the internal packet limit
is exceeded, rather than codel, because babel is not TCP-friendly yet.
It serves up short (e.g. hello) packets from each speaker first, and
short flows faster than long ones... so a big fat speaker can't drown
out the others.

(this is where I wanted to reach for clang-format, and for purity's sake
I should go reuse the bsd version or rewrite from scratch - and it
totally doesn't work right now)

The edgerouter can't push a gigabit, but 1 MByte/sec is totally doable from
a pps standpoint.

I think it will work, but it's more complicated than I'd like,
reinventing a ton of stuff in userspace. I'm watching the QUIC-related
bind-connect work closely.

An alternative idea is to put an skb->hash probabilistic dropper into ebpf
for reads when we're in trouble; same rough idea, no FQ.

>
> I'd need to read up on data structures, as I don't currently understand
> the tradeoffs between binary heaps and timer wheels.  (Same goes for
> dealing with resends.  And Christof suggested that we should modify the
> main event loop in babeld to use a proper data structure.)

Yes, I kind of think that whatever happens to resend.* might end up
being a scheduling technique for more of babel itself. 

The timer wheel in the linux kernel is the best implementation I know
of, handling the thundering herd problem, sloppy timings (where you
really don't care if something happens 1ms or 2ms in the future, you just
do everything in the range), and much else. There's plenty of others worth
looking at... but the discussions around that code have been endless for
a decade, and informative.

You'd schedule a hello and keep it updated until it went out, then reschedule. Etc.

When I was flailing at rabeld I added a good old-fashioned "alarm" call
to make sure hellos went out, and broke up major loops to check it
periodically.


> -- Juliusz


