[Babel-users] Fwd: [ih] Failures of the early Internet

Dave Taht dave.taht at gmail.com
Sat Jan 20 08:10:01 GMT 2024


---------- Forwarded message ---------
From: Jack Haverty via Internet-history <internet-history at elists.isoc.org>
Date: Fri, Jan 19, 2024 at 9:20 PM
Subject: [ih] Failures of the early Internet
To: <internet-history at elists.isoc.org>


On 1/19/24 16:00, Karl Auerbach via Internet-history wrote:
> (I've never felt that I have an adequate understanding of the early
> routing failures and their effects.)

OK, I'll jump in.....

It was painfully easy for routing problems to occur.   All one had to do
was advertise to a neighboring router that you were the best route to
everywhere.   A simple bug could do the job.  Word would quickly spread,
and all traffic would head your way, which sometimes made it impossible
to connect to the offending router to try to fix the problem.  IIRC,
something like that was what the Fuzzballs occasionally did.
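
A rough sketch of how one bad advertisement can pull traffic toward the wrong router, assuming a simple hop-count distance-vector scheme. The five-router topology, the router names (A, B, C, D, X), and the `converge` helper are all hypothetical and chosen for illustration; this is not a model of any specific protocol or of the Fuzzball code.

```python
INF = float("inf")

def converge(links, advertised, rounds=10):
    """Synchronous distance-vector rounds for a single destination.

    `advertised` maps a router to the fixed cost it claims for the
    destination. Every other router picks the neighbor with the lowest
    claimed cost and adds one hop. Returns {router: (cost, next_hop)}.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    table = {r: (advertised.get(r, INF), None) for r in neighbors}
    for _ in range(rounds):
        new = {}
        for r in neighbors:
            if r in advertised:
                # Originators (honest or buggy) keep repeating their claim.
                new[r] = (advertised[r], None)
                continue
            cost, hop = min((table[n][0] + 1, n) for n in sorted(neighbors[r]))
            new[r] = (cost, hop) if cost < INF else (INF, None)
        table = new
    return table

# Hypothetical topology: the destination network really sits behind A.
LINKS = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "X")]

honest = converge(LINKS, {"A": 0})
# Assumed bug: X also claims to be zero hops from the destination.
hijacked = converge(LINKS, {"A": 0, "X": 0})

for router in ("B", "C", "D"):
    print(router, "honest:", honest[router], "with buggy X:", hijacked[router])
```

In this toy case C and D switch their next hop toward X as soon as X's claim looks shorter than the real path, while B, which is closer to A, is unaffected. A router that claims to be the best route to every destination would win this comparison far more widely.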

Another incident I recall was also a routing issue.  I don't remember
exactly where it happened, but two sites, universities IIRC, were
collaborating on some research project and had a need to send data back
and forth.  Their pathway to each other through the Internet was
somewhat long and often congested.   So they decided to fix the problem
by installing a circuit directly between their two campuses' routers.

Money was of course an issue, but they found the funds to pay for a 9.6
kb/s line.  They were surprised to observe that the added line only made
things worse.  File transfers took even longer than before.  Of course
their change to the topology of the Internet had unexpectedly made their
9.6 line the best route for all sorts of Internet traffic unrelated to
their project.
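
A minimal sketch of why a hop-count metric would favor the slow direct line. Everything numeric here is an assumption made for illustration: the existing path is modeled as four 56 kb/s links through three intermediate routers (R1, R2, R3), the packet size is 8000 bits, and "delay" counts only per-link transmission time, ignoring queueing and propagation. Only the 9.6 kb/s figure comes from the account above.

```python
import heapq

def shortest_path(links, src, dst, cost):
    """Dijkstra over an undirected graph; cost(rate_bps) weights each link."""
    graph = {}
    for a, b, rate in links:
        graph.setdefault(a, []).append((b, cost(rate)))
        graph.setdefault(b, []).append((a, cost(rate)))

    best = {src: 0.0}
    heap = [(0.0, src, [src])]
    while heap:
        total, node, path = heapq.heappop(heap)
        if node == dst:
            return path, total
        if total > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, []):
            new_total = total + weight
            if new_total < best.get(nxt, float("inf")):
                best[nxt] = new_total
                heapq.heappush(heap, (new_total, nxt, path + [nxt]))
    return None, float("inf")

# Hypothetical topology: campuses A and B, three routers in between.
LINKS = [
    ("A", "R1", 56_000), ("R1", "R2", 56_000),
    ("R2", "R3", 56_000), ("R3", "B", 56_000),
    ("A", "B", 9_600),   # the newly installed direct line
]
PACKET_BITS = 8_000      # assumed packet size

def hop_cost(rate_bps):
    return 1             # every link counts the same

def delay_cost(rate_bps):
    return PACKET_BITS / rate_bps  # seconds to clock one packet onto the link

hop_path, hops = shortest_path(LINKS, "A", "B", hop_cost)
delay_path, delay = shortest_path(LINKS, "A", "B", delay_cost)

print("hop metric:  ", hop_path, hops)
print("delay metric:", delay_path, round(delay, 3))
```

Under these assumptions the hop metric picks the one-hop 9.6 kb/s line, while the delay metric stays on the four-hop path (about 0.57 s of transmission time versus about 0.83 s on the direct line). Any other router whose hop count to a destination dropped because of the new link would make the same choice, which is how unrelated traffic could end up on it.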

Many of the incidents I remember were caused by the routing algorithms,
which were based on "hops" rather than on time (as had been the case in
the Arpanet for a decade or more).   This was a well-known problem which
I think was part of the motivation for Dave Mills to create the NTP
machinery.  In addition to routing, there were other Internet mechanisms
that depended on time, but had necessarily been implemented
"temporarily" until good time mechanisms were available.  For example,
the TTL (Time To Live) and TOS (Type Of Service) values in IP were
supposed to provide the routers with information to route IP datagrams
over the most appropriate route, or quickly discard them if there was no
expectation they could possibly get to their destination in time to
still be useful.

Dave worked hard to get Time as an inherent element of The Internet, and
our expectation was that TCP and IP software throughout the Internet
would be changed to make decisions based on Time rather than Hops.   I'm
not sure if that ever happened.   The Internet now knows what time it
is, but does networking software today ever look at its watch?

Another incident I recall was not an Internet failure, but rather a
situation where the Internet terrorized the Arpanet.

The Arpanet was touted as a "packet network", but in reality it was a
virtual circuit network, using packets internally.   There were lots of
mechanisms inside the Arpanet IMPs to make all user traffic travel to
its destination intact and in the same order it was sent.   The network
was designed to match the typical usage patterns of the era - people
connected to some computer somewhere on the Arpanet, did their work, and
disconnected minutes or even hours later.   Inside the Arpanet, the
mechanisms to set up virtual circuits consumed resources and took time,
but with sessions lasting minutes or hours the impact was tolerable.

One day the Arpanet was having problems and response times were
noticeably slower than usual.  Investigation revealed that the Arpanet
was flailing, constantly setting up and tearing down virtual circuits,
each of which was only lasting for a second or two.   The Arpanet NOC
(down the hall from my office) was in crisis.

Eventually the problem was traced down to a new release of OS software
(BSD, IIRC) that had just been posted on the Arpanet, and was being
installed in the large numbers of workstations (Sun, IIRC) that had
started appearing on the Internet.  The new OS release included a new
tool to advise its users of the current status of the Internet.  It
accomplished that by "pinging" every router every few minutes to see if
that router was up and responsive.

Pinging involved sending a single datagram, and receiving a single
datagram in response.  But each such datagram required the Arpanet to set
up a virtual circuit to carry that traffic.  With lots of OSes and lots
of routers now scattered around the Arpanet, it was trying to do
something it was never designed to do.   As more workstations loaded the
new OS release, the problem only got worse.

Although this wasn't an "Internet failure", it was a system failure,
caused by the Internet.  Administrative action suppressed the problem
and as the Arpanet was decommissioned the problem disappeared.  Or
perhaps moved somewhere else?

Anybody else have recollections of early failures...?

Jack Haverty





--
Internet-history mailing list
Internet-history at elists.isoc.org
https://elists.isoc.org/mailman/listinfo/internet-history


-- 
40 years of net history, a couple songs:
https://www.youtube.com/watch?v=D9RGX6QFm5E
Dave Täht CSO, LibreQos