Lecture 09: Internet structure
Goals
- implement traceroute using ICMP
- define autonomous system and its subvarieties: stub, multihomed, transit
- explain the purpose of interior/intradomain gateway protocols (IGPs) and exterior gateway protocols (EGPs)
- briefly compare and contrast the purpose and behavior of RIP, OSPF, and BGP
last time, we filled in a couple gaps in the functionality of the various protocols we’ve talked about to this point
first, recall that application software will tell the IP code "please send data XYZ to IP address A.B.C.D"; the IP code will construct the appropriate packet, find the appropriate route, and pass it to the link layer (often Ethernet) for sending
the problem was that the Ethernet layer needs to know the Ethernet address to send the frame (containing the IP packet) to, which is not mentioned in the preceding description
this led us to ARP: the Address Resolution Protocol, which allows systems to ask questions like "what is the link-layer (eg, Ethernet) address associated with a particular network-layer (eg, IP) address?" and receive helpful answers
thus, after consulting the routing table to determine the IP address of the router through which to forward the packet in question, the system will issue an ARP request to find the MAC address associated with that IP address
upon receiving a successful response, the system is able to fully construct the Ethernet frame in which the IP packet will be transmitted
the related gap we filled was: what happens if there is no route for the packet?
that is, if a machine is given some data to send to some IP address, but upon consulting the routing table, finds no matching route
unlike all the other failure modes we’ve discussed to this point, the router does not just silently drop the packet: instead, it sends back a message using the ICMP protocol that indicates a route could not be found
in this case, the specific ICMP message sent is "network unreachable"
if an ARP request is made but receives no response, the result is to send back an ICMP "host unreachable" message
recall also that each router through which a packet passes decrements the TTL field in the IP header; when that field reaches 0, the router processing it sends back an ICMP "TTL exceeded" message
and we demonstrated all of these using the ICMP "echo request" and "echo response" messages
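for reference, each of these ICMP messages is identified by a (type, code) pair in the ICMP header; a small Python lookup table makes that concrete (the numbers come from the ICMP specification, while the names ICMP_MESSAGES and describe_icmp are just illustrative):

```python
# (type, code) -> description for the ICMP messages discussed so far.
ICMP_MESSAGES = {
    (0, 0): "echo reply",
    (3, 0): "destination unreachable: network unreachable",
    (3, 1): "destination unreachable: host unreachable",
    (8, 0): "echo request",
    (11, 0): "time exceeded: TTL exceeded in transit",
}


def describe_icmp(icmp_type: int, code: int) -> str:
    """Return a human-readable name for an ICMP (type, code) pair."""
    return ICMP_MESSAGES.get(
        (icmp_type, code), f"other (type={icmp_type}, code={code})"
    )
```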
now it’s time to play some games with this stuff
to start with, let’s review the ping program, which is what we use to send those ICMP echo request messages:
$ ping -c1 www.google.com
PING www.google.com (142.250.65.196): 56 data bytes
64 bytes from 142.250.65.196: icmp_seq=0 ttl=116 time=25.366 ms

--- www.google.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 25.366/25.366/25.366/0.000 ms
the -c1 part says to send a single ICMP echo request (otherwise it will send one request per second until we press ^C)
and the www.google.com part identifies the destination
this is new! to this point, we’ve talked about sources and destinations in terms of IP addresses, but this is human-readable text
we will cover how that all works soon, but not today
for now, just trust that the name www.google.com is equivalent to the IP address 142.250.65.196, which is where ping tells us it is sending the ICMP echo request
ping likewise tells us that it’s sending 56 data bytes (the default payload size: together with the 8-byte ICMP header, that makes the 64 bytes reported in the response line)
that the sequence number is 0 (if we didn’t specify -c1 and let it send out one request per second, we would see the sequence number increase for each one sent)
the ttl of the response is 116, which isn’t terribly interesting on its own, though we will play around with that field in a moment
finally, the time is 25.366 ms
that’s measuring the round-trip-time in milliseconds (thousandths of a second), meaning that it took about 1/40th of a second for the request to travel from my machine to 142.250.65.196 and for the response to travel all the way back
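to make the "56 data bytes plus an 8-byte header" arithmetic concrete, here is a sketch of how an ICMP echo request could be assembled in Python; the identifier value and the all-zero payload are arbitrary choices for illustration (a real ping fills the payload with a timestamp and a pattern), and this only builds the bytes rather than sending them:

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP and IP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold any carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum zero for now
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


packet = build_echo_request(ident=0x1234, seq=0, payload=bytes(56))
print(len(packet))  # 8 header bytes + 56 data bytes
```

a receiver can validate the packet by running the same checksum over all of it: a correct packet sums to zero.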
we can also see the exact same information in Wireshark, which shows both request and response packets, as well as all of their contents and timings, corroborating the output of ping
recall that when the packet is sent, the TTL (time to live) field in the IP header is set to some large-ish value (Wireshark tells us that our outgoing ICMP echo request had a TTL of 64)
this TTL is decremented at each hop and, if it reaches zero, a TTL exceeded message comes back
what happens if we intentionally set the TTL of the outgoing ICMP echo request message to a small value? say, 4?
we would expect that the packet is able to get 4 hops along the path to its destination, but when that fourth router decrements the TTL field, it will become zero, and the fourth router will send back a "TTL exceeded" ICMP message
let’s do it:
$ ping -c1 -m4 www.google.com
PING www.google.com (142.251.32.100): 56 data bytes
92 bytes from 162.151.149.89: Time to live exceeded
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  20 0054 fdbf   0 0000  01  01 4f1f 172.24.0.6  142.251.32.100

--- www.google.com ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss
(the -m4 parameter tells ping to set the initial TTL to 4, which we confirm with Wireshark)
(note that other implementations use a different letter than -m for this purpose: the default ping on Linux, for example, uses -t)
the second line of the output above tells us that IP address 162.151.149.89 reported TTL exceeded (again, corroborated by Wireshark)
this is the IP address of that fourth hop!
we can expand this to map out the entire path of routers traversed between source and destination
by sending out a series of ICMP echo requests, with monotonically increasing TTLs
ie, send a request with TTL 1, then another request with TTL 2, then with TTL 3, etc, etc
by examining the TTL exceeded messages we get in response, we can determine the IP address of every router along the way
the result is called a traceroute… because it traces… the route
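the loop just described can be sketched in a few lines of Python; sending real probes requires raw sockets and elevated privileges, so this sketch takes the "send one probe with this TTL" step as a function argument (the probe callback, the reply format, and make_simulated_probe are all hypothetical names invented for this example, and the simulated path reuses a few addresses from these notes purely as sample data):

```python
from typing import Callable, List, Optional, Tuple

# A reply is (kind, source address); kind is "ttl_exceeded" or "echo_reply".
Reply = Tuple[str, str]


def traceroute(
    probe: Callable[[int], Optional[Reply]], max_hops: int = 30
) -> List[Optional[str]]:
    """Send probes with TTL 1, 2, 3, ... and record who answers each one."""
    hops: List[Optional[str]] = []
    for ttl in range(1, max_hops + 1):
        reply = probe(ttl)
        if reply is None:  # no answer at all for this TTL
            hops.append(None)
            continue
        kind, address = reply
        hops.append(address)
        if kind == "echo_reply":  # the destination itself answered: done
            break
    return hops


def make_simulated_probe(path: List[Optional[str]]) -> Callable[[int], Optional[Reply]]:
    """Fake network: path lists the routers in order, destination last.

    A None entry models a router that declines to send TTL exceeded.
    """
    def probe(ttl: int) -> Optional[Reply]:
        if ttl < len(path):
            router = path[ttl - 1]
            return None if router is None else ("ttl_exceeded", router)
        return ("echo_reply", path[-1])
    return probe


path = ["172.24.0.1", "96.120.70.93", None, "142.250.80.68"]
print(traceroute(make_simulated_probe(path)))
```

a real implementation would replace the simulated probe with code that sends an ICMP echo request with the given TTL and waits (with a timeout) for either kind of reply.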
we could use ping to perform a traceroute, but we’d have to wrap it in some shell-scripting stuff
fortunately, there are self-contained programs that do it all for us
on Windows, the classic tool is tracert; on UNIX-based systems (which includes OS X), the classic tool is traceroute
my preferred program is mtr
(note that the Middlebury network blocks TTL exceeded messages, which is why I’m running these commands on my FreeBSD machine at home)
$ mtr -nrc1 www.google.com
Start: 2023-03-26T15:08:35-0400
HOST: rivendell.hiddenrock.com  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 172.24.0.1               0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93             0.0%     1    8.7   8.7   8.7   8.7   0.0
  3.|-- 24.124.211.201           0.0%     1   11.7  11.7  11.7  11.7   0.0
  4.|-- 162.151.149.153          0.0%     1   18.2  18.2  18.2  18.2   0.0
  5.|-- 96.110.42.1              0.0%     1   22.3  22.3  22.3  22.3   0.0
  6.|-- 96.110.34.18             0.0%     1   22.7  22.7  22.7  22.7   0.0
  7.|-- 75.149.229.30            0.0%     1   23.5  23.5  23.5  23.5   0.0
  8.|-- 142.250.234.189          0.0%     1   22.4  22.4  22.4  22.4   0.0
  9.|-- 142.251.65.103           0.0%     1   26.1  26.1  26.1  26.1   0.0
 10.|-- 142.250.80.68            0.0%     1   24.2  24.2  24.2  24.2   0.0
the -n option tells mtr to show the router addresses numerically rather than as hostnames (more about this next time)
(the exception is the origin: "rivendell.hiddenrock.com" is the name of the machine sending the request… because I’m a nerd)
the -r option says to just print out a report rather than provide an interactive interface
the -c1 option is the same as for ping: just send a single request (here, one to each hop)
we can cross-reference the output of mtr to the packets observed by Wireshark
an ICMP echo request with TTL=1 is sent, and an ICMP TTL exceeded message from 172.24.0.1 comes back
then TTL=2, error from 96.120.70.93
etc, etc
eventually, mtr gets an ICMP echo response back instead of TTL exceeded, which tells it that it has reached its destination
some interesting things to note:
if we get rid of the -rc1 parameters, we can let mtr just keep sending out packets and updating the statistics
this can give us some idea of the speed and reliability of each hop in the route
if we do so, we might even see multiple routes!
that is, different packets might take different paths through the Internet
mtr will show that as multiple IP addresses associated with, eg, hop 6
let’s now replace the -n option with the -w option, which asks mtr to show long names instead of IP addresses where possible:
$ mtr -wrc1 www.google.com
Start: 2023-03-26T15:16:59-0400
HOST: rivendell.hiddenrock.com                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- echoriath.hiddenrock.com                         0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93                                     0.0%     1   15.3  15.3  15.3  15.3   0.0
  3.|-- 24.124.211.209                                   0.0%     1   10.9  10.9  10.9  10.9   0.0
  4.|-- 162.151.149.89                                   0.0%     1    9.0   9.0   9.0   9.0   0.0
  5.|-- be-325-ar01.needham.ma.boston.comcast.net        0.0%     1   19.8  19.8  19.8  19.8   0.0
  6.|-- be-32031-cs03.newyork.ny.ibone.comcast.net       0.0%     1   27.7  27.7  27.7  27.7   0.0
  7.|-- be-3311-pe11.111eighthave.ny.ibone.comcast.net   0.0%     1   23.4  23.4  23.4  23.4   0.0
  8.|-- 50.242.150.62                                    0.0%     1   21.5  21.5  21.5  21.5   0.0
  9.|-- 108.170.248.1                                    0.0%     1   22.7  22.7  22.7  22.7   0.0
 10.|-- 209.85.253.143                                   0.0%     1   22.7  22.7  22.7  22.7   0.0
 11.|-- lga25s81-in-f4.1e100.net                         0.0%     1   26.7  26.7  26.7  26.7   0.0
(in this particular output, interesting to note that the response to the packet with TTL=4 came back super fast; this is not the case in subsequent runs—the Internet is an unpredictable place!)
unsurprisingly, because Comcast is my Internet Service Provider (ISP), some of these hops are identified with names under their jurisdiction
now a second, identical run:
$ mtr -wrc1 www.google.com
Start: 2023-03-26T15:18:10-0400
HOST: rivendell.hiddenrock.com                             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- echoriath.hiddenrock.com                            0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93                                        0.0%     1   12.9  12.9  12.9  12.9   0.0
  3.|-- po-304-1221-rur01.williston.vt.boston.comcast.net   0.0%     1   11.7  11.7  11.7  11.7   0.0
  4.|-- be-325-ar01.needham.ma.boston.comcast.net           0.0%     1   22.4  22.4  22.4  22.4   0.0
  5.|-- be-32041-cs04.newyork.ny.ibone.comcast.net          0.0%     1   22.4  22.4  22.4  22.4   0.0
  6.|-- be-3412-pe12.111eighthave.ny.ibone.comcast.net      0.0%     1   26.0  26.0  26.0  26.0   0.0
  7.|-- 96-87-11-70-static.hfc.comcastbusiness.net          0.0%     1   26.6  26.6  26.6  26.6   0.0
  8.|-- 142.251.225.89                                      0.0%     1   28.4  28.4  28.4  28.4   0.0
  9.|-- 142.251.64.7                                        0.0%     1   29.9  29.9  29.9  29.9   0.0
 10.|-- lga25s78-in-f4.1e100.net                            0.0%     1   24.2  24.2  24.2  24.2   0.0
different path!
(also different destination: ignore that for now)
not only that, but first the packet went to Williston (which is north of Middlebury) before going south to Boston and thence to New York
(assuming, of course, that we trust the names—in this case, I’m inclined to place a fair amount of trust in them: they are in place to help Comcast manage their own network, and thus I expect them to selfishly maintain their accuracy)
if we do the same thing with www.amazon.com, we get a different result:
$ mtr -wrc1 www.amazon.com
Start: 2023-03-26T15:20:28-0400
HOST: rivendell.hiddenrock.com                             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- echoriath.hiddenrock.com                            0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93                                        0.0%     1   14.5  14.5  14.5  14.5   0.0
  3.|-- po-304-1221-rur01.williston.vt.boston.comcast.net   0.0%     1   13.7  13.7  13.7  13.7   0.0
  4.|-- be-325-ar01.needham.ma.boston.comcast.net           0.0%     1   21.4  21.4  21.4  21.4   0.0
  5.|-- 69.241.35.90                                        0.0%     1   19.1  19.1  19.1  19.1   0.0
  6.|-- ???                                                100.0     1    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                                                100.0     1    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                                                100.0     1    0.0   0.0   0.0   0.0   0.0
  9.|-- ???                                                100.0     1    0.0   0.0   0.0   0.0   0.0
 10.|-- ???                                                100.0     1    0.0   0.0   0.0   0.0   0.0
 11.|-- 150.222.71.101                                      0.0%     1   19.0  19.0  19.0  19.0   0.0
 12.|-- server-18-161-19-226.bos50.r.cloudfront.net         0.0%     1   19.9  19.9  19.9  19.9   0.0
first, hops 6 through 10 declined to even send back a TTL exceeded message, reflected by both the "???" values in the IP/name column and the "100.0" values in the Loss% column
as with one of the paths to Google, first the packet went north to Williston and then south to Boston
but unlike the path to Google, it does not appear to have gone on to New York
we could do all sorts of experiments mapping out the Internet like this, but the aspect I want to focus on is that different collections of routers (because a router is the piece of hardware that sits at each hop) are managed by different organizations
the traceroutes above show that my packets traverse several routers owned and operated by Comcast before escaping to the Internet at large
except calling it "the Internet at large" is misleading, because that just means routers owned and operated by other organizations
(remember that "The Internet" is "just" a collection of interconnected networks)
and all of these routers are constantly making routing decisions: whenever a packet arrives at a router, that router consults its routing table, performs the transformations we talked about a few lectures ago, and sends the packet on its way
which raises the question: how do routes get into these routing tables?
especially given that, as we saw above, Comcast’s routers have to know both a) which other network can get my packet to its destination and b) how to route the packet within the Comcast network so that it can hand it off to that other network
vocabulary time
an autonomous system (AS) is a collection of networks and routers that is all under the same administrative control
example: Comcast operates an autonomous system
the mechanism that determines the contents of routing tables within an AS is called an interior gateway protocol (IGP) (sometimes you’ll also hear "intradomain gateway protocol")
every router within a given AS will use the same IGP (using the same routing protocol is, in fact, part of what defines a particular AS)
the mechanism that determines routes between autonomous systems is called an exterior gateway protocol (EGP)
("gateway" is an old synonym for "router")
we will look at examples of these shortly
first we need to refine our idea of autonomous systems
from the preceding lectures, you may have gotten the idea that The Internet is comprised of bazillions of routers and hosts
this is not wrong, but if we zoom out, and imagine that every router and host belongs to some autonomous system (accurate), we can also view The Internet as being comprised of a much more modest number of autonomous systems, many of which are connected to other autonomous systems
you could almost imagine it like each autonomous system is a cloud, and sometimes one cloud touches another cloud, meaning that the two clouds can exchange packets
(note that this is not "The Cloud", as in "my data is stored in The Cloud", but the notion of using clouds to visually represent networks has been around for ages and in fact forms the basis for the modern idea of "The Cloud")
so as the above traceroutes show, the source and destination for a given packet may lie in different clouds
there may also be several clouds (ASes) between the cloud (AS) containing the source and the cloud (AS) containing the destination
"local traffic" refers to packets for which either the source or the destination lies within the current AS
by contrast, "transit" means packets for which neither the source nor the destination lies within the current AS
example: if my computer, in AS 1, sends a packet to a host in AS 5, and that packet passes through routers in ASes 2, 3, and 4 on its way, the packet is considered "local traffic" in AS 1 and AS 5, and "transit traffic" in ASes 2, 3, and 4
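the same example as a few lines of Python (classify_traffic is just an illustrative name; ASes are identified by plain integers here):

```python
def classify_traffic(current_as: int, source_as: int, dest_as: int) -> str:
    """Classify a packet from the point of view of the AS it is currently in."""
    if current_as in (source_as, dest_as):
        return "local"
    return "transit"


# A packet from AS 1 to AS 5 that crosses ASes 2, 3, and 4 along the way.
for asn in (1, 2, 3, 4, 5):
    print(asn, classify_traffic(asn, source_as=1, dest_as=5))
```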
more vocab
a stub AS is an AS that a) only carries local traffic and b) is connected to exactly one other AS
a multihomed AS is an AS that a) only carries local traffic and b) is connected to more than one other AS
a transit AS is an AS that a) carries transit traffic, b) possibly also carries local traffic and c) is connected to more than one other AS
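these three definitions boil down to two questions: how many other ASes is this AS connected to, and is it willing to carry transit traffic? a sketch, with classify_as as a made-up helper name:

```python
def classify_as(neighbor_count: int, carries_transit: bool) -> str:
    """Classify an AS as stub, multihomed, or transit per the definitions above."""
    if neighbor_count < 1:
        raise ValueError("an AS with no neighbors is not part of the Internet")
    if carries_transit:
        if neighbor_count < 2:
            # Transit traffic enters from one AS and leaves toward another.
            raise ValueError("carrying transit requires at least two neighbors")
        return "transit"
    return "stub" if neighbor_count == 1 else "multihomed"


print(classify_as(1, carries_transit=False))  # stub
print(classify_as(3, carries_transit=False))  # multihomed
print(classify_as(3, carries_transit=True))   # transit
```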
before digging into routing protocols, we need to shift our perspective a bit more
to this point, when we’ve discussed networks, they’ve been Ethernet networks, where a bunch of hosts are all on the same network and maybe one or a few of those hosts are also routers, which pass packets to other networks
these kinds of networks definitely exist, but mostly at the periphery of autonomous systems
in the core of autonomous systems, there are a bunch of routers and these routers are usually connected by point-to-point links
so machines that humans use directly may exist on an Ethernet upon which a router also lives, but the "other networks" to which that router is connected may be point-to-point links instead of Ethernets
now let’s talk about routing protocols
these are the protocols used to populate routing tables
my intent here is not to give you a deep, nuanced understanding of how these work, but rather to give you a sense of their broad strokes: the kinds of problems that arise, the kinds of solutions that have been used, and the factors that come into play
the classic interior gateway protocol is the Routing Information Protocol (RIP)
it has been out of favor for decades, but it’s a good place to start on concepts
it is pretty much just the Bellman-Ford algorithm for single-source shortest path in a graph, except every router in the network is running the algorithm independently
in the context of networks, the generic term is "distance-vector routing" for reasons that will become clear soon
every router participating in RIP will maintain a table containing records with three pieces of information
- a network (eg, 192.168.93/24)
- the identity of a directly-adjacent router (thus indicating that network is accessible via that router)
- the number of hops it takes to reach that network through that router
ie, it keeps track of the distance to the network and the direction (vector) in which packets destined for that network should be sent
every router will periodically (every 30 seconds) share the contents of this table with its directly-adjacent neighbors
let’s say that router A has the following record in its table: "network 192.168.93/24 is reachable through router B using 3 hops"
router A is directly connected to routers B, C, and D
router A sends the aforementioned record to routers B, C, and D
upon receipt of that record, routers B, C, and D conclude that they can reach 192.168.93/24 through router A with 4 hops
routers B, C, and D look at their own tables and see if this is a smaller number of hops than their previous route to 192.168.93/24; if it is, they replace the previous route with this new one; if it’s not, they drop it
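the update rule in the last few steps can be sketched as follows; this is a simplification that follows the description above literally (keep a route only if it is strictly shorter), and the table layout and the name receive_advertisement are assumptions made for this example rather than anything from the RIP specification:

```python
from typing import Dict, Tuple

# network -> (next-hop router, hop count)
Table = Dict[str, Tuple[str, int]]


def receive_advertisement(
    table: Table, neighbor: str, advertised: Dict[str, int]
) -> bool:
    """Merge a neighbor's {network: hops} records; return True if anything changed."""
    changed = False
    for network, hops in advertised.items():
        candidate = hops + 1  # one extra hop to get to the neighbor itself
        current = table.get(network)
        if current is None or candidate < current[1]:
            table[network] = (neighbor, candidate)
            changed = True
    return changed


# Router C currently reaches the network in 6 hops via some router E.
table_c: Table = {"192.168.93/24": ("E", 6)}
# Router A advertises "192.168.93/24 in 3 hops", so C can do it in 4 via A.
receive_advertisement(table_c, "A", {"192.168.93/24": 3})
print(table_c)
```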
since every router is periodically sending out these announcements, eventually all routers will converge on a good collection of routes
or so we hope…
there are some big problems with this algorithm
the first problem is that it takes some time to re-stabilize in the case of link problems
so if an important link goes down, that fact will take quite a while to propagate, and many routers in the system will continue to send packets in its direction with no chance of success
the second problem is that, when router B receives a record from router A, router B can’t tell whether the route within goes through router B itself
this leads to a problem known as "count-to-infinity", in which router C advertises a route to B, except that route goes through B itself, and so we get routing tables with ever-increasing numbers of hops that just ping-pong the packets back and forth
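a tiny simulation shows the counting; it assumes two things not spelled out above: that a router always believes an update from the neighbor it is currently routing through (even if the news is worse), and that 16 hops means "unreachable", which is the value RIP uses for infinity. the scenario: B was directly attached to the network, C reached it through B in 2 hops, and then B's link to the network failed

```python
INFINITY = 16  # RIP treats a hop count of 16 as unreachable


def count_to_infinity(max_rounds: int = 20):
    """Return the (B hops, C hops) pairs after each round of announcements."""
    b_hops = INFINITY  # B just lost its direct link to the network
    c_hops = 2         # C still believes "reachable via B in 2 hops"
    history = []
    for _ in range(max_rounds):
        # C announces to B. B routes via C (at first because it has nothing
        # better, afterwards because C is its current next hop).
        b_hops = min(INFINITY, c_hops + 1)
        # B announces to C. C's route goes via B, so C must follow B's number.
        c_hops = min(INFINITY, b_hops + 1)
        history.append((b_hops, c_hops))
        if b_hops >= INFINITY and c_hops >= INFINITY:
            break  # both finally agree the network is unreachable
    return history


for b_hops, c_hops in count_to_infinity():
    print(f"B: {b_hops:2d} hops via C    C: {c_hops:2d} hops via B")
```

with announcements going out only every 30 seconds, even counting up to 16 takes minutes, during which packets for that network bounce between B and C until their TTLs expire.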
the third problem with RIP is that using hops as a metric lacks sufficient nuance
a coast-to-coast point-to-point connection would have a metric of 1, as would a 3-foot Ethernet cable
clearly one of these should be preferred over the other, but RIP does not permit that sort of information to be encoded
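the ARPANET retired its original distance-vector routing in 1979 for the preceding reasons, which were really laid bare because the network had grown large enough and varied enough that they became fatal (RIP itself was standardized later, in RFC 1058, and suffers from the same problems)
the modern replacement is Open Shortest Path First (OSPF), which is still in use today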
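OSPF keeps the notion of every router announcing stuff
but the stuff in question is a description of the router’s own links (which neighbors it is connected to, and what each link costs), and it is flooded to every router in the AS rather than just to immediate neighbors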
so eventually every router in the network learns the entire network topology
the second big improvement is a more nuanced metric: it’s not just hops, but a configurable cost per link (often derived from the link’s bandwidth)
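once a router knows the whole topology and a cost for every link, it can compute the best routes entirely on its own; the "shortest path first" in the name refers to Dijkstra's algorithm. a minimal sketch, assuming a made-up three-router topology with made-up costs (a slow direct link A-C, and fast links A-B and B-C):

```python
import heapq
from typing import Dict, Optional, Tuple

# router -> {neighbor: cost of the link to that neighbor}
Topology = Dict[str, Dict[str, int]]


def shortest_paths(
    topology: Topology, source: str
) -> Dict[str, Tuple[int, Optional[str]]]:
    """Dijkstra's algorithm: return {router: (total cost, first hop from source)}."""
    done: Dict[str, Tuple[int, Optional[str]]] = {}
    queue = [(0, source, None)]  # (cost so far, router, first hop used)
    while queue:
        cost, router, first_hop = heapq.heappop(queue)
        if router in done:
            continue  # already reached this router by a cheaper path
        done[router] = (cost, first_hop)
        for neighbor, link_cost in topology.get(router, {}).items():
            if neighbor not in done:
                hop = neighbor if router == source else first_hop
                heapq.heappush(queue, (cost + link_cost, neighbor, hop))
    return done


topology: Topology = {
    "A": {"B": 1, "C": 10},
    "B": {"A": 1, "C": 1},
    "C": {"A": 10, "B": 1},
}
print(shortest_paths(topology, "A"))
```

counting hops, A would send traffic for C over the direct link (1 hop); with link costs, A instead forwards it to B (total cost 2 rather than 10).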
remember that these are interior gateway protocols: used for managing routing tables within an AS
exterior gateway protocols manage routing tables between ASes
when two ASes connect to each other, this means that a router in one AS has a link to a router in a second AS
because these routers exist on the periphery of their respective ASes, they are often called "border routers"
it is often desirable for ASes to connect to several other ASes
to facilitate that, there exist facilities called Internet Exchanges, which are basically well air-conditioned buildings to which many different ASes run links and house routers, and then run links between the various routers to connect the ASes to each other
on the business side, a peering agreement is made between AS operators that establishes the terms and conditions for those ASes to exchange packets
historically, many peering agreements involved no money changing hands: the idea was that hosts within both ASes benefitted enough from connecting the two ASes directly that both AS operators were incentivized to do so
(many years ago, Verizon made a stink about Netflix and tried to get them to pay extra in a peering agreement, contending that Netflix was getting more benefit from peering and therefore ought to pay)
the original EGP was called EGP
now, the only one in use is the Border Gateway Protocol (BGP), and it’s a beast
it’s super complicated, and frankly outside the scope of an undergraduate networks course
one of the reasons it’s complicated is that it transcends purely technical concerns
to understand why, let’s first consider the high-level information that BGP propagates
this part is fairly simple: BGP only advertises reachability
imagine this setup:
+-----------+   +-----------+
|   AS 01   |   |   AS 02   |
| 1.20.3/24 |   | 40.5.6/24 |
+-----+-----+   +-----+-----+
      |               |
+-----+-----+   +-----+-----+
|   AS 03   |   |   AS 04   |
| 7.8.93/24 |   | 2.90.1/24 |
+---------+-+   +-+---------+
          |       |
        +-+-------+-+
        |   AS 05   |
        | 19.2.6/24 |
        +-----------+
AS 01 sends this advertisement to AS 03: "if you have packets for 1.20.3/24, send them to AS 01"
AS 03 then sends this advertisement to AS 05: "if you have packets for 1.20.3/24, send them through AS 03 and AS 01"
there is no metric involved here, again because BGP only advertises reachability
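the two advertisements above can be modeled as a prefix plus the list of ASes to go through; each AS that passes the advertisement along puts itself on the front of the list. the sketch below also drops any advertisement whose path already contains the receiving AS, which is how BGP avoids loops (that detail is not covered above; the function names and the tuple format are illustrative only):

```python
from typing import List, Optional, Tuple

# (prefix, AS path with the nearest AS first)
Advertisement = Tuple[str, List[int]]


def originate(prefix: str, own_as: int) -> Advertisement:
    """An AS announces a prefix that lives inside it."""
    return (prefix, [own_as])


def propagate(adv: Advertisement, own_as: int) -> Optional[Advertisement]:
    """Pass an advertisement on to a neighbor, prepending our own AS number."""
    prefix, path = adv
    if own_as in path:
        return None  # we are already on this path: passing it on would loop
    return (prefix, [own_as] + path)


from_as1 = originate("1.20.3/24", 1)   # what AS 01 tells AS 03
from_as3 = propagate(from_as1, 3)      # what AS 03 tells AS 05
print(from_as1)
print(from_as3)
```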
now the complicated part
it could very well be that the collection of ASes forms a graph: there may be multiple paths between pairs of ASes
and so there are choices to be made
some paths may be preferred for financial reasons (one peering agreement is more beneficial than another)
some paths may be preferred for political reasons (an AS in Country A may not want its traffic flowing through an AS in Country B, even though there are several intermediate ASes in the path)
and so on
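a rough sketch of what such a choice might look like: throw away paths through ASes we refuse to use, then take the path the operator likes best, breaking ties by the shorter AS path. the Route type, the local_pref numbers, and the avoid set are all hypothetical; real BGP route selection has many more steps than this

```python
from typing import List, NamedTuple, Optional, Set


class Route(NamedTuple):
    as_path: List[int]  # ASes the traffic would pass through
    local_pref: int     # operator-assigned preference; higher is better


def choose_route(candidates: List[Route], avoid: Set[int]) -> Optional[Route]:
    """Pick a route by policy first and path length second."""
    allowed = [r for r in candidates if not avoid.intersection(r.as_path)]
    if not allowed:
        return None  # every known path crosses an AS we refuse to use
    return min(allowed, key=lambda r: (-r.local_pref, len(r.as_path)))


candidates = [
    Route(as_path=[3, 1], local_pref=100),     # short, but an expensive agreement
    Route(as_path=[4, 2, 1], local_pref=200),  # longer, but a better deal
    Route(as_path=[6, 1], local_pref=300),     # best deal, via an AS to avoid
]
print(choose_route(candidates, avoid={6}))
```

note that the winner here is neither the shortest path nor the one with the best deal: policy decided.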
pathological real-life example:
in 2008, Pakistan decided that YouTube was evil and should not be accessible to its citizens
to make this happen, Pakistan Telecom configured its routers to advertise routes for YouTube’s addresses that resulted in all packets being discarded
that advertisement leaked out to other ASes and spread worldwide
much of the Internet could not reach YouTube until this was fixed, roughly two hours later
(Cloudflare has a page on BGP hijacking)
all of this is to say that the Internet is horrifically complicated and the fact that it is so shockingly reliable for delivering such an absurd amount of traffic day-in and day-out is more than a little mind-blowing