pete > courses > CS 431 Spring 25 > Lecture 09: Internet structure


Lecture 09: Internet structure

Goals


last time, we filled in a couple of gaps in the functionality of the various protocols we’ve talked about to this point

first, recall that application software will tell the IP code "please send data XYZ to IP address A.B.C.D"; the IP code will construct the appropriate packet, find the appropriate route, and pass it to the link layer (often Ethernet) for sending

the problem was that the Ethernet layer needs to know the Ethernet address to send the frame (containing the IP packet) to, which is not mentioned in the preceding description

this led us to ARP: the Address Resolution Protocol, which allows systems to ask questions like "what is the link-layer (eg, Ethernet) address associated with a particular network-layer (eg, IP) address?" and receive helpful answers

thus, after consulting the routing table to determine the IP address of the router through which to forward the packet in question, the system will issue an ARP request to find the MAC address associated with that IP address

upon receiving a successful response, the system is able to fully construct the Ethernet frame in which the IP packet will be transmitted
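to make that sequence concrete, here is a minimal Python sketch of the sender’s side: pick a route by longest-prefix match, decide which IP address to ARP for, and look up its MAC address; the routing table, ARP cache, and MAC address below are all hypothetical, and a real stack would broadcast an ARP request on a cache miss instead of giving up

```python
import ipaddress

# hypothetical routing table: (destination network, next-hop router or None if directly attached)
ROUTES = [
    (ipaddress.ip_network("172.24.0.0/24"), None),                            # local Ethernet
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("172.24.0.1")),  # default route
]

# hypothetical ARP cache: IP address -> Ethernet (MAC) address
ARP_CACHE = {ipaddress.ip_address("172.24.0.1"): "00:11:22:33:44:55"}

def next_hop(dst):
    """Pick the most specific matching route and return the IP address to ARP for."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in ROUTES if addr in net]
    if not matches:
        return None
    _, gw = max(matches, key=lambda m: m[0].prefixlen)
    # directly attached network: ARP for the destination itself; otherwise for the router
    return addr if gw is None else gw

def frame_for(dst, payload):
    """Return (destination MAC, payload), standing in for a real Ethernet frame."""
    hop = next_hop(dst)
    mac = ARP_CACHE.get(hop)   # a real stack would send an ARP request on a miss
    if mac is None:
        raise LookupError(f"no ARP entry for {hop}")
    return (mac, payload)

print(frame_for("142.250.65.196", b"hello"))  # ('00:11:22:33:44:55', b'hello')
```

note that the packet to 142.250.65.196 is framed for the router’s MAC address, not the destination’s: the destination’s IP address stays in the IP header, while the Ethernet frame only has to get it one hop.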


the related gap we filled was: what happens if there is no route for the packet?

that is, if a machine is given some data to send to some IP address, but upon consulting the routing table, finds no matching route

unlike all the other failure modes we’ve discussed to this point, the router does not just silently drop the packet: instead, it sends back a message using the ICMP protocol that indicates a route could not be found

in this case, the specific ICMP message sent is "network unreachable"

if an ARP request is made but receives no response, the result is to send back an ICMP "host unreachable" message

recall also that each router through which a packet passes decrements the TTL field in the IP header; when that field reaches 0, the router processing it sends back an ICMP "TTL exceeded" message
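the decisions above can be folded into one function; this is a simplified Python sketch of what a router does with a single packet, where the tables are hypothetical, a missing ARP entry stands in for an ARP request that got no answer, and the order of the checks is a simplification rather than a description of any particular router

```python
import ipaddress

# hypothetical router state: one directly attached network and one network reached via 10.0.0.1
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), None),
    (ipaddress.ip_network("192.168.1.0/24"), ipaddress.ip_address("10.0.0.1")),
]
ARP_ANSWERS = {ipaddress.ip_address("10.0.0.1"): "aa:aa:aa:aa:aa:aa"}  # who replied to ARP

def handle(dst, ttl):
    """Return ("forward", mac, new_ttl) or ("icmp", error message sent back to the source)."""
    ttl -= 1                                     # every router decrements the TTL
    if ttl <= 0:
        return ("icmp", "TTL exceeded")
    addr = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in ROUTES if addr in net]
    if not matches:
        return ("icmp", "network unreachable")  # no route at all
    _, gw = max(matches, key=lambda m: m[0].prefixlen)
    hop = addr if gw is None else gw             # directly attached: ARP for the destination
    mac = ARP_ANSWERS.get(hop)
    if mac is None:
        return ("icmp", "host unreachable")     # nobody answered the ARP request
    return ("forward", mac, ttl)

print(handle("192.168.1.5", 64))  # ('forward', 'aa:aa:aa:aa:aa:aa', 63)
print(handle("8.8.8.8", 64))      # ('icmp', 'network unreachable')
print(handle("10.0.0.7", 64))     # ('icmp', 'host unreachable')
print(handle("192.168.1.5", 1))   # ('icmp', 'TTL exceeded')
```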

and we demonstrated all of these using the ICMP "echo request" and "echo reply" messages


now it’s time to play some games with this stuff

to start with, let’s review the ping program, which is what we use to send those ICMP echo request messages:

$ ping -c1 www.google.com
PING www.google.com (142.250.65.196): 56 data bytes
64 bytes from 142.250.65.196: icmp_seq=0 ttl=116 time=25.366 ms

--- www.google.com ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 25.366/25.366/25.366/0.000 ms

the -c1 part says to send a single ICMP echo request (otherwise it will send one request per second until we press ^C)

and the www.google.com part identifies the destination

this is new! to this point, we’ve talked about sources and destinations in terms of IP addresses, but this is human-readable text

we will cover how all that works soon, but not today

for now, just trust that the name www.google.com is equivalent to the IP address 142.250.65.196, which is where ping tells us it is sending the ICMP echo request

ping likewise tells us that it’s sending 56 data bytes (ping’s default; together with the 8-byte ICMP header, that makes the 64 bytes reported in the response line)

that the sequence number is 0 (if we didn’t specify -c1 and let it send out one request per second, we would see the sequence number increase for each one sent)
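to see where those byte counts come from, here is a minimal Python sketch that builds the bytes of an ICMP echo request (type 8, code 0) with the standard Internet checksum from RFC 1071; the identifier value is arbitrary, and actually sending the packet would need a raw socket (and usually administrator privileges), which this sketch skips

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 1071), then complemented."""
    if len(data) % 2:
        data += b"\x00"                              # pad odd-length data
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)    # fold any carry back in
    return ~total & 0xFFFF

def echo_request(ident: int, seq: int, payload: bytes = bytes(56)) -> bytes:
    """ICMP echo request: type, code, checksum, identifier, sequence number, then data."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)   # checksum field zeroed first
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload

pkt = echo_request(ident=0x1234, seq=0)
print(len(pkt))   # 64: 8-byte ICMP header + 56 data bytes
```

add the 20-byte IP header and the whole IP packet is 84 bytes; a nice property of this checksum is that recomputing it over the finished packet (checksum included) yields 0, which is how the receiver checks it.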

the ttl of the response is 116, which isn’t terribly interesting on its own, though we will play around with that field in a moment

finally, the time is 25.366 ms

that’s measuring the round-trip-time in milliseconds (thousandths of a second), meaning that it took about 1/40th of a second for the request to travel from my machine to 142.250.65.196 and for the response to travel all the way back

we can also see the exact same information in Wireshark, which shows both request and response packets, as well as all of their contents and timings, corroborating the output of ping


recall that when the packet is sent, the TTL (time to live) field in the IP header is set to some large-ish value (Wireshark tells us that our outgoing ICMP echo request had a TTL of 64)

this TTL is decremented at each hop and, if it reaches zero, a TTL exceeded message comes back

what happens if we intentionally set the TTL of the outgoing ICMP echo request message to a small value? say, 4?

we would expect that the packet is able to get 4 hops along the path to its destination, but when that fourth router decrements the TTL field, it will become zero, and the fourth router will send back a "TTL exceeded" ICMP message

let’s do it:

$ ping -c1 -m4 www.google.com
PING www.google.com (142.251.32.100): 56 data bytes
92 bytes from 162.151.149.89: Time to live exceeded
Vr HL TOS  Len   ID Flg  off TTL Pro  cks      Src      Dst
 4  5  20 0054 fdbf   0 0000  01  01 4f1f 172.24.0.6  142.251.32.100


--- www.google.com ping statistics ---
1 packets transmitted, 0 packets received, 100.0% packet loss

(the -m4 parameter tells ping to set the initial TTL to 4, which we confirm with Wireshark)

(note that other implementations, such as the default one on Linux, use a different letter than -m for this purpose: Linux’s ping uses -t)

the second line of the output above tells us that IP address 162.151.149.89 reported TTL exceeded (again, corroborated by Wireshark)

this is the IP address of that fourth hop!


we can expand this to map out the entire path of routers traversed between source and destination

by sending out a series of ICMP echo requests, with monotonically increasing TTLs

ie, send a request with TTL 1, then another request with TTL 2, then with TTL 3, etc, etc

by examining the TTL exceeded messages we get in response, we can determine the IP address of every router along the way

the result is called a traceroute… because it traces… the route
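the whole procedure is easy to simulate; this minimal Python sketch walks a probe along a fixed path (the addresses are borrowed from the first mtr report in these notes), with each router decrementing the TTL, and it deliberately ignores lost probes, routers that stay silent, and paths that change between probes

```python
# hop addresses copied from the first mtr report in these notes; the last one is the destination
PATH = [
    "172.24.0.1", "96.120.70.93", "24.124.211.201", "162.151.149.153", "96.110.42.1",
    "96.110.34.18", "75.149.229.30", "142.250.234.189", "142.251.65.103", "142.250.80.68",
]

def send_probe(path, ttl):
    """Simulate one ICMP echo request sent along a fixed path with the given initial TTL."""
    routers, destination = path[:-1], path[-1]
    for addr in routers:
        ttl -= 1                            # every router decrements the TTL
        if ttl == 0:
            return ("TTL exceeded", addr)   # ...and the one that hits zero reports back
    return ("echo reply", destination)      # the probe survived all the way

def traceroute(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... until the destination itself answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        kind, addr = send_probe(path, ttl)
        hops.append(addr)
        if kind == "echo reply":
            break
    return hops

print(send_probe(PATH, 4))   # ('TTL exceeded', '162.151.149.153')
```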


we could use ping to perform a traceroute, but we’d have to wrap it in some shell-scripting stuff

fortunately, there are self-contained programs that do it all for us

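on Windows, the classic tool is tracert; on UNIX-based systems (which include OS X), the classic tool is traceroute (which, by default, sends UDP probes rather than ICMP echo requests, but relies on the same TTL trick)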

my preferred program is mtr

(note that the Middlebury network blocks TTL exceeded messages, which is why I’m running these commands on my FreeBSD machine at home)

$ mtr -nrc1 www.google.com
Start: 2023-03-26T15:08:35-0400
HOST: rivendell.hiddenrock.com    Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 172.24.0.1                 0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93               0.0%     1    8.7   8.7   8.7   8.7   0.0
  3.|-- 24.124.211.201             0.0%     1   11.7  11.7  11.7  11.7   0.0
  4.|-- 162.151.149.153            0.0%     1   18.2  18.2  18.2  18.2   0.0
  5.|-- 96.110.42.1                0.0%     1   22.3  22.3  22.3  22.3   0.0
  6.|-- 96.110.34.18               0.0%     1   22.7  22.7  22.7  22.7   0.0
  7.|-- 75.149.229.30              0.0%     1   23.5  23.5  23.5  23.5   0.0
  8.|-- 142.250.234.189            0.0%     1   22.4  22.4  22.4  22.4   0.0
  9.|-- 142.251.65.103             0.0%     1   26.1  26.1  26.1  26.1   0.0
 10.|-- 142.250.80.68              0.0%     1   24.2  24.2  24.2  24.2   0.0

the -n option tells mtr to show the router addresses numerically rather than as hostnames (more about this next time)

(the exception is the origin: "rivendell.hiddenrock.com" is the name of the machine sending the request… because I’m a nerd)

the -r option says to just print out a report rather than provide an interactive interface

the -c1 option is the same as for ping: just send a single request

we can cross-reference the output of mtr to the packets observed by Wireshark

an ICMP echo request with TTL=1 is sent, and an ICMP TTL exceeded message from 172.24.0.1 comes back

then TTL=2, error from 96.120.70.93

etc, etc

eventually, mtr gets an ICMP echo reply back instead of TTL exceeded, which tells it that it has reached its destination


some interesting things to note:

if we get rid of the -rc1 parameters, we can let mtr just keep sending out packets and updating the statistics

this can give us some idea of the speed and reliability of each hop in the route

if we do so, we might even see multiple routes!

that is, different packets might take different paths through the Internet

mtr will show that as multiple IP addresses associated with, eg, hop 6


let’s now drop the -n option, so that mtr looks up names for the router addresses where it can, and add the -w option, which asks for a wide report so that long names are not truncated:

$ mtr -wrc1 www.google.com
Start: 2023-03-26T15:16:59-0400
HOST: rivendell.hiddenrock.com                       Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- echoriath.hiddenrock.com                        0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93                                    0.0%     1   15.3  15.3  15.3  15.3   0.0
  3.|-- 24.124.211.209                                  0.0%     1   10.9  10.9  10.9  10.9   0.0
  4.|-- 162.151.149.89                                  0.0%     1    9.0   9.0   9.0   9.0   0.0
  5.|-- be-325-ar01.needham.ma.boston.comcast.net       0.0%     1   19.8  19.8  19.8  19.8   0.0
  6.|-- be-32031-cs03.newyork.ny.ibone.comcast.net      0.0%     1   27.7  27.7  27.7  27.7   0.0
  7.|-- be-3311-pe11.111eighthave.ny.ibone.comcast.net  0.0%     1   23.4  23.4  23.4  23.4   0.0
  8.|-- 50.242.150.62                                   0.0%     1   21.5  21.5  21.5  21.5   0.0
  9.|-- 108.170.248.1                                   0.0%     1   22.7  22.7  22.7  22.7   0.0
 10.|-- 209.85.253.143                                  0.0%     1   22.7  22.7  22.7  22.7   0.0
 11.|-- lga25s81-in-f4.1e100.net                        0.0%     1   26.7  26.7  26.7  26.7   0.0

(in this particular output, it’s interesting to note that the response to the packet with TTL=4 came back faster than the responses for TTL=2 and TTL=3; this is not the case in subsequent runs: the Internet is an unpredictable place!)

unsurprisingly, because Comcast is my Internet Service Provider (ISP), several of these hops have names under Comcast’s own domain (comcast.net)

now a second, identical run:

$ mtr -wrc1 www.google.com
Start: 2023-03-26T15:18:10-0400
HOST: rivendell.hiddenrock.com                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- echoriath.hiddenrock.com                           0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93                                       0.0%     1   12.9  12.9  12.9  12.9   0.0
  3.|-- po-304-1221-rur01.williston.vt.boston.comcast.net  0.0%     1   11.7  11.7  11.7  11.7   0.0
  4.|-- be-325-ar01.needham.ma.boston.comcast.net          0.0%     1   22.4  22.4  22.4  22.4   0.0
  5.|-- be-32041-cs04.newyork.ny.ibone.comcast.net         0.0%     1   22.4  22.4  22.4  22.4   0.0
  6.|-- be-3412-pe12.111eighthave.ny.ibone.comcast.net     0.0%     1   26.0  26.0  26.0  26.0   0.0
  7.|-- 96-87-11-70-static.hfc.comcastbusiness.net         0.0%     1   26.6  26.6  26.6  26.6   0.0
  8.|-- 142.251.225.89                                     0.0%     1   28.4  28.4  28.4  28.4   0.0
  9.|-- 142.251.64.7                                       0.0%     1   29.9  29.9  29.9  29.9   0.0
 10.|-- lga25s78-in-f4.1e100.net                           0.0%     1   24.2  24.2  24.2  24.2   0.0

different path!

(also different destination: ignore that for now)

not only that, but first the packet went to Williston (which is north of Middlebury) before going south to Boston and thence to New York

(assuming, of course, that we trust the names—in this case, I’m inclined to place a fair amount of trust in them: they are in place to help Comcast manage their own network, and thus I expect them to selfishly maintain their accuracy)


if we do the same thing with www.amazon.com, we get a different result:

$ mtr -wrc1 www.amazon.com
Start: 2023-03-26T15:20:28-0400
HOST: rivendell.hiddenrock.com                          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- echoriath.hiddenrock.com                           0.0%     1    0.3   0.3   0.3   0.3   0.0
  2.|-- 96.120.70.93                                       0.0%     1   14.5  14.5  14.5  14.5   0.0
  3.|-- po-304-1221-rur01.williston.vt.boston.comcast.net  0.0%     1   13.7  13.7  13.7  13.7   0.0
  4.|-- be-325-ar01.needham.ma.boston.comcast.net          0.0%     1   21.4  21.4  21.4  21.4   0.0
  5.|-- 69.241.35.90                                       0.0%     1   19.1  19.1  19.1  19.1   0.0
  6.|-- ???                                               100.0     1    0.0   0.0   0.0   0.0   0.0
  7.|-- ???                                               100.0     1    0.0   0.0   0.0   0.0   0.0
  8.|-- ???                                               100.0     1    0.0   0.0   0.0   0.0   0.0
  9.|-- ???                                               100.0     1    0.0   0.0   0.0   0.0   0.0
 10.|-- ???                                               100.0     1    0.0   0.0   0.0   0.0   0.0
 11.|-- 150.222.71.101                                     0.0%     1   19.0  19.0  19.0  19.0   0.0
 12.|-- server-18-161-19-226.bos50.r.cloudfront.net        0.0%     1   19.9  19.9  19.9  19.9   0.0

first, hops 6 through 10 declined to even send back a TTL exceeded message, reflected by both the "???" values in the IP/name column and the "100.0" values in the Loss% column

as with one of the paths to Google, first the packet went north to Williston and then south to Boston

but unlike the path to Google, it does not appear to have gone on to New York


we could do all sorts of experiments mapping out the Internet like this, but the aspect I want to focus on is that different collections of routers (a router being the piece of hardware at each hop) are managed by different organizations

the traceroutes above show that my packets traverse several routers owned and operated by Comcast before escaping to the Internet at large

except calling it "the Internet at large" is misleading, because that just means routers owned and operated by other organizations

(remember that "The Internet" is "just" a collection of interconnected networks)

and all of these routers are constantly making routing decisions: whenever a packet arrives at a router, that router consults its routing table, performs the transformations we talked about a few lectures ago, and sends the packet on its way

which raises the question: how do routes get into these routing tables?

especially given that, as we saw above, Comcast’s routers have to know both a) which other network can get my packet to its destination and b) how to route the packet within the Comcast network so that it can hand it off to that other network


vocabulary time

an autonomous system (AS) is a collection of networks and routers that is all under the same administrative control

example: Comcast operates an autonomous system

the mechanism that determines the contents of routing tables within an AS is called an interior gateway protocol (IGP) (sometimes you’ll also hear "intradomain gateway protocol")

every router within a given AS will use the same IGP (using the same routing protocol is, in fact, part of what defines a particular AS)

the mechanism that determines routes between autonomous systems is called an exterior gateway protocol (EGP)

("gateway" is an old synonym for "router")

we will look at examples of these shortly


first we need to refine our idea of autonomous systems

from the preceding lectures, you may have gotten the idea that The Internet is comprised of bazillions of routers and hosts

this is not wrong, but if we zoom out, and imagine that every router and host belongs to some autonomous system (accurate), we can also view The Internet as being comprised of a much more modest number of autonomous systems, many of which are connected to other autonomous systems

you could almost imagine it like each autonomous system is a cloud, and sometimes one cloud touches another cloud, meaning that the two clouds can exchange packets

(note that this is not "The Cloud", as in "my data is stored in The Cloud", but the notion of using clouds to visually represent networks has been around for ages and in fact forms the basis for the modern idea of "The Cloud")

so as the above traceroutes show, the source and destination for a given packet may lie in different clouds

there may also be several clouds (ASes) between the cloud (AS) containing the source and the cloud (AS) containing the destination

"local traffic" refers to packets for which either the source or the destination lies within the current AS

by contrast, "transit" means packets for which neither the source nor the destination lies within the current AS

example: if my computer, in AS 1, sends a packet to a host in AS 5, and that packet passes through routers in ASes 2, 3, and 4 on its way, then the packet is considered "local traffic" in ASes 1 and 5 and "transit traffic" in ASes 2, 3, and 4
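the rule is simple enough to state as code; a minimal Python sketch, where the AS numbers are just the ones from the example

```python
def traffic_kind(this_as, src_as, dst_as):
    """How a packet from src_as to dst_as looks from inside this_as."""
    if this_as in (src_as, dst_as):
        return "local"      # we host the source or the destination
    return "transit"        # we are only carrying it for others

# the example: from AS 1 to AS 5, passing through ASes 2, 3, and 4
for asn in [1, 2, 3, 4, 5]:
    print(asn, traffic_kind(asn, src_as=1, dst_as=5))
# prints "1 local", then "transit" for 2, 3, and 4, then "5 local"
```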


more vocab

a stub AS is an AS that a) only carries local traffic and b) is connected to exactly one other AS

a multihomed AS is an AS that a) only carries local traffic and b) is connected to more than one other AS

a transit AS is an AS that a) carries transit traffic, b) possibly also carries local traffic and c) is connected to more than one other AS
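the three definitions can likewise be written as a small decision function; this Python sketch takes a hypothetical description of an AS (whether it carries transit traffic and how many other ASes it connects to) and rejects combinations the definitions do not allow

```python
def as_kind(carries_transit, neighbor_count):
    """Classify an AS as stub, multihomed, or transit using the definitions above."""
    if neighbor_count < 1:
        raise ValueError("an AS with no neighbors is not connected to anything")
    if carries_transit:
        if neighbor_count < 2:
            # transit traffic enters from one neighbor and leaves toward another
            raise ValueError("a transit AS must connect to more than one other AS")
        return "transit"
    return "stub" if neighbor_count == 1 else "multihomed"

print(as_kind(False, 1))   # stub
print(as_kind(False, 3))   # multihomed
print(as_kind(True, 4))    # transit
```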


before digging into routing protocols, we need to shift our perspective a bit more

to this point, when we’ve discussed networks, they’ve been Ethernet networks, where a bunch of hosts are all on the same network and maybe one or a few of those hosts are also routers, which pass packets to other networks

these kinds of networks definitely exist, but mostly at the periphery of autonomous systems

in the core of autonomous systems, there are a bunch of routers and these routers are usually connected by point-to-point links

so machines that humans use directly may exist on an Ethernet upon which a router also lives, but the "other networks" to which that router is connected may be point-to-point links instead of Ethernets


now let’s talk about routing protocols

these are the protocols used to populate routing tables

my intent here is not to give you a deep, nuanced understanding of how these work, but rather to give you a sense of their broad strokes: the kinds of problems that arise, the kinds of solutions that have been used, and the factors that come into play


the classic interior gateway protocol is the Routing Information Protocol (RIP)

it’s rarely used in modern networks, but it’s a good place to start on concepts

it is pretty much just the Bellman-Ford algorithm for single-source shortest path in a graph, except every router in the network is running the algorithm independently

in the context of networks, the generic term is "distance-vector routing" for reasons that will become clear soon

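every router participating in RIP will maintain a table containing records with three pieces of information: a destination network, the neighboring router through which to reach it, and the number of hops to get there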

ie, it keeps track of the distance to the network and the direction (vector) in which packets destined for that network should be sent

every router will periodically (every 30 seconds) share the contents of this table with its directly-adjacent neighbors

let’s say that router A has the following record in its table: "network 192.168.93/24 is reachable through router B using 3 hops"

router A is directly connected to routers B, C, and D

router A sends the aforementioned record to routers B, C, and D

upon receipt of that record, routers B, C, and D conclude that they can reach 192.168.93/24 through router A with 4 hops

routers B, C, and D look at their own tables and see if this is a smaller number of hops than their previous route to 192.168.93/24; if it is, they replace the previous route with this new one; if it’s not, they drop it

since every router is periodically sending out these announcements, eventually all routers will converge on a good collection of routes

or so we hope…
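here is a minimal, single-process Python simulation of the distance-vector idea, assuming links are symmetric and listed in both directions, hop count is the only metric, directly attached networks count as 0 hops (a simplification of this sketch), and routers announce in a fixed order rather than on independent 30-second timers; the topology is hypothetical, arranged so that router A ends up with exactly the record from the example

```python
def distance_vector(links, attached):
    """
    links: router -> list of neighboring routers (each link listed in both directions)
    attached: router -> list of networks it is directly attached to
    Returns router -> {network: (next hop, hops)}; a next hop of None means directly attached.
    """
    tables = {r: {net: (None, 0) for net in attached.get(r, [])} for r in links}
    changed = True
    while changed:                                   # repeat until nobody learns anything new
        changed = False
        for sender in links:                         # each router announces its table...
            for net, (_, hops) in list(tables[sender].items()):
                for receiver in links[sender]:       # ...to its directly adjacent neighbors
                    offer = hops + 1                 # one more hop: through the sender
                    current = tables[receiver].get(net)
                    if current is None or offer < current[1]:
                        tables[receiver][net] = (sender, offer)   # better: replace
                        changed = True                            # otherwise: drop it
    return tables

# hypothetical topology: the network hangs off X, and X - Y - B - A, with A also linked to C and D
LINKS = {
    "X": ["Y"], "Y": ["X", "B"], "B": ["Y", "A"],
    "A": ["B", "C", "D"], "C": ["A"], "D": ["A"],
}
ATTACHED = {"X": ["192.168.93/24"]}

tables = distance_vector(LINKS, ATTACHED)
print(tables["A"]["192.168.93/24"])  # ('B', 3)
print(tables["C"]["192.168.93/24"])  # ('A', 4)
```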


there are some big problems with this algorithm

the first problem is that it takes some time to re-stabilize in the case of link problems

so if an important link goes down, that fact will take quite a while to propagate, and many routers in the system will continue to send packets in its direction with no chance of success

the second problem is that, when router B receives a record from router A, router B can’t tell whether the route described in that record goes through router B itself

this leads to a problem known as "count-to-infinity", in which router C advertises a route to B, except that route goes through B itself, and so we get routing tables with ever-increasing numbers of hops that just ping-pong the packets back and forth
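a small Python simulation shows the counting; it assumes two hypothetical routers, A and B, where B has just lost its direct route to some network and A still advertises its old route (which secretly goes through B); like RIP, a router here accepts a better offer, and also accepts any news from the router it already routes through, even if that news is worse, and like RIP, a hop count of 16 means "unreachable", which is the only thing that stops the counting

```python
INFINITY = 16   # RIP treats a hop count of 16 as "unreachable"

def count_to_infinity():
    """
    A and B share a link. B used to reach the network directly (1 hop) and A went
    through B (2 hops); B's link to the network just failed. Each round, A announces
    to B and then B announces to A. Returns (A's hops, B's hops) after each round
    in which something changed.
    """
    table = {"A": ("B", 2), "B": (None, INFINITY)}    # router -> (next hop, hops)
    history = []
    while True:
        changed = False
        for sender, receiver in (("A", "B"), ("B", "A")):
            next_hop, hops = table[receiver]
            offer = min(table[sender][1] + 1, INFINITY)
            # take a better offer, or any news from the router we already route through
            if offer < hops or next_hop == sender:
                if table[receiver] != (sender, offer):
                    table[receiver] = (sender, offer)
                    changed = True
        if not changed:
            return history
        history.append((table["A"][1], table["B"][1]))

print(count_to_infinity())
# [(4, 3), (6, 5), (8, 7), (10, 9), (12, 11), (14, 13), (16, 15), (16, 16)]
```

the two routers keep pointing at each other and inflating the count until both reach 16; any packet for that network sent in the meantime just bounces between A and B.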

the third problem with RIP is that using hops as a metric lacks sufficient nuance

a coast-to-coast point-to-point connection would have a metric of 1, as would a 3-foot Ethernet cable

clearly one of these should be preferred over the other, but RIP does not permit that sort of information to be encoded


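distance-vector routing of this kind was abandoned in the ARPANET as early as 1979 for the preceding reasons, which were really laid bare once networks had grown large enough and varied enough that they became fatal; RIP itself was standardized later (in 1988) but has largely been supplanted for the same reasons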

the replacement is Open Shortest Path First (OSPF), which is still in use today

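OSPF keeps the notion of every router announcing stuff to its neighbors

but the stuff in question is a description of the router’s own links (which neighbors it has and what each link costs), and routers pass these announcements along until they have flooded the entire network

so eventually every router in the network learns the entire network topology, and each router then computes its own shortest paths over that map (the "shortest path first" in the name refers to Dijkstra’s algorithm)

the second big improvement is a more nuanced metric: it’s not just hops; each link has a cost (commonly derived from its bandwidth), and routes minimize the total cost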
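here is a minimal Python sketch of the computation each router performs once it knows the topology: Dijkstra’s algorithm over link costs; the three-router map and its costs are hypothetical, chosen to echo the coast-to-coast example from the RIP discussion, and real OSPF link-state databases carry far more detail than a dictionary of costs

```python
import heapq

def shortest_paths(topology, source):
    """
    Dijkstra's algorithm over a link-state map.
    topology: router -> {neighbor: link cost}, with every link listed in both directions
    Returns router -> (total cost, first hop to use from source).
    """
    best = {source: (0, None)}
    frontier = [(0, source)]
    done = set()
    while frontier:
        cost, router = heapq.heappop(frontier)
        if router in done:
            continue                       # stale entry: a cheaper way here was already found
        done.add(router)
        for neighbor, link_cost in topology[router].items():
            new_cost = cost + link_cost
            first_hop = neighbor if router == source else best[router][1]
            if neighbor not in best or new_cost < best[neighbor][0]:
                best[neighbor] = (new_cost, first_hop)
                heapq.heappush(frontier, (new_cost, neighbor))
    return best

# hypothetical map: a direct but expensive A-D link (think coast to coast)
# versus two cheap hops through C
TOPOLOGY = {
    "A": {"D": 10, "C": 1},
    "C": {"A": 1, "D": 1},
    "D": {"A": 10, "C": 1},
}
print(shortest_paths(TOPOLOGY, "A")["D"])   # (2, 'C'): two hops, but cheaper than the direct link
```

with hop count as the metric, the direct A-D link would win; with costs, the two-hop path through C does.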


remember that these are interior gateway protocols: used for managing routing tables within an AS

exterior gateway protocols manage routing tables between ASes

when two ASes connect to each other, this means that a router in one AS has a link to a router in a second AS

because these routers exist on the periphery of their respective ASes, they are often called "border routers"

it is often desirable for ASes to connect to several other ASes

to facilitate that, there exist facilities called Internet Exchanges (or Internet Exchange Points, IXPs), which are basically well-air-conditioned buildings where many different ASes run in links and house routers, and then run links between those routers to connect the ASes to each other

on the business side, a peering agreement is made between AS operators that establishes the terms and conditions for those ASes to exchange packets

historically, many peering agreements involved no money changing hands: the idea was that hosts within both ASes benefited so much from connecting the two ASes directly that both AS operators were incentivized to do so

(many years ago, Verizon made a stink about Netflix and tried to get them to pay extra in a peering agreement, contending that Netflix was getting more benefit from peering and therefore ought to pay)


the original EGP was called EGP

now, the only one in use is BGP (the Border Gateway Protocol), and it’s a beast

it’s super complicated, and frankly outside the scope of an undergraduate networks course

one of the reasons it’s complicated is that it transcends purely technical concerns

to understand why, let’s first consider the high-level information that BGP propagates

this part is fairly simple: BGP only advertises reachability

imagine this setup:

+-----------+   +-----------+
|   AS 01   |   |   AS 02   |
| 1.20.3/24 |   | 40.5.6/24 |
+-----+-----+   +-----+-----+
      |               |
+-----+-----+   +-----+-----+
|   AS 03   |   |   AS 04   |
| 7.8.93/24 |   | 2.90.1/24 |
+---------+-+   +-+---------+
          |       |
        +-+-------+-+
        |   AS 05   |
        | 19.2.6/24 |
        +-----------+

AS 01 sends this advertisement to AS 03: "if you have packets for 1.20.3/24, send them to AS 01"

AS 03 then sends this advertisement to AS 05: "if you have packets for 1.20.3/24, send them through AS 03 and AS 01"

there is no metric involved here, again because BGP only advertises reachability
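the propagation above can be sketched in a few lines of Python; the topology and prefixes are copied from the diagram, each AS rejects any advertisement whose path already contains itself (which is how path-vector routing avoids loops), and keeping the shortest path heard is only a placeholder for the policy choices discussed next

```python
from collections import deque

NEIGHBORS = {1: [3], 2: [4], 3: [1, 5], 4: [2, 5], 5: [3, 4]}
PREFIXES = {1: "1.20.3/24", 2: "40.5.6/24", 3: "7.8.93/24", 4: "2.90.1/24", 5: "19.2.6/24"}

def propagate(neighbors, prefixes):
    """routes[asn][prefix] is the list of ASes to send packets for that prefix through."""
    routes = {asn: {} for asn in neighbors}
    queue = deque((asn, prefix, [asn]) for asn, prefix in prefixes.items())  # "send them to me"
    while queue:
        sender, prefix, path = queue.popleft()
        for receiver in neighbors[sender]:
            if receiver in path:
                continue                                   # we are already on this path: a loop
            known = routes[receiver].get(prefix)
            if known is None or len(path) < len(known):    # placeholder for real policy
                routes[receiver][prefix] = path
                queue.append((receiver, prefix, [receiver] + path))  # re-advertise via us
    return routes

routes = propagate(NEIGHBORS, PREFIXES)
print(routes[3]["1.20.3/24"])   # [1]: send to AS 01
print(routes[5]["1.20.3/24"])   # [3, 1]: through AS 03, then AS 01
print(routes[2]["1.20.3/24"])   # [4, 5, 3, 1]
```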

now the complicated part

it could very well be that the collection of ASes forms a graph with cycles rather than a tree: there may be multiple paths between pairs of ASes

and so there are choices to be made

some paths may be preferred for financial reasons (one peering agreement is more beneficial than another)

some paths may be preferred for political reasons (an AS in Country A may not want its traffic flowing through an AS in Country B, even if that AS is only one of several intermediate ASes on the path)

and so on
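a heavily simplified Python sketch of such a choice, loosely modeled on the first two steps of BGP’s decision process (highest local preference, then shortest AS path); the paths, AS numbers, preference values, and the "avoid" set are hypothetical stand-ins for business deals and political rules, which real operators express through much richer configuration

```python
def choose_route(candidates, local_pref, avoid=frozenset()):
    """
    candidates: AS paths (lists of AS numbers) heard from neighbors for one prefix
    local_pref: neighbor AS -> preference (higher wins), standing in for business deals
    avoid: ASes this operator refuses to route through, standing in for political rules
    """
    allowed = [path for path in candidates if not set(path) & set(avoid)]
    if not allowed:
        return None
    # first the most preferred neighbor, then the shortest AS path as a tie-breaker
    return max(allowed, key=lambda path: (local_pref.get(path[0], 0), -len(path)))

# hypothetical paths toward a prefix that originates in AS 99
CANDIDATES = [[10, 20, 99], [30, 99], [40, 50, 60, 99]]
PREFS = {10: 200, 30: 100, 40: 100}    # say the deal with AS 10 is the cheapest

print(choose_route(CANDIDATES, PREFS))               # [10, 20, 99]: longer, but preferred
print(choose_route(CANDIDATES, PREFS, avoid={20}))   # [30, 99]
print(choose_route(CANDIDATES, PREFS, avoid={99}))   # None
```

note that the preferred route is not the shortest one: money and politics outrank path length here.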


pathological real-life example:

in 2008, Pakistan decided that YouTube was evil and should not be accessible to its citizens

to make this happen, Pakistan Telecom advertised routes for YouTube’s addresses that led to a router that discarded all packets; the intent was to affect only Pakistan, but the routes leaked to its upstream provider and spread from there

worldwide

much of the world could not reach YouTube for roughly two hours, until this was fixed

(Cloudflare has a page on BGP hijacking)
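part of why that bogus announcement won is longest-prefix matching: Pakistan Telecom’s route was more specific than YouTube’s own, and routers always prefer the most specific matching route; this minimal Python sketch shows the effect with hypothetical prefixes (not the real 2008 announcements)

```python
import ipaddress

def best_route(table, dst):
    """Longest-prefix match: among routes containing dst, the most specific one wins."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, via) for net, via in table if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1] if matches else None

# hypothetical prefixes, standing in for the legitimate owner and the hijacker
legit = (ipaddress.ip_network("10.1.0.0/22"), "legitimate origin")
hijack = (ipaddress.ip_network("10.1.1.0/24"), "hijacker (discards everything)")

print(best_route([legit], "10.1.1.7"))          # legitimate origin
print(best_route([legit, hijack], "10.1.1.7"))  # hijacker (discards everything)
print(best_route([legit, hijack], "10.1.2.7"))  # legitimate origin: outside the /24
```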


all of this is to say that the Internet is horrifically complicated and the fact that it is so shockingly reliable for delivering such an absurd amount of traffic day-in and day-out is more than a little mind-blowing
