CS 431 Spring 25
Lecture 01: introduction and overview
Goals
- describe the purpose of a protocol
- describe the notion of separation of concerns
- describe the purpose of the BSD sockets interface
administrivia
who is not officially enrolled?
go/cs431
no textbooks, everything online
course goal: "know" the various protocols by which the Internet "works"
in practical terms, be able to describe the core protocols of the Internet:
- purpose of each
- what they do and do not claim to do
- how they achieve these claims
you will demonstrate your understanding by implementing them in code
rewind a bit: what’s a protocol?
dictionary.com: "the customs and regulations dealing with diplomatic formality, precedence, and etiquette"
this is surprisingly applicable
rephrased, it’s a set of rules for groups to get something done
in the context of this class, it’s a set of rules for computers to accomplish some task
what are some high-level tasks computers use networks to accomplish?
think about the applications you use that use the network
- web
- games
and in fact there are protocols that dictate how these work
- web: HTTP
- email: SMTP
- games: often proprietary
but web and email (and many other applications) share some requirements
- reliability
- in-order delivery
- timeliness
so it makes sense to gather all the logic that fulfills those requirements and put it in a common place for the "higher-level" protocols to use
this is exactly what happens
and in fact there are multiple instances of this happening
HTTP is built on TCP, which is built on IP, which is built on Ethernet, etc
each protocol makes some claims/promises/guarantees about services it provides
so that other protocols can be built on top of them
and not have to worry about certain aspects of communication
for instance, IP takes care of getting data from one machine to another over the Internet
so TCP doesn’t have to worry about this
but IP doesn’t make sure the data sent arrives in order
so TCP does have to worry about that
don’t worry about internalizing or remembering any of this: it’s only an overview and we will get very specific about the services provided by these protocols
this is a design philosophy called separation of concerns
intentionally and explicitly delegating responsibility for different functionality to different components
it’s a way to make it easier to:
- reason about a system
- write and maintain code that implements it
otherwise you’d be left with a monolithic code base where everything was mixed together, which would be a pain to write and even worse to debug
this may remind you of the layers of abstraction we looked at in Computer Architecture, and that instinct is exactly right
in Architecture, though, we were mostly concerned with computational abstractions: that is, what building blocks do we need and how do we combine them to make computation happen
we also talked about data abstractions: if we’ve got the ability to store zeroes and ones, how can we represent more complex data using that foundation?
that led to discussion of data formats like unsigned integers, two’s complement, ASCII, etc
in this class, we’re concerned with communication abstractions: what building blocks do we need and how can we combine them to make communication happen
just as in Architecture, we’ll start with physics and work our way up
unlike Architecture, however, there will be more options at any given layer of abstraction, as shown in the diagram
speaking of the diagram…
note that there are some rows here
somebody decided a long time ago that they were going to categorize protocols
and declared that transport protocols are responsible for some set of features and network protocols are responsible for another
this is a very loose ontology
ultimately, the categorization doesn’t matter
what matters is how all the parts fit together
but you need to know these names
you also need to know about the existence of the OSI model
somebody took this idea of categorization to the extreme
and came up with a 7-layer model that nobody really uses
the bottom 4 layers match, but there are others higher up
(see: shirt)
we’ll talk more about this too, but mostly dismissively
still, it’s an important model because everybody in this space knows about it, so you need to know what it tries to describe, but you also need to know how modern reality deviates
we’ll see this as the semester unfolds
what does a protocol look like?
defined in a document called an "RFC": Request for Comments
(a bit weird because, by the time it becomes an official RFC, it’s not really up for comment)
who decides what makes up a protocol? committees, most notably the IETF (Internet Engineering Task Force)
who owns these protocols? usually nobody
"open" protocols (which aren’t restricted) are usually more successful, because it means anybody can implement them, which leads to more products, which leads to more widespread adoption of the protocol
so where does this code live?
at the end of the day, we care about a process (eg, Firefox) sending data over the network
in this sense, "the network" is accessed through a physical piece of hardware that is connected to the Internet
for most of you, this is likely a radio (802.11 or cellular)
for simplicity, we’re going to start with Ethernet
which is still used on desktop machines for various reasons we’ll get into later
so there’s a piece of hardware, the Ethernet port, that’s our access point to the network
and a process wants to send some data over the network
recall from Systems Programming that processes can’t just access hardware willy-nilly (for reasons of both efficiency and security)
instead, a process hands off its request to the kernel (using a syscall), and then the kernel accesses the hardware on behalf of the process
there is a set of syscalls specifically for networking
this set was defined many years ago by the authors of BSD
hence it’s called the BSD sockets interface
instance of an API
there are other interfaces (eg, Winsock, historically)
what is BSD?
that’s a question of history, with a long and convoluted answer
summed up in part by the diagram shown on Wikipedia: History of UNIX
the short answer is that UNIX was originally developed at Bell Labs (the research arm of AT&T, before its monopoly was broken up)
the code has lived on and fragmented and evolved
one of the big fragments started with work by the Computer Systems Research Group at UC Berkeley, which took the UNIX code and developed it further, to produce the Berkeley Software Distribution, first released in 1978
in the ensuing decades, different groups had different goals and interests in software development, and those groups have taken the original BSD code in different directions
notably, the first version of the network system calls appeared in 4.2BSD (1983)
there’s more nuance to the different efforts than can be succinctly summed up, but the veeeery short story is that OpenBSD strives for security, NetBSD strives for portability (run on as many architectures as possible), FreeBSD strives for performance, DragonFly BSD strives for user experience
you’ll note there’s no direct lineage from the original UNIX to Linux, and that there’s a dashed line separating Linux from the rest
this is because Linux did not inherit any of the original code: Linus Torvalds sat down and wrote a system that had the same interface (ie, implemented the same system calls and such) without knowledge of the UNIX code
the BSDs are actually pretty important, though they don’t have Linux’s mindshare
Apple forked the FreeBSD code to form the basis of OS X
as did Sony for the Playstation operating system
Netflix uses FreeBSD for lots of its servers
we’re going to use FreeBSD in this course, for a few reasons
first, variety: you’ve already used Linux, now you’ll use something slightly different, and hopefully come to understand both better
second, quality: the FreeBSD code is cleaner, more understandable, and better documented than the Linux code
third, portability: I’ve found that getting a consistent virtual-machine experience across Mac and Windows hosts is easier with FreeBSD
fourth, history: BSD is important, historically, and I think it’s good for students taking senior-level systems electives to have been exposed to it
back to the protocol diagram again…
some of these protocols can be implemented entirely in hardware (ie, a dedicated chip that implements the behaviors specified by the protocol)
some of them can be implemented entirely in software
some of them require both hardware and software
we’ll be looking at both over the course of the semester: for a given layer, we’ll talk about the functionality it expects from the lower layer, and the promises it makes to the higher layer, and how it might be implemented
we’ll talk about existing software that implements it
and we’ll also talk about hardware that implements it
for instance, my guess is that many/most/all? of you have a device at home that your family leases from your Internet service provider that allows your devices to connect to the Internet
we’ll understand exactly what that device does
also, in some cases, you’ll implement the protocols yourselves
assignment 0, due Wednesday: install FreeBSD in a virtual machine
needs to be a virtual machine because we’ll need root access