CS 431 Spring 25
Lecture 01: introduction and overview
Goals
- describe the purpose of a protocol
- describe the notion of separation of concerns
- describe the purpose of the BSD sockets interface
administrivia
who is not officially enrolled?
go/cs431
no textbooks, everything online
course goal: "know" the various protocols by which the Internet "works"
in practical terms, be able to describe the core protocols of the Internet:
- purpose of each
- what they do and do not claim to do
- how they achieve these claims
you will demonstrate your understanding by implementing them in code
rewind a bit: what’s a protocol?
dictionary.com: "the customs and regulations dealing with diplomatic formality, precedence, and etiquette"
this is surprisingly applicable
rephrased, it’s a set of rules for groups to get something done
in the context of this class, it’s a set of rules for computers to accomplish some task
what are some high-level tasks computers use networks to accomplish?
think about the applications you use that use the network
- web
- games
and in fact there are protocols that dictate how these work
- web: HTTP
- email: SMTP
- games: often proprietary
but web and email (and many other applications) share some requirements
- reliability
- in-order delivery
- timeliness
so it makes sense to gather all the logic that fulfills those requirements and put it in a common place for the "higher-level" protocols to use
this is exactly what happens
and in fact there are multiple instances of this happening
HTTP is built on TCP, which is built on IP, which is built on Ethernet, etc
each protocol makes some claims/promises/guarantees about services it provides
so that other protocols can be built on top of them
and not have to worry about certain aspects of communication
for instance, IP takes care of getting data from one machine to another over the Internet
so TCP doesn’t have to worry about this
but IP doesn’t make sure the data sent arrives in order
so TCP does have to worry about that
don’t worry about internalizing or remembering any of this: it’s only an overview and we will get very specific about the services provided by these protocols
this is a design philosophy called separation of concerns
intentionally and explicitly delegating responsibility for different functionality to different components
it’s a way to make it easier to:
- reason about a system
- write and maintain code that implements it
otherwise you’d be left with a monolithic code base where everything was mixed together, which would be a pain to write and even worse to debug
this may remind you of the layers of abstraction we looked at in Computer Architecture, and that instinct is exactly right
in Architecture, though, we were mostly concerned with computational abstractions: that is, what building blocks do we need and how do we combine them to make computation happen
we also talked about data abstractions: if we’ve got the ability to store zeroes and ones, how can we represent more complex data using that foundation?
that led to discussion of data formats like unsigned integers, two’s complement, ASCII, etc
in this class, we’re concerned with communication abstractions: what building blocks do we need and how can we combine them to make communication happen
just as in Architecture, we’ll start with physics and work our way up
unlike Architecture, however, there will be more options at any given layer of abstraction, as shown in the diagram
speaking of the diagram…
note that there are some rows here
somebody decided a long time ago that they were going to categorize protocols
and declared that transport protocols are responsible for some set of features and network protocols are responsible for another
this is a very loose ontology
ultimately, the categorization doesn’t matter
what matters is how all the parts fit together
but you need to know these names
you also need to know about the existence of the OSI model
somebody took this idea of categorization to the extreme
and came up with a 7-layer model that nobody really uses
the bottom 4 layers match, but there are others higher up
(see: shirt)
we’ll talk more about this too, but mostly dismissively
still, it’s an important model because everybody in this space knows about it, so you need to know what it tries to describe, but you also need to know how modern reality deviates
we’ll see this as the semester unfolds
what does a protocol look like?
defined in a document called an "RFC": Request for Comments
(a bit weird because, by the time it becomes an official RFC, it’s not really up for comment)
who decides what makes up a protocol? committees, most notably the IETF (Internet Engineering Task Force)
who owns these protocols? usually nobody
"open" protocols (which aren’t restricted) are usually more successful, because it means anybody can implement them, which leads to more products, which leads to more widespread adoption of the protocol
so where does this code live?
at the end of the day, we care about a process (eg, Firefox) sending data over the network
in this sense, "the network" is accessed through a physical piece of hardware that is connected to the Internet
for most of you, this is likely a radio (802.11 or cellular)
for simplicity, we’re going to start with Ethernet
which is still used on desktop machines for various reasons we’ll get into later
so there’s a piece of hardware, the Ethernet port, that’s our access point to the network
and a process wants to send some data over the network
recall from Systems Programming that processes can’t just access hardware willy-nilly (for reasons of both efficiency and security)
instead, a process hands off its request to the kernel (using a syscall), and then the kernel accesses the hardware on behalf of the process
there is a set of syscalls specifically for networking
this set was defined many years ago by the authors of BSD
hence it’s called the BSD sockets interface
instance of an API
there are other interfaces (eg, Winsock, historically)
what is BSD?
that’s a question of history, with a long and convoluted answer
summed up in part by the diagram shown on Wikipedia: History of UNIX
the short answer is that UNIX was originally developed at Bell Labs (the research arm of AT&T, before its monopoly was broken up)
the code has lived on and fragmented and evolved
one of the big fragments started with work by the Computer Systems Research Group at UC Berkeley, which took the UNIX code and developed it further, to produce the Berkeley Software Distribution, first released in 1978
in the ensuing decades, different groups had different goals and interests in software development, and those groups have taken the original BSD code in different directions
notably, the first version of the network system calls appeared in 4.2BSD (1983)
there’s more nuance to the different efforts than can be succinctly summed up, but the veeeery short story is that OpenBSD strives for security, NetBSD strives for portability (run on as many architectures as possible), FreeBSD strives for performance, DragonFly BSD strives for user experience
you’ll note there’s no direct lineage from the original UNIX to Linux, and that there’s a dashed line separating Linux from the rest
this is because Linux did not inherit any of the original code: Linus Torvalds sat down and wrote a system that had the same interface (ie, implemented the same system calls and such) without knowledge of the UNIX code
the BSDs are actually pretty important, though they don’t have Linux’s mindshare
Apple forked the FreeBSD code to form the basis of OS X
as did Sony for the Playstation operating system
Netflix uses FreeBSD for lots of its servers
we’re going to use FreeBSD in this course, for a few reasons
first, variety: you’ve already used Linux, now you’ll use something slightly different, and hopefully come to understand both better
second, quality: the FreeBSD code is cleaner, more understandable, and better documented than the Linux code
third, portability: I’ve found that getting a consistent virtual-machine experience across Mac and Windows hosts is easier with FreeBSD
fourth, history: BSD is important, historically, and I think it’s good for students taking senior-level systems electives to have been exposed to it
back to the protocol diagram again…
some of these protocols can be implemented entirely in hardware (ie, a dedicated chip that implements the behaviors specified by the protocol)
some of them can be implemented entirely in software
some of them require both hardware and software
we’ll be looking at both over the course of the semester: for a given layer, we’ll talk about the functionality it expects from the lower layer, and the promises it makes to the higher layer, and how it might be implemented
we’ll talk about existing software that implements it
and we’ll also talk about hardware that implements it
for instance, my guess is that many/most/all? of you have a device at home that your family leases from your Internet service provider that allows your devices to connect to the Internet
we’ll understand exactly what that device does
also, in some cases, you’ll implement the protocols yourselves
assignment 0, due Wednesday: install FreeBSD in a virtual machine
needs to be a virtual machine because we’ll need root access