Under the hood

One of the most important reasons the Internet is both cool and interesting is its modularity. Different layers of the network model do their thing, without needing to care what the layers above or below them are doing. It makes writing Internet-based applications relatively straightforward, which means we get a lot of cool, shiny toys to play with. This is, of course, a Good Thing ™.

For technology geeks, though, this modularity also means that individual pieces of technology can be learned in relative isolation. You don’t have to know or care how (most of) the rest of the stack works in order to teach yourself a new piece of the puzzle.

(A brief and more-or-less-correct description of how the Internet does its thing follows. Feel free to read, skim, or ignore it; the “cool stuff” is below.)


There are generally two basic approaches to learning how the stack of Internet protocols works: “bottom-up” and “top-down.” At the bottom of the stack is the physical layer — the wires, fiber optic cables, or radiofrequency links that actually move the information around. The next few layers deal with the electronics that translate the information back and forth and get it where it’s going. Higher up are layers that handle authentication (where needed), and the Application Layer — which is concerned with actually doing the task at hand. Information travels up and down these layers every time any action is initiated on the Internet (or other TCP/IP network).
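
To make the "up and down the layers" idea concrete, here is a toy sketch in Python. It is only an illustration: the layer names are simplified, the bracketed labels stand in for real protocol headers (which look nothing like this), and `send_down` and `pass_up` are names made up for this example.

```python
# Toy model of encapsulation: each layer adds its own "header" on the way
# down and removes it on the way up. Real headers are binary structures;
# the bracketed labels here are purely illustrative.
LAYERS = ["transport", "network", "link"]  # simplified; real stacks differ


def send_down(payload: bytes) -> bytes:
    """Wrap the payload in one toy header per layer, top to bottom."""
    for layer in LAYERS:
        payload = f"[{layer}]".encode("ascii") + payload
    return payload


def pass_up(frame: bytes) -> bytes:
    """Strip the toy headers in the opposite order, bottom to top."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not frame.startswith(header):
            raise ValueError(f"expected {layer} header")
        frame = frame[len(header):]
    return frame


frame = send_down(b"HEAD / HTTP/1.1")
print(frame)           # the application data, wrapped three times
print(pass_up(frame))  # the original application data again
```

The point of the sketch is that the application data never changes; each layer only adds or removes its own wrapper and ignores everything inside it.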

For instance, suppose you click on a link to this blog. Your browser, at the Application layer, translates this into a request to the appropriate web server (www.paleotechnologist.net). It then uses knowledge of HTTP (HyperText Transfer Protocol) to formulate a request in a way the webserver can understand. Once this specific, formal request is created, the browser passes it to your computer’s network protocol stack, which looks up the IP address of the webserver, packages the request into a TCP/IP packet, and sends it off to your computer’s default gateway router, with its destination marked as Port 80 at the webserver’s IP address. The packet passes through your computer’s network adapter (or modem, if you’re on dial-up), to the gateway, which starts routing it across the Internet.
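
As a rough sketch of the browser's half of that story, the Python below turns a URL into the three things the network stack needs: a host name, a port, and the text of the request. The function name `build_get_request` is made up for this example, and a real browser sends many more headers than this. Looking up the host's IP address (for instance with `socket.getaddrinfo`) is left out so the example runs without a network connection.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, request_bytes) for a plain-HTTP GET of `url`."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or 80   # plain HTTP defaults to port 80
    path = parts.path or "/"  # an empty path means the site root
    if parts.query:
        path += "?" + parts.query
    # The Host header only names the port when it is not the default.
    host_header = host if port == 80 else f"{host}:{port}"
    # HTTP/1.1 requires a Host header. Each line ends with CRLF, and a
    # blank line marks the end of the headers.
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",
        "",
        "",
    ]
    return host, port, "\r\n".join(lines).encode("ascii")


host, port, request = build_get_request("http://www.paleotechnologist.net/")
print(host, port)
print(request.decode("ascii"))
```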

Once the packet gets where it’s going, the information makes its way back through the layers in reverse order. The network card on the webserver computer receives the request, strips out the relevant information, and passes it to the webserver program. This program inspects it and finds that it understands the request, gets the information requested from local memory or the hard drive, and creates a reply packet (or many reply packets) containing the requested information. These go back through the same process to your computer, where the information is processed into a web page.
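
The webserver's side of the exchange can be sketched the same way. This is a toy, not how Apache or any real server is written: `handle_request` and the `PAGES` table are invented for illustration, and the sketch ignores nearly everything a real server has to deal with (request headers, files on disk, multiple connections, and so on).

```python
# Hypothetical content store standing in for "local memory or the hard drive".
PAGES = {"/": b"<html><body>Hello from a toy server</body></html>"}


def handle_request(raw: bytes) -> bytes:
    """Parse a raw HTTP request and return the raw bytes of a reply."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line = head.split("\r\n")[0]
    try:
        method, path, version = request_line.split(" ")
    except ValueError:
        # The request line was not "METHOD PATH VERSION".
        return b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n"
    if method not in ("GET", "HEAD"):
        return b"HTTP/1.1 501 Not Implemented\r\nContent-Length: 0\r\n\r\n"
    body = PAGES.get(path)
    if body is None:
        return b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=iso-8859-1\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # A HEAD reply carries the same headers as GET but no body.
    return headers if method == "HEAD" else headers + body


print(handle_request(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n").decode("ascii"))
```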

The amazing part is that all of this (usually) happens in a fraction of a second.

The HTTP protocol is one of those things that don’t always get a lot of attention. It does its thing, so most folks don’t think about it. I recently came across a post on Slashdot that suggested taking a look at the HTTP headers being sent out by the Slashdot web server. It suggested running the following command in Linux:


echo -e "HEAD / HTTP/1.1\nHost: slashdot.org\n\n" | netcat slashdot.org 80

Being an inquisitive sort (and not seeing anything really harmful-looking in the command), I decided to try it out. As it turns out, the Linux box I have available doesn’t have netcat installed, so I decided to take the lazy way out and try telnetting to Port 80 using Windows. ( C:\> telnet slashdot.org 80 )

Amazingly, it actually worked. Here’s what I sent:


HEAD / HTTP/1.1 <enter>
Host: slashdot.org <enter>
<enter>

…and I received:


HTTP/1.1 200 OK
Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
SLASH_LOG_DATA: shtml
X-Powered-By: Slash 2.005001
X-Bender: The laws of science be a harsh mistress.
X-XRDS-Location: http://slashdot.org/slashdot.xrds
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=iso-8859-1
Content-Length: 99945
Date: Sat, 12 Sep 2009 16:14:08 GMT
X-Varnish: 1558400757 1558400652
Age: 6
Connection: keep-alive 

First of all, I just impersonated a Web browser — and you can, too! How cool is that? You open a Telnet connection to the webserver’s Port 80, send a properly-formatted request for information, and it replies.
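
If you have neither netcat nor telnet handy, the same trick can be done with a few lines of Python and a plain TCP socket. This is a minimal sketch: `send_head` is a name made up for this example, and the "Connection: close" header is an addition of mine so that the server hangs up and the read loop ends. Expect the reply from any given site today to differ from the 2009 capture above; many sites now answer a plain-HTTP request with a redirect to HTTPS.

```python
import socket


def send_head(host: str, port: int = 80, path: str = "/", timeout: float = 10.0) -> str:
    """Send a HEAD request over a raw TCP socket and return the reply text."""
    request = (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to hang up when done
        "\r\n"                   # blank line: end of the request
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


# Example (needs network access):
# print(send_head("slashdot.org"))
```

Nothing here knows or cares what software is on the other end; it just opens a connection, sends text in the agreed format, and reads whatever comes back.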

The other interesting part — a little wasteful but still cool — is the “X-Bender:” header. Slashdot’s webservers apparently insert interesting little quotes from Bender and other staples of geek culture into their HTTP headers. (If the Internet is the Information Superhighway, think of this sort of thing like the “wash me” and “I may be slow, but I’m ahead of you” messages you sometimes see written in the dust on the back of eighteen-wheelers. The Muggles shopping at the Wal-Marts where the trucks deliver their goods will never see these messages, but they’re there if you know where to look. Geeks being geeks, the dark steam tunnels of Internet protocols are no doubt full of such things.)
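
If you want to go hunting for this sort of thing, a few lines of Python will pull the non-standard "X-" headers out of a reply. This is a sketch under simple assumptions: `parse_response_head` is a made-up helper, the sample text is a shortened copy of the reply shown above, and repeated headers simply overwrite each other, which a real HTTP library would handle more carefully.

```python
def parse_response_head(text: str) -> tuple[str, dict[str, str]]:
    """Split a response head into its status line and a header dict."""
    lines = [line.rstrip() for line in text.strip().splitlines()]
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        if not line:
            break  # a blank line marks the end of the headers
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return status_line, headers


# Shortened copy of the reply captured in this post.
SAMPLE = """\
HTTP/1.1 200 OK
Server: Apache/1.3.41 (Unix) mod_perl/1.31-rc4
X-Powered-By: Slash 2.005001
X-Bender: The laws of science be a harsh mistress.
Content-Type: text/html; charset=iso-8859-1
Date: Sat, 12 Sep 2009 16:14:08 GMT
"""

status, headers = parse_response_head(SAMPLE)
custom = {k: v for k, v in headers.items() if k.startswith("x-")}
print(status)
print(custom)
```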

Finally, a bit of philosophy. The reason it all works (and the reason that it’s so open and free) is exactly this sort of modularity. Why is playing with HTTP cool? Not because it’s particularly interesting in itself, but because, out there, there’s a document that lays out the publicly available, agreed-upon protocol for requesting information from a webserver. If this protocol is followed, it works no matter who wrote the webserver, what country it’s in, what language the programmers used (or speak), or what information is on the server. HTTP, in turn, rests on other, similar layers, with other, similar protocols.

With the proper (freely available) documentation in hand, you could build your own network from scratch, blowing glass to make vacuum tubes and designing your own computer, then connect it to a cable or DSL modem in your house. If you did everything right, it would work. No magic, no proprietary “sorry, you don’t get to see how this bit works,” no smoke and mirrors. If you use an open-source OS like Linux, you can (in theory) literally understand every movement of every bit, all the way through the whole process. It’s both a great way to learn about technology and a great way to show just how cool freedom is.

We need to make sure it always stays this way.


One Response to Under the hood

  1. Dosquatch says:

    ((wild applause)) A very nice wading-pool overview!

    There's a lot of glossing over – not surprising when there are 3" thick volumes on the OSI model. One thing that really should be mentioned about layers, though, is that they are abstract constructs, not firm points of delineation. It helps to think of functions in logical chunks, even when reality might not exactly line up.

    Like your South African pigeon, for instance – depending on how one looks at it, the pigeon is either or both of Layer 1 and Layer 2, *maybe* even a tad bit of Layer 3… but I'm not sure how they know which way they're going, so I'm not sure.
