Building HTTP from Scratch


See the code on GitHub at https://github.com/JedidiahMiller/my-http

This project started as part of another project to build WebSockets from scratch. Upon digging into how WebSockets work, it quickly became apparent that HTTP was a core piece, since every WebSocket connection begins life as an HTTP request. This spawned a project that became a bit more general purpose than WebSockets: building an HTTP server from scratch.

The first step was to build a basic server that could handle TCP.

// Needs <sys/socket.h>, <netinet/in.h>, <stdbool.h>, <stdio.h>, and <stdlib.h>
int server_fd = socket(AF_INET, SOCK_STREAM, 0);

struct sockaddr_in address = {0};
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;
address.sin_port = htons(80);

// Binding can fail if the port is in use, if we lack permission, etc.
if (bind(server_fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
    perror("bind");
    exit(1);
}

listen(server_fd, 10);

while (true) {
    // Blocks until a client connects
    int client_fd = accept(server_fd, NULL, NULL);

    // Handle request
}

This code creates a socket to handle any clients attempting to connect via port 80 (the default HTTP port). In the loop, the accept function waits for a client to connect, and then returns a file descriptor pointing to the socket where the client is waiting to transmit data.
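
The snippet above skips most error handling to stay short. A minimal sketch of the same setup wrapped in a function with every return value checked might look like the following. The name open_listener and the SO_REUSEADDR option are assumptions added for illustration, not something taken from the project code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns a listening TCP socket, or -1 on failure.
int open_listener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    // Assumption: allow quick restarts while an old socket is in TIME_WAIT.
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in address;
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_ANY);
    address.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }
    if (listen(fd, 10) < 0) {
        perror("listen");
        close(fd);
        return -1;
    }
    return fd;
}
```

Ports below 1024 usually require elevated privileges on Linux, so something like open_listener(8080) is handy while testing.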

One of the nice things about implementing HTTP is that the protocol works completely with readable text data, meaning it is easy to debug input. The exact format of the messages in HTTP/1.1 is outlined mainly in the IETF standards document RFC 7230. While I intended to follow the standard, I also realized that implementing a fully standards-compliant server was really not necessary.

Do I need to make sure I support all the types of files that can be sent via HTTP? Not really. Do I really need to dig deep into the standard to figure out all the exact characters that are allowed in header values? Hopefully not.

All I needed, in my mind, to earn the bragging rights of “I made an HTTP server from scratch” was a server that could send “Hello world!” to a browser.

That being said, there were still a decent number of things that I needed to do.

The first step was to read the full request in. Since HTTP works just with text, I read the whole request in as one big string.

// Max message size + null terminator
char buffer[MAX_MESSAGE_SIZE + 1];

// Ask for one byte more than the limit so an oversized message can be detected
int message_length = recv(client_fd, buffer, MAX_MESSAGE_SIZE + 1, 0);

if (message_length < 0) {
    printf("Failed to read message from client\n");
    exit(1);
}
if (message_length > MAX_MESSAGE_SIZE) {
    printf("Client's message was too big\n");
    exit(1);
}

buffer[message_length] = '\0';
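
One caveat: TCP delivers a stream of bytes rather than whole messages, so a single recv call is not guaranteed to return the entire request. A rough sketch of a more careful version keeps reading until the blank line that ends the headers shows up. The helper name read_request_head is hypothetical, and this sketch makes no attempt to read a body that arrives later.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: reads until the blank line that ends the headers.
// Returns the number of bytes read, or -1 on error, early close, or overflow.
// buffer must have room for capacity + 1 bytes (for the null terminator).
ssize_t read_request_head(int client_fd, char *buffer, size_t capacity) {
    size_t total = 0;
    buffer[0] = '\0';

    while (strstr(buffer, "\r\n\r\n") == NULL) {
        if (total == capacity) {
            return -1; // Request head is larger than the buffer
        }
        ssize_t n = recv(client_fd, buffer + total, capacity - total, 0);
        if (n <= 0) {
            return -1; // Error, or the client closed the connection early
        }
        total += (size_t)n;
        buffer[total] = '\0';
    }
    return (ssize_t)total;
}
```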

Now that we have a big string, we need to read the first line. The standard states that the first line should follow the form “method SP request-target SP HTTP-version CRLF” where SP is the space character and CRLF is \r\n.

After splitting the entire string on “\r\n” (the HTTP newline separator), I parsed the first line using

sscanf(line, "%s %s %s", method, target, http_version);
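
A plain %s will happily write past the end of a buffer if a client sends an overly long token, so it is worth adding width limits. Here is a small sketch that finds the first line and parses it with bounded fields. The function name parse_request_line and the buffer sizes (16, 256, 16) are assumptions picked for illustration.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper. Parses "method SP request-target SP HTTP-version"
// from the first line of request. Returns a pointer to the first header
// line, or NULL if the request line is missing or malformed.
const char *parse_request_line(const char *request,
                               char method[16], char target[256], char version[16]) {
    const char *line_end = strstr(request, "\r\n");
    if (line_end == NULL) {
        return NULL; // No CRLF, so there is no complete first line
    }

    // Copy the first line out so sscanf cannot wander into the headers.
    char line[512];
    size_t length = (size_t)(line_end - request);
    if (length >= sizeof(line)) {
        return NULL;
    }
    memcpy(line, request, length);
    line[length] = '\0';

    // Width limits keep sscanf from writing past the end of each buffer.
    if (sscanf(line, "%15s %255s %15s", method, target, version) != 3) {
        return NULL;
    }
    return line_end + 2; // Skip past the CRLF
}
```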

After the first line, the header lines come in. Each header is on its own line and has the structure “field-name ":" OWS field-value OWS” where OWS is optional whitespace. I processed each line rather simply using

sscanf(line, " %[^:]: %s\r\n", key, value);

Because %s stops at the first whitespace character, this cuts off any value that contains a space (most User-Agent values, for example), but it will do for simple single-token headers such as Host.
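
For values that contain spaces, or lines with unusual whitespace, one alternative is to split at the first colon by hand and trim the value. This is only a sketch of that idea, and parse_header_line is a hypothetical name rather than a function from the project.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical helper: splits "field-name: field-value" at the first colon
// and trims optional whitespace around the value.
// Returns 1 on success, 0 if the line is malformed or a field does not fit.
int parse_header_line(const char *line, char *name, size_t name_size,
                      char *value, size_t value_size) {
    const char *colon = strchr(line, ':');
    if (colon == NULL || colon == line) {
        return 0; // No colon, or an empty field-name
    }
    size_t name_length = (size_t)(colon - line);

    // Skip leading spaces and tabs, then back up over trailing whitespace
    // (including any CR or LF left on the line).
    const char *start = colon + 1;
    while (*start == ' ' || *start == '\t') {
        start++;
    }
    const char *end = start + strlen(start);
    while (end > start && isspace((unsigned char)end[-1])) {
        end--;
    }
    size_t value_length = (size_t)(end - start);

    if (name_length >= name_size || value_length >= value_size) {
        return 0;
    }
    memcpy(name, line, name_length);
    name[name_length] = '\0';
    memcpy(value, start, value_length);
    value[value_length] = '\0';
    return 1;
}
```

Splitting at the first colon also keeps values like “localhost:8080” intact.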

The list of headers ends with an empty line. After this empty line, the rest of the request is just the body and can be read directly.
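
Since the empty line shows up in the raw text as two CRLFs in a row, locating the body can be as simple as searching for that sequence. A tiny sketch, with find_body as a hypothetical name:

```c
#include <string.h>

// Hypothetical helper: returns a pointer to the first byte of the body,
// or NULL if the blank line that ends the headers is not present.
const char *find_body(const char *request) {
    const char *separator = strstr(request, "\r\n\r\n");
    if (separator == NULL) {
        return NULL;
    }
    return separator + 4; // Skip both CRLFs
}
```

When a request does carry a body, its Content-Length header says how many bytes to expect after this point.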

The entire request has been processed! Now what?

HTTP is a request-response protocol. Receiving the request successfully is only half of the exchange, because every request must be answered with a response. If you were to run the code as it is now, the client would hang until it finally gives up hope and shows you an error.

We should give hope to the hopeless clients.

The format of an HTTP response is very similar to the format of the request. The first line of a response has the structure “HTTP-version SP status-code SP reason-phrase CRLF”. The HTTP version will remain “HTTP/1.1”. The status-code is one of the many 3-digit HTTP status codes. The reason-phrase is a relic of the past, and RFC 7230 states that behavior should not depend on it. For our purposes, it is just a convenient place to put some text for debugging.
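
To make the status line concrete, here is a small sketch that pairs a handful of status codes with their conventional reason phrases and formats the first line. Both function names are hypothetical, and the table is deliberately tiny.

```c
#include <stdio.h>

// Hypothetical helper: a few common status codes and their usual phrases.
const char *reason_phrase_for(int status_code) {
    switch (status_code) {
        case 200: return "OK";
        case 400: return "Bad Request";
        case 404: return "Not Found";
        case 500: return "Internal Server Error";
        default:  return "Unknown";
    }
}

// Writes "HTTP-version SP status-code SP reason-phrase CRLF" into line.
// Returns 0 on success, -1 if the line did not fit.
int format_status_line(char *line, size_t size, int status_code) {
    int written = snprintf(line, size, "HTTP/1.1 %d %s\r\n",
                           status_code, reason_phrase_for(status_code));
    if (written < 0 || (size_t)written >= size) {
        return -1;
    }
    return 0;
}
```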

Through a bit of handwaving magic, I added some data structures and processing to better handle headers and messages. This allowed me to pretty easily generate the response message using

// First line
snprintf(
    result + strlen(result), 
    MAX_LINE_LENGTH, 
    "HTTP/1.1 %d %s\r\n", 
    response->status_code, 
    response->reason_phrase
);

// Headers
HttpHeaderListItem *current = response->header_list.head;
while (current != NULL) {
    snprintf(
        result + strlen(result),
        MAX_LINE_LENGTH,
        "%s: %s\r\n",
        current->header.name,
        current->header.value
    );
    current = current->next;
}

// End headers with empty line
snprintf(
    result + strlen(result), 
    MAX_LINE_LENGTH, 
    "\r\n"
);

// Body if it exists
if (response->body != NULL) {
    snprintf(
        result + strlen(result), 
        MAX_LINE_LENGTH, 
        "%s",
        response->body
    );
}

A lot of structures have come out of nowhere! result is the string that is being built up to send back to the client. current holds a node in a linked list that holds the headers. response is an instance of a struct that holds all the parts that make a complete HTTP response.
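
One thing to watch with this pattern is that MAX_LINE_LENGTH limits each individual piece but not the total, so the size of result still has to be tracked somewhere. Below is a sketch of an alternative that carries an explicit offset and builds a complete “Hello world!” style response. The names append_format and build_text_response are hypothetical, and the Content-Type and Content-Length headers are my assumption about what a browser would want to see.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: appends formatted text at *offset and advances it.
// Returns 0 on success, -1 if the text did not fit in the buffer.
int append_format(char *buffer, size_t capacity, size_t *offset,
                  const char *format, ...) {
    if (*offset >= capacity) {
        return -1;
    }
    va_list args;
    va_start(args, format);
    int written = vsnprintf(buffer + *offset, capacity - *offset, format, args);
    va_end(args);
    if (written < 0 || (size_t)written >= capacity - *offset) {
        return -1; // Output was truncated
    }
    *offset += (size_t)written;
    return 0;
}

// Builds a plain-text response. Returns its length, or -1 on overflow.
int build_text_response(char *buffer, size_t capacity, int status_code,
                        const char *reason_phrase, const char *body) {
    size_t offset = 0;
    if (append_format(buffer, capacity, &offset, "HTTP/1.1 %d %s\r\n",
                      status_code, reason_phrase) != 0) return -1;
    if (append_format(buffer, capacity, &offset,
                      "Content-Type: text/plain\r\n") != 0) return -1;
    // Content-Length tells the client where the body ends.
    if (append_format(buffer, capacity, &offset, "Content-Length: %zu\r\n",
                      strlen(body)) != 0) return -1;
    // Empty line ends the headers, then the body follows.
    if (append_format(buffer, capacity, &offset, "\r\n%s", body) != 0) return -1;
    return (int)offset;
}
```

Without a Content-Length header (or closing the connection), the client has no way to tell where the body ends and may keep waiting.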

The only thing that is left to do now is send that response back to the client!

send(client_fd, result, strlen(result), 0);
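
Like recv, send is allowed to transmit only part of what it was given, which matters more as responses grow. A minimal sketch of a loop that keeps going until everything is out, with send_all as a hypothetical name:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: keeps calling send until the whole string is written.
// Returns 0 on success, -1 on error.
int send_all(int fd, const char *text) {
    size_t length = strlen(text);
    size_t sent = 0;
    while (sent < length) {
        ssize_t n = send(fd, text + sent, length - sent, 0);
        if (n < 0) {
            return -1;
        }
        sent += (size_t)n;
    }
    return 0;
}
```

Once the response has been sent, calling close on the client file descriptor releases the connection.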

At this point, we are done! We have created an HTTP server that can return a message when visited by a web browser.