
Bloom: The making of a new HTTP server

I was browsing the release notes of JDK 21, and something caught my eye: Project Loom became stable. Loom (delivered officially as JEP 444, Virtual Threads) is a project that aims to add green threads to the JVM.

Green threads

Green threads are lightweight threads managed by the runtime instead of the OS. Generally, when using green threads, you can write blocking code without worrying about blocking the underlying thread.

When you are using green threads, the runtime detects situations where the thread would be blocked (e.g., waiting on a network request, a database query, a file read, etc…), takes away the underlying OS thread, and gives it to another green thread which was freed up.

This way, you can maximize the usage of your OS threads (heavy and expensive) by using green threads (lightweight and cheap).
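
To make this concrete, here is a minimal, hedged sketch (not from Bloom) of what that looks like on JDK 21: ten thousand tasks all block on a sleep, yet only a small pool of carrier OS threads is used, because the JVM unmounts a virtual thread whenever it blocks.

import java.time.Duration;
import java.util.concurrent.Executors;

public class VirtualThreadDemo {
    public static void main(String[] args) {
        // One virtual thread per task; the pool of OS carrier threads stays small
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    // Blocking call: the virtual thread is unmounted from its
                    // carrier OS thread while it sleeps
                    Thread.sleep(Duration.ofSeconds(1));
                    return null;
                });
            }
        } // close() waits for the submitted tasks to finish
    }
}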

Let’s build an HTTP server, then…

I’m interested in this challenge, as I expect the implementation to be pretty straightforward, but I’m confident there will be some head-scratching in the process.

When thinking about what should be achieved here, the following requirements come to mind about the server:

  1. Only uses blocking code
  2. Code should be as idiomatic as possible
  3. Have “good” performance

Req #1: Only uses blocking code

This is the main requirement: I only want to write blocking code. We are going to leverage Loom to make sure that we can use blocking code without wasting threads.

Req #2: Code should be as idiomatic as possible

This one is not about the threading model per se, but I’m interested in how far you can push idiomatic, standard library-heavy code performance-wise.

Req #3: Have “good” performance

This one is vague, but I want to make a server that can handle the churn. My ballpark estimate would be in the 1,000–10,000 RPS1 range.

Overall, I would be happy if I could get within the same order of magnitude as Spring (measured against its different backends).

The fun part: RFC 9110 + RFC 9112

Of course, when the time comes to implement an HTTP server, you have to pick the official spec off the shelf: RFC 9110 and RFC 9112. They are the latest in the long line of RFCs that define the HTTP protocol, going back to RFC 2068.2

I’m not going to pretend that I’ve read RFC 9110 from cover to cover (though I should), but this exercise is excellent to motivate me to read through it piece by piece!

Let’s get started ☕️!

So, to get started, we need something that can accept connections. Looking at the ServerSocket class, I found what I was looking for: a good ol’ blocking socket.

After this, we can make our first attempt to at least accept a connection:

public class HttpServer {
    private final int port;
    private final RequestHandler handler;

    // Constructors, getters, setters, etc...

    public void start() {
        // 1. Create a server socket and wrap it into a try-with-resources block
        try (var serverSocket = new ServerSocket(port)) {
            // 2. Loop (technically forever)
            while (!serverSocket.isClosed()) {
                // 3. Accept a connection
                var socket = serverSocket.accept();
                // 4. Wrap the socket and the handler into a Runnable connection
                var connection = new HttpConnection(socket, handler);
                // 5. Start a new green thread to handle the connection
                Thread.ofVirtual().start(connection);
            }
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }
}

Not much code so far! Of course, we omitted the elephant in the room, handling HTTP, but you can reason about this code much more easily than, e.g., an NIO-based implementation with Selectors, Channels, and whatnot.
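
For contrast, here is a bare-bones, deliberately incomplete sketch of what just the accept/read loop looks like with NIO. None of this is Bloom code (the class name and port are made up), and a real server would also need per-connection buffers and parser state:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NioServerSketch {
    public static void main(String[] args) throws IOException {
        var selector = Selector.open();
        var server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            // Block until at least one channel is ready
            selector.select();
            var keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                var key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    // New connection: register it for read events
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    // ...read into a buffer, keep per-connection parser state, etc.
                }
            }
        }
    }
}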

Draw the rest of the owl (i.e. handling HTTP)

We finished our previous conversation leaving out the details of HttpConnection, so let’s see what it looks like:

public class HttpConnection implements Runnable {

    private final Socket socket;
    // RequestHandler is a simple interface:
    // Takes a request and a response and does something with them
    // So far, we can only handle one type of request
    private final RequestHandler handler;

    // Constructors, getters, setters, etc...

    @Override
    // If you know the RFC, you will probably see a blatant mistake here
    // Don't worry, we will fix it later
    public void run() {
        // 1. We wrap the socket into a try-with-resources block
        try (socket) {
            // 2. We parse the request
            var request = HttpRequestParser.parse(socket.getInputStream());
            // 3. We create a response
            var response = new HttpResponse(socket.getOutputStream());
            // 4. We handle the request
            handler.handle(request, response);
            // 5. We send the response
            response.send();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

It might seem strange if you’re not too familiar with HTTP: why do we handle the response before sending it? Isn’t it better to send the response as soon as we can?

Well, it turns out that you need to know how long the response is going to be before you send it3: The HTTP spec requires you to send the Content-Length header, which is the length of the response body in bytes. [RFC 9110 §8.6]

On the server side, this is unfortunate: you need to buffer the whole response before sending it.
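
To make the buffering concrete, here is a hedged sketch of one way a response object can do this. It is not Bloom's actual HttpResponse, just an illustration of emitting the status line and Content-Length only once the body is fully known:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: buffer the body, then emit the head with the final Content-Length
public class BufferedResponse {
    private final OutputStream out;
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();

    public BufferedResponse(OutputStream out) {
        this.out = out;
    }

    public void write(String text) {
        body.writeBytes(text.getBytes(StandardCharsets.UTF_8));
    }

    public void send() throws IOException {
        var head = "HTTP/1.1 200 OK\r\n"
                + "Content-Length: " + body.size() + "\r\n"
                + "\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        body.writeTo(out);
        out.flush();
    }
}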

The request parser

While interesting in itself, I don't want to pad the post with the whole implementation of the request parser (though there is a small sketch of the idea after the list below). Instead, here are the highlights of what I learned from the RFC while implementing it:

  1. The only methods you must support are GET and HEAD. Everything else is optional. [RFC 9110 §9.1]
  2. Technically, there is no such thing as a header. Rather, the header section is a collection of field lines separated by newlines. [RFC 9112 §2.1]
  3. Keep-alive is the default, and you need to explicitly ask for a connection to be closed. [RFC 9112 §9.3]
    • This is the blatant mistake I mentioned earlier: our implementation should support this, but it doesn’t. We will see how this bites me later.
    • This is one of the key differences between HTTP/1.0 and HTTP/1.1 – HTTP/1.0 closes the connection by default, and you need to ask for keep-alive.
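
As promised, here is a minimal sketch of the kind of parsing this involves. The HttpRequest record shown here (method, target, version, fields) is an assumption for illustration rather than Bloom's actual type, and a production parser would have to be far more careful about limits, encodings, and malformed input:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical request type for this sketch
record HttpRequest(String method, String target, String version, Map<String, String> fields) {}

public final class HttpRequestParser {

    public static HttpRequest parse(InputStream in) throws IOException {
        var reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.US_ASCII));

        // Request line, e.g. "GET /div/4 HTTP/1.1"
        var requestLine = Objects.requireNonNull(reader.readLine(), "Invalid request line: null");
        var parts = requestLine.split(" ", 3);

        // Field lines ("headers") until the empty line that ends the message head
        var fields = new HashMap<String, String>();
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            var colon = line.indexOf(':');
            fields.put(line.substring(0, colon), line.substring(colon + 1).strip());
        }

        return new HttpRequest(parts[0], parts[1], parts[2], fields);
    }
}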

Benchmark time

Setup

The following benchmark is by no means designed to be comprehensive. My goal was to establish a baseline – as fast as possible – and then complicate my life later.

My first benchmark is a simple one: let’s make a very simple endpoint that does minimal work and see how many requests we can handle. My random choice of work is division: we pass a number as the last segment of the path, and the server divides it by 2. E.g., /div/10 will return 5.
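
For reference, a handler for such an endpoint can be tiny. In the sketch below, request.target() and response.body(...) are assumed names for the illustration (the post only shows that HttpRequest exposes fields()), and RequestHandler is assumed to be a functional interface; Bloom's real API may differ:

// Hypothetical accessor names; only RequestHandler's shape (request + response in,
// side effects out) is taken from the post.
RequestHandler divideHandler = (request, response) -> {
    // Take the last path segment, e.g. "/div/10" -> "10"
    var target = request.target();
    var lastSegment = target.substring(target.lastIndexOf('/') + 1);

    // Divide by 2 and write the result as the response body
    var result = Integer.parseInt(lastSegment) / 2;
    response.body(Integer.toString(result));
};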

I’ve implemented a simple Spring Boot application that does the same thing, and I’m going to compare the two. The Spring server runs on its default Tomcat backend.

All of the benchmarks (currently) are running on my laptop, a 2021 MacBook Pro with an M1 Max CPU and 64GB of RAM.

As the starting load-testing tool, I’ve picked Vegeta. It seemed the simplest tool to see what happens when you drop the clutch on your webserver.

Benchmark #1: Weird numbers

The presented values are after a first warm-up run. Vegeta is configured to run for 30 seconds, with a rate of 0 (i.e., as fast as possible), and 4 workers.

# Spring Boot
> echo "GET http://localhost:8081/div/4" | vegeta attack -duration=30s -rate=0 -max-workers=4 | tee results.bin | vegeta report
Requests      [total, rate, throughput]         722307, 24076.90, 24076.75
Duration      [total, attack, wait]             30s, 30s, 191.041µs
Latencies     [min, mean, 50, 90, 95, 99, max]  68.792µs, 163.368µs, 146.083µs, 209.392µs, 251.543µs, 386.968µs, 517.392ms
Bytes In      [total, mean]                     1444614, 2.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:722307
Error Set:

We are interested in the throughput, which is ~24k RPS. That’s an admirable number; let’s see how we are doing:

# Bloom
> echo "GET http://localhost:8080/div/4" | vegeta attack -duration=30s -rate=0 -max-workers=4 | tee results.bin | vegeta report
Requests      [total, rate, throughput]         8187, 261.02, 236.75
Duration      [total, attack, wait]             34.581s, 31.366s, 3.216s
Latencies     [min, mean, 50, 90, 95, 99, max]  127.583µs, 15.836ms, 382.501µs, 716.486µs, 1.161ms, 6.331ms, 13.113s
Bytes In      [total, mean]                     8187, 1.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:8187
Error Set:

Wow, that’s a big defeat at around 200 RPS. It seems that Spring could handle ~100x more requests than my server. I was expecting a difference, but not this much. I had a suspicion that something fishy was going on, so I started to investigate.

Just for due diligence’s sake, I also pulled out Locust and ran the same benchmark with it. The results were similar, so my confidence in the tools being at fault was basically zero.

One thing that kept me believing that something was not right was the application console: it was printing out a lot of exceptions.

Exception in thread "connection-60554" java.lang.NullPointerException: Invalid request line: null
    at java.base/java.util.Objects.requireNonNull(Objects.java:259)
    at dev.blnt.bloom.http.request.HttpRequestParser.parse(HttpRequestParser.java:17)
    at dev.blnt.bloom.http.HttpConnection.run(HttpConnection.java:30)
    at java.base/java.lang.VirtualThread.run(VirtualThread.java:311)

The exceptions were caused by failing to parse a request: the connection was already closed by the time the server wanted to read the request. I couldn’t fathom why a load-testing tool would do this, so I started to be wary of some tricky transport-level bug caused by my implementation.

In order to see what was happening on the wire, I pulled out Wireshark and started to look at the traffic, comparing the two servers. This is where I noticed a small difference: Spring was sending a Connection: keep-alive header back, while my server was not.

Well, it turns out that this is a big deal:

Pitstop: fixing the keep-alive

You might recall that I’ve left an ominous comment in the HttpConnection class:

// If you know the RFC, you will probably see a blatant mistake here
// Don't worry, we will fix it later

Well, it’s time to fix it!

Let’s jump into the code and patch up our mistakes:

public class HttpConnection implements Runnable {

    // ...

    @Override
    public void run() {
        var keepAlive = true;

        try (socket) {
            // +1. We've added a loop here
            //     Until the client asks us to close the connection, we keep the socket open
            while (keepAlive) {
                var request = HttpRequestParser.parse(socket.getInputStream());
                // +2. For every request, we check whether the connection should stay open
                keepAlive = shouldKeepAlive(request);

                var response = new HttpResponse(socket.getOutputStream());
                handler.handle(request, response);
                response.send();
            }
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
    }

    /**
     * Checks if the connection should be kept alive.
     *
     * @param request The request to check.
     * @return {@code true} if the connection should be kept alive,
     *         {@code false} otherwise.
     *         The default is {@code true}.
     */
    private static boolean shouldKeepAlive(HttpRequest request) {
        switch (request.fields().get("Connection")) {
            case null -> {
                return true;
            }
            case "keep-alive" -> {
                return true;
            }
            default -> {
                return false;
            }
        }
    }
}

Alright, so from this point on, we should be able to reuse the same socket for multiple requests. Now, with limp mode off, let’s see what happens.

Benchmark #1.1

Let’s run the same benchmark again and see what happens:

# Bloom
> echo "GET http://localhost:8080/div/4" | vegeta attack -duration=30s -rate=0 -max-workers=4 | tee results.bin | vegeta report
Requests      [total, rate, throughput]         919856, 30661.90, 30661.74
Duration      [total, attack, wait]             30s, 30s, 153.958µs
Latencies     [min, mean, 50, 90, 95, 99, max]  32.333µs, 124.787µs, 115.03µs, 173.928µs, 203.59µs, 303.207µs, 57.566ms
Bytes In      [total, mean]                     919856, 1.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:919856
Error Set:

Now we are talking! We are suddenly in the same order of magnitude as Spring+Tomcat, handling ~30k RPS.

Let’s not pretend that this is a fair comparison: Spring comes with a lot more baggage than my server; heck, I can only handle one type of request!

All in all, it seems that Loom is doing a good job at running an individual (virtual / green) thread per connection. That suggests to me that there is hope in using only blocking code and still having a performant server (we still have to draw a lot of that owl, though).

Benchmark #2: Spring + Netty

I was curious about how Spring would perform with a different backend, especially with Netty. Netty is a highly optimized NIO framework, and I was interested in how it would perform against my server.

# Spring Boot + Netty
> echo "GET http://localhost:8081/div/4" | vegeta attack -duration=30s -rate=0 -max-workers=6 | tee results.bin | vegeta report
Requests      [total, rate, throughput]         882719, 29423.96, 29423.82
Duration      [total, attack, wait]             30s, 30s, 146.833µs
Latencies     [min, mean, 50, 90, 95, 99, max]  76.958µs, 200.459µs, 172.047µs, 309.633µs, 375.704µs, 543.972µs, 38.478ms
Bytes In      [total, mean]                     1765438, 2.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:882719
Error Set:

It seems that Netty brings an improvement over Tomcat, handling ~30k RPS – just like the Loom-based implementation. Again, Netty and Spring are doing a lot more than my server, but it’s a data point nonetheless, one we can use later.

Conclusions

I think I will call it a day here with my first experiment. What it did suggest to me is that there is hope in using only blocking code and still having a performant server.

The code can be found on GitHub.


  1. RPS: Requests Per Second ↩︎

  2. Fun to look at the organizations: RFC 2068 shows DEC and MIT/LCS. The latest RFC shows Adobe, Fastly, and greenbytes. So eventually, this web thing caught on, huh? ↩︎

  3. Yeees, multipart responses are a thing, but let’s not complicate this further now. ↩︎