Adeelp1/Flint

Flint

A high-performance HTTP/1.1 server built from raw TCP sockets in Go — no frameworks, no net/http. Every layer from socket to middleware is implemented from scratch to understand what production HTTP servers actually do.


What this is

Most Go developers use net/http without knowing what it hides. Flint answers the question: what is actually happening beneath the abstraction?

Built over six weeks, Flint implements the full HTTP/1.1 request lifecycle — TCP listener, request parser, Trie-based router, middleware chain, response writer, worker pool, Keep-Alive, TLS, and graceful shutdown — without importing a single HTTP framework.


Architecture

Client
  │
  │  TCP / TLS
  ▼
┌─────────────────────────────────────────────────┐
│  TCP Listener          (server.go)              │
│  net.Listen → Accept loop → connChan            │
└───────────────────┬─────────────────────────────┘
                    │  net.Conn
                    ▼
┌─────────────────────────────────────────────────┐
│  Worker Pool           (server.go)              │
│  100 goroutines reading from buffered channel   │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│  Connection Handler    (conn.go)                │
│  Keep-Alive loop · SetDeadline · bufio.Reader   │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│  Request Parser        (request.go)             │
│  Request line · Headers map · Body via          │
│  Content-Length · RemoteAddr                    │
└───────────────────┬─────────────────────────────┘
                    │  *Request
                    ▼
┌─────────────────────────────────────────────────┐
│  Middleware Chain      (middleware.go)          │
│  Logger → Auth → RateLimit → Handler            │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│  Trie Router           (router.go)              │
│  Static + wildcard segment matching             │
│  Method dispatch · 404 / 405 handling           │
└───────────────────┬─────────────────────────────┘
                    │  *Response
                    ▼
┌─────────────────────────────────────────────────┐
│  Response Writer       (response.go)            │
│  Status line · Headers · Body · conn.Write      │
└─────────────────────────────────────────────────┘

Each layer has exactly one responsibility. A change to the router never touches the parser. A change to middleware never touches the transport layer. This boundary discipline is what makes the codebase navigable at scale.


Features

  • Raw TCP: net.Listen, Accept loop, net.Conn read/write with no HTTP library
  • HTTP/1.1 parser: request line, headers map, body via Content-Length, edge case handling
  • Trie router: O(k) path matching where k is path depth, dynamic params (:id), method dispatch, 404 / 405
  • Middleware chain: composable func(HandlerFunc) HandlerFunc pattern, Chain() helper
  • Logger middleware: method, path, status code, duration per request
  • Auth middleware: Bearer token validation, configurable secret, 401 on failure
  • Rate limiter: token bucket algorithm, per-IP tracking, mutex-protected, 429 on exhaustion
  • Worker pool: 100 fixed goroutines, 1,000-connection buffered channel, bounded concurrency
  • Keep-Alive: TCP connection reuse across multiple requests, deadline reset per cycle
  • Connection timeouts: SetDeadline per request, silent EOF and timeout handling
  • TLS / HTTPS: tls.Listen with X.509 certificate, transport-level encryption
  • Graceful shutdown: sync.WaitGroup per connection, drains in-flight requests on SIGTERM
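
The func(HandlerFunc) HandlerFunc pattern named in the list can be illustrated with a minimal sketch. The Request and Response structs, the Trace middleware, and the hello handler are simplified stand-ins invented for this example, and the argument order of Chain is an assumption; Flint's actual signatures may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Request and Response are simplified stand-ins, not Flint's actual structs.
type Request struct {
	Path    string
	Headers map[string]string
}

type Response struct {
	Status int
	Body   string
}

type HandlerFunc func(*Request) *Response

// Middleware wraps a handler and returns a new handler.
type Middleware func(HandlerFunc) HandlerFunc

// Chain applies middleware so that the first one listed runs outermost.
func Chain(h HandlerFunc, mws ...Middleware) HandlerFunc {
	for i := len(mws) - 1; i >= 0; i-- {
		h = mws[i](h)
	}
	return h
}

// Trace (hypothetical) appends its name to log before calling the next handler.
func Trace(name string, log *[]string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(r *Request) *Response {
			*log = append(*log, name)
			return next(r)
		}
	}
}

// Auth admits only requests carrying "Authorization: Bearer <token>".
func Auth(token string) Middleware {
	return func(next HandlerFunc) HandlerFunc {
		return func(r *Request) *Response {
			h := r.Headers["Authorization"]
			if !strings.HasPrefix(h, "Bearer ") || strings.TrimPrefix(h, "Bearer ") != token {
				return &Response{Status: 401, Body: "unauthorized"}
			}
			return next(r)
		}
	}
}

func hello(r *Request) *Response {
	return &Response{Status: 200, Body: "hello " + r.Path}
}

func main() {
	var log []string
	h := Chain(hello, Trace("logger", &log), Auth("s3cret"), Trace("ratelimit", &log))
	ok := h(&Request{Path: "/users/42", Headers: map[string]string{"Authorization": "Bearer s3cret"}})
	denied := h(&Request{Path: "/users/42", Headers: map[string]string{}})
	fmt.Println(ok.Status, denied.Status, log)
}
```

Because Auth returns early on failure, any middleware placed after it in the list never runs for rejected requests, which is why ordering in the chain matters.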

Project structure

flint/
├── main.go                 ← composition root — wires routes, starts server
├── handler/
│   ├── ping.go             ← GET /ping
│   ├── echo.go             ← POST /echo
│   └── home.go             ← GET /users/:id
└── server/
    ├── server.go           ← Config, Server, worker pool, TLS listener, shutdown
    ├── conn.go             ← per-connection handler, Keep-Alive loop, deadlines
    ├── request.go          ← HTTP parser, Request struct
    ├── response.go         ← Response struct, status codes, write
    ├── router.go           ← Trie data structure, dispatch, 404/405
    └── middleware.go       ← Logger, Auth, RateLimit, Chain, tokenBucket

Getting started

Generate a self-signed TLS certificate:

bash scripts/gen_cert.sh

Run the server:

go run main.go
# Flint listening on :8443

Test the endpoints:

# health check — no auth required
curl -k https://localhost:8443/ping

# protected route — requires Bearer token
curl -k -H "Authorization: Bearer 123456789abcdef" https://localhost:8443/users/42

# echo — returns request body
curl -k -X POST -d "hello flint" -H "Authorization: Bearer 123456789abcdef" https://localhost:8443/echo

# 404
curl -k https://localhost:8443/nonexistent

# 405
curl -k -X POST https://localhost:8443/ping

# 401
curl -k https://localhost:8443/users/42

Benchmark results

Flint vs Go stdlib net/http

hey -n 10000 -c 100 — includes rate limiter (both servers ~1500 pass, ~8500 rate limited)

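| Server   | Req/sec | p50    | p95    | p99   | Notes                    |
|----------|---------|--------|--------|-------|--------------------------|
| Flint    | 4,902   | 16.8ms | 53.1ms | 121ms | HTTPS, worker pool       |
| net/http | 5,408   | 16.0ms | 48.0ms | 74ms  | HTTP, goroutine per conn |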

Gap: stdlib delivers ~10% more throughput (5,408 vs 4,902 req/sec) and ~39% lower p99 tail latency (74ms vs 121ms).

Key finding: the rate limiter produces identical behaviour in both servers — ~1,500 successful responses and ~8,500 rate limited (429) responses from 10,000 requests. The performance gap is transport-level, not business logic.

Primary sources of the gap:

  1. Flint uses HTTPS (TLS handshake overhead) vs stdlib plain HTTP
  2. stdlib reuses buffers via sync.Pool — Flint allocates per request
  3. Flint worker pool adds channel dispatch latency under high concurrency

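An apples-to-apples comparison (both over plain HTTP, no rate limiter) has not been run yet. My estimate is that it would close the gap to under 5%, since the first cause above disappears entirely, but that is an expectation rather than a measurement.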

Microbenchmarks — parser, router, middleware

go test -bench=. -benchmem ./server/

Environment: AMD Ryzen 5 5500U · Windows · amd64

| Component             | ns/op | B/op  | allocs/op | Notes                               |
|-----------------------|-------|-------|-----------|-------------------------------------|
| Parse GET request     | 3,147 | 4,864 | 13        | 317k parses/sec/core                |
| Parse POST request    | 3,728 | 5,016 | 18        | +581ns for body read/alloc          |
| Router static path    | 980   | 576   | 7         | exact Trie match                    |
| Router dynamic path   | 1,400 | 920   | 10        | +420ns for param extraction         |
| Router not found      | 1,108 | 770   | 12        | fails at first segment; fast exit   |
| Logger middleware     | 497   | 34    | 3         | cheapest middleware; timer only     |
| Rate limiter          | 1,058 | 189   | 2         | mutex contention is the bottleneck  |
| Full middleware chain | 1,470 | 864   | 11        | Logger + Auth + RateLimit + handler |

Key findings:

  1. Full middleware chain (1,470 ns) costs about the same as a single dynamic route lookup (1,400 ns). Chain() is pure function nesting and adds almost no overhead of its own.

  2. Rate limiter is the most expensive middleware at 1,058 ns. That figure covers acquiring the sync.Mutex, the per-IP map lookup, and the bucket update. Under high concurrency all goroutines contend for that one lock, which is the primary argument for replacing it with Redis at scale.

  3. Parser allocates 13 times per GET request, which makes it the main optimisation target for 1M RPS. A zero-allocation state machine parser (the approach taken by picohttpparser in C) could plausibly reduce this to 2-3 allocs/op.

  4. Test coverage: 58.1% of statements — uncovered paths are primarily network error handling (EOF, timeout, TLS failure) which require a live connection to test.

Worker pool behaviour

The worker pool is slower on localhost with fast handlers. This is expected and important to understand.

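| Configuration             | Req/sec | p99   | Notes                     |
|---------------------------|---------|-------|---------------------------|
| Naked goroutines          | 2,985   | 102ms | No concurrency limit      |
| Worker pool (100 workers) | 1,975   | 168ms | Channel dispatch overhead |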

Finding: on localhost, the channel send + goroutine context switch adds measurable latency when handlers complete in under 1ms. The worker pool's value is not throughput; it is memory stability under extreme concurrency. With 500 concurrent clients and a 50ms handler (simulating a DB query), naked goroutines spawn 500 goroutines, each with its own stack (about 2KB initially in current Go releases, growing on demand) plus per-connection buffers. The worker pool holds at 100 goroutines regardless of connection count. The advantage is resource predictability, not raw speed.
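
The bounded-concurrency property can be sketched in isolation. This is a toy model, not Flint's server.go: jobs are integers rather than net.Conn values, and runPool and simulateHandler are names made up for the example. It measures the peak number of handlers running at once, which should never exceed the worker count.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// simulateHandler stands in for a handler that waits on I/O.
func simulateHandler() { time.Sleep(2 * time.Millisecond) }

// runPool pushes `jobs` items through `workers` goroutines and returns the
// highest number of handlers that were ever running at the same time.
func runPool(workers, jobs int, handler func()) int64 {
	queue := make(chan int, jobs) // buffered queue, like a connection channel
	var running, peak int64
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				n := atomic.AddInt64(&running, 1)
				for { // record a new peak if this is one
					p := atomic.LoadInt64(&peak)
					if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
						break
					}
				}
				handler()
				atomic.AddInt64(&running, -1)
			}
		}()
	}

	for j := 0; j < jobs; j++ {
		queue <- j
	}
	close(queue) // workers exit once the queue drains
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	peak := runPool(100, 500, simulateHandler)
	fmt.Println("peak concurrent handlers:", peak)
}
```

With 500 queued jobs and 100 workers the reported peak stays at or below 100, whereas spawning one goroutine per job would let it reach 500.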


Design decisions

1. bufio.Reader created once per connection, not per request

The Keep-Alive loop calls parseRequest on every iteration. The naive approach creates a new bufio.NewReader(conn) each call. The problem is that bufio.Reader has a 4096-byte internal buffer — on each read it may pull more bytes from the TCP stream than the current request needs, buffering the start of the next request. When the reader is thrown away, those bytes are lost. The next call reads from the raw conn and misses them.

The fix is to create the reader once in handleConn and pass it into parseRequest on every call. The buffer persists across the connection lifetime, carrying buffered bytes correctly between requests. This is the same pattern Go's stdlib uses internally.
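
The lost-bytes behaviour is easy to reproduce without a network. In the sketch below a strings.Reader plays the role of the connection and already holds two pipelined requests; newConn, readHead, readerPerRequest, and readerPerConn are hypothetical names for this illustration, not functions from Flint.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// newConn returns a fake connection on which two requests have already arrived.
func newConn() io.Reader {
	return strings.NewReader(
		"GET /a HTTP/1.1\r\nHost: x\r\n\r\n" +
			"GET /b HTTP/1.1\r\nHost: x\r\n\r\n")
}

// readHead consumes one request head and returns its request line.
func readHead(br *bufio.Reader) (string, error) {
	first, err := br.ReadString('\n')
	if err != nil {
		return "", err
	}
	for {
		line, err := br.ReadString('\n')
		if err != nil {
			return "", err
		}
		if line == "\r\n" { // a blank line ends the head
			return strings.TrimRight(first, "\r\n"), nil
		}
	}
}

// readerPerRequest reproduces the bug: each iteration wraps the connection
// in a new bufio.Reader, so bytes buffered by the previous one are dropped.
func readerPerRequest(conn io.Reader) []string {
	var got []string
	for {
		line, err := readHead(bufio.NewReader(conn))
		if err != nil {
			return got
		}
		got = append(got, line)
	}
}

// readerPerConn creates the reader once and reuses it for every request.
func readerPerConn(conn io.Reader) []string {
	br := bufio.NewReader(conn)
	var got []string
	for {
		line, err := readHead(br)
		if err != nil {
			return got
		}
		got = append(got, line)
	}
}

func main() {
	fmt.Println("reader per request:   ", readerPerRequest(newConn()))
	fmt.Println("reader per connection:", readerPerConn(newConn()))
}
```

The per-request variant sees only the first request because the first reader pulled both requests into its buffer before being discarded; the per-connection variant sees both.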

2. Trie returns 405 instead of 404 when the path matches but the method does not

Most naive routers return 404 for any unmatched request. Flint's Trie distinguishes between two failure modes — path not found (true 404) and path found but method not registered (405 Method Not Allowed). This distinction matters for API clients — a 404 tells the client "this resource does not exist", while a 405 tells them "the resource exists but you used the wrong verb." HTTP/1.1 spec requires a 405 to include an Allow header listing valid methods. Flint implements the correct status code; the Allow header is a known missing feature.
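
A minimal sketch of how a segment trie can tell the two failure modes apart. The node layout, the use of handler names in place of real handler functions, and the Add/Lookup methods are assumptions made for the example; router.go may be organised differently.

```go
package main

import (
	"fmt"
	"strings"
)

// node is one path segment. handlers maps method -> handler name; a string
// stands in for a real handler function to keep the sketch short.
type node struct {
	children map[string]*node
	param    *node // child that matches any segment, e.g. ":id"
	handlers map[string]string
}

func newNode() *node {
	return &node{children: map[string]*node{}, handlers: map[string]string{}}
}

type Router struct{ root *node }

func NewRouter() *Router { return &Router{root: newNode()} }

func splitPath(p string) []string {
	p = strings.Trim(p, "/")
	if p == "" {
		return nil
	}
	return strings.Split(p, "/")
}

// Add registers a handler name for a method and path pattern.
func (r *Router) Add(method, path, handler string) {
	n := r.root
	for _, seg := range splitPath(path) {
		if strings.HasPrefix(seg, ":") {
			if n.param == nil {
				n.param = newNode()
			}
			n = n.param
			continue
		}
		child, ok := n.children[seg]
		if !ok {
			child = newNode()
			n.children[seg] = child
		}
		n = child
	}
	n.handlers[method] = handler
}

// Lookup returns the handler name and a status: 200, 404, or 405.
func (r *Router) Lookup(method, path string) (string, int) {
	n := r.root
	for _, seg := range splitPath(path) {
		if child, ok := n.children[seg]; ok {
			n = child // static segments win over params
		} else if n.param != nil {
			n = n.param
		} else {
			return "", 404 // no such path
		}
	}
	if len(n.handlers) == 0 {
		return "", 404 // only a prefix of a registered route
	}
	h, ok := n.handlers[method]
	if !ok {
		return "", 405 // path exists, method does not
	}
	return h, 200
}

func main() {
	r := NewRouter()
	r.Add("GET", "/ping", "ping")
	r.Add("GET", "/users/:id", "user")
	for _, c := range [][2]string{{"GET", "/users/42"}, {"POST", "/ping"}, {"GET", "/nonexistent"}} {
		h, status := r.Lookup(c[0], c[1])
		fmt.Println(c[0], c[1], status, h)
	}
}
```

Since the matched node already holds the set of registered methods, the keys of its handlers map are what an Allow header would list.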

3. allowRequest holds the mutex for the entire read-modify-write

The token bucket rate limiter uses a map[string]*tokenBucket protected by sync.Mutex. An earlier version called getBucket under the lock and allowRequest outside it. This created a race condition — two goroutines serving requests from the same IP could both read tokens > 0, both decrement, and both be admitted for the price of one token.

The earlier version is a classic check-then-act race. The fix is to merge the lookup, refill, and consume operations into a single function that holds the mutex throughout. The performance cost is negligible: the critical section is a handful of arithmetic operations that take nanoseconds.
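
As a sketch of the merged critical section, the limiter below performs lookup, refill, and consume under one lock. Limiter, Allow, and hammer are hypothetical names, and tokens are stored as float64 for simpler refill arithmetic, which may differ from Flint's tokenBucket.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

type bucket struct {
	tokens float64
	last   time.Time
}

type Limiter struct {
	mu       sync.Mutex
	buckets  map[string]*bucket
	capacity float64
	perSec   float64 // refill rate in tokens per second
}

func NewLimiter(capacity, perSec float64) *Limiter {
	return &Limiter{buckets: make(map[string]*bucket), capacity: capacity, perSec: perSec}
}

// Allow does lookup, refill, and consume while holding the mutex, so no
// other goroutine can observe the bucket between the check and the decrement.
func (l *Limiter) Allow(ip string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()

	now := time.Now()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: l.capacity, last: now}
		l.buckets[ip] = b
	}
	b.tokens = math.Min(l.capacity, b.tokens+now.Sub(b.last).Seconds()*l.perSec)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// hammer fires n concurrent requests from one IP and counts admissions.
func hammer(l *Limiter, ip string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	admitted := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if l.Allow(ip) {
				mu.Lock()
				admitted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return admitted
}

func main() {
	l := NewLimiter(5, 0) // 5 tokens and no refill, so the count is deterministic
	fmt.Println("admitted:", hammer(l, "10.0.0.1", 100))
}
```

With refill disabled, exactly five of the hundred concurrent requests are admitted; a version that checks and decrements under separate lock acquisitions can admit more.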

4. Graceful shutdown uses sync.WaitGroup per connection, not per worker

An early implementation called wg.Add(1) once per worker goroutine at startup and wg.Done() inside handleConn once per connection. Since each worker handles many connections, Done() was called more times than Add(), causing a panic from a negative WaitGroup counter.

The correct model increments wg once per accepted connection in the accept loop and decrements once when handleConn returns. wg.Wait() in Shutdown() blocks until every in-flight connection is finished — exactly the semantic graceful shutdown requires.
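
The counting rule can be shown with a toy model in which "connections" are integers on a channel. Server, Accept, handleConn, and Shutdown mirror the names used above, but their bodies here are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Server is a toy model: connections are just integers on a channel.
type Server struct {
	conns   chan int
	wg      sync.WaitGroup // counts in-flight connections, not workers
	handled int64
}

// NewServer starts a fixed number of workers that drain the channel.
func NewServer(workers int) *Server {
	s := &Server{conns: make(chan int, 16)}
	for i := 0; i < workers; i++ {
		go func() {
			for id := range s.conns {
				s.handleConn(id)
			}
		}()
	}
	return s
}

// Accept plays the role of the accept loop: one Add per connection,
// performed before the connection is handed to a worker.
func (s *Server) Accept(id int) {
	s.wg.Add(1)
	s.conns <- id
}

func (s *Server) handleConn(id int) {
	defer s.wg.Done()            // exactly one Done for each Add
	time.Sleep(time.Millisecond) // simulated request handling
	atomic.AddInt64(&s.handled, 1)
}

// Shutdown stops intake, waits for every accepted connection to finish,
// and reports how many were handled. Accept must not be called afterwards.
func (s *Server) Shutdown() int64 {
	close(s.conns)
	s.wg.Wait()
	return atomic.LoadInt64(&s.handled)
}

func main() {
	s := NewServer(3)
	for i := 0; i < 20; i++ {
		s.Accept(i)
	}
	fmt.Println("handled before exit:", s.Shutdown())
}
```

Three workers serve twenty connections, yet Add and Done stay balanced because both are tied to the connection rather than to the worker.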

5. dispatch returns *Response instead of writing directly to net.Conn

The original design had router.dispatch(conn, req) write the response internally. This meant handleConn had no access to the response after dispatch — it could not set the Connection header (keep-alive vs close) based on the request. By making dispatch return *Response, handleConn owns the final write step and can set transport-level headers after the handler runs. This is also better for testing — dispatch can be unit tested without a real net.Conn.
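
A rough sketch of that split, with the router reduced to a single route. The struct fields, finalize, and render are invented for the example; only the idea that dispatch returns a value and the connection handler sets transport headers afterwards comes from the text above.

```go
package main

import (
	"bytes"
	"fmt"
	"strconv"
	"strings"
)

type Request struct {
	Method, Path string
	Headers      map[string]string
}

type Response struct {
	Status  int
	Headers map[string]string
	Body    string
}

// dispatch stands in for the router: it builds a Response and never
// touches a connection, so it can be unit tested directly.
func dispatch(req *Request) *Response {
	if req.Path != "/ping" {
		return &Response{Status: 404, Headers: map[string]string{}, Body: "not found"}
	}
	return &Response{Status: 200, Headers: map[string]string{}, Body: "pong"}
}

// finalize is the step the connection handler owns: transport-level headers
// are set after the handler has run. It reports whether to keep the
// connection open for another request.
func finalize(req *Request, resp *Response) bool {
	keepAlive := !strings.EqualFold(req.Headers["Connection"], "close")
	if keepAlive {
		resp.Headers["Connection"] = "keep-alive"
	} else {
		resp.Headers["Connection"] = "close"
	}
	resp.Headers["Content-Length"] = strconv.Itoa(len(resp.Body))
	return keepAlive
}

// render serialises the response; a real server would write this to net.Conn.
func render(resp *Response) string {
	reason := map[int]string{200: "OK", 404: "Not Found"}[resp.Status]
	var b bytes.Buffer
	fmt.Fprintf(&b, "HTTP/1.1 %d %s\r\n", resp.Status, reason)
	for _, k := range []string{"Connection", "Content-Length"} {
		fmt.Fprintf(&b, "%s: %s\r\n", k, resp.Headers[k])
	}
	b.WriteString("\r\n" + resp.Body)
	return b.String()
}

func main() {
	req := &Request{Method: "GET", Path: "/ping", Headers: map[string]string{"Connection": "close"}}
	resp := dispatch(req)
	keep := finalize(req, resp)
	fmt.Printf("%q keep-alive=%v\n", render(resp), keep)
}
```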


What I would do to scale this to 1 million RPS

The current architecture handles roughly 4,900 req/sec on a single machine (benchmarked with hey -n 10000 -c 100 including TLS and rate limiting).

Transport layer — replace goroutines with epoll

One goroutine per connection becomes expensive at very high connection counts because of memory pressure: each goroutine stack starts at about 2KB in current Go releases and grows on demand, on top of the per-connection read buffers. Production servers use event-driven I/O (Linux epoll, BSD kqueue), where a single thread monitors thousands of file descriptors and only wakes up when data arrives. Go's runtime already uses epoll internally but abstracts it behind goroutines. A custom epoll-based event loop would eliminate the per-connection goroutine overhead. This is the model Nginx uses: each worker process runs one event loop that serves many thousands of connections.

Parser — zero-allocation header parsing

Every strings.SplitN call in the header parser allocates a new slice. With one call per header line, 1M RPS means at least 1M, and typically several million, allocations per second just for header parsing. The fix is a hand-written state machine parser that reads bytes directly without allocating, the approach used by picohttpparser in C.
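
One way to avoid the per-line slice is to return sub-slices of the input buffer instead of new strings. The sketch below is a simplified illustration of that idea, not a full state machine and not Flint's parser; parseHeaderLine, eachHeader, and headerValue are hypothetical names, and the input is assumed to be the header block that follows the request line.

```go
package main

import (
	"bytes"
	"fmt"
)

var crlf = []byte("\r\n")

// parseHeaderLine splits "Name: value" into two sub-slices of line.
// Nothing is copied: both results alias the caller's buffer.
func parseHeaderLine(line []byte) (name, value []byte, ok bool) {
	i := bytes.IndexByte(line, ':')
	if i <= 0 {
		return nil, nil, false
	}
	return line[:i], bytes.TrimSpace(line[i+1:]), true
}

// eachHeader walks a header block line by line, calls fn for every valid
// header, and returns how many it found. It stops at the blank line.
func eachHeader(head []byte, fn func(name, value []byte)) int {
	n := 0
	for len(head) > 0 {
		var line []byte
		if end := bytes.Index(head, crlf); end < 0 {
			line, head = head, nil
		} else {
			line, head = head[:end], head[end+2:]
		}
		if len(line) == 0 {
			break // blank line: end of headers
		}
		if name, value, ok := parseHeaderLine(line); ok {
			fn(name, value)
			n++
		}
	}
	return n
}

// headerValue is a convenience wrapper for the demo. Unlike the functions
// above it does allocate, because it converts the result to a string.
func headerValue(head []byte, want string) string {
	found := ""
	eachHeader(head, func(name, value []byte) {
		if found == "" && bytes.EqualFold(name, []byte(want)) {
			found = string(value)
		}
	})
	return found
}

func main() {
	head := []byte("Host: localhost:8443\r\nContent-Length: 11\r\n\r\nhello flint")
	fmt.Println(headerValue(head, "content-length"))
	fmt.Println(headerValue(head, "host"))
}
```

The trade-off is lifetime: the returned slices are only valid until the read buffer is reused, so anything kept beyond the current request has to be copied.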

Router — pre-compiled regex or radix tree

The Trie is correct but traverses one node per path segment. A radix tree compresses common prefixes into single nodes, reducing traversal depth. At 1M RPS the difference between 5 node traversals and 2 becomes measurable.

Rate limiter — Redis instead of in-process map

The current rate limiter is in-process, so it only works on a single server instance. At scale you run many server instances behind a load balancer, and a client can bypass per-instance rate limits by having their requests spread across instances. Replacing the in-process map with Redis INCR and EXPIRE commands (a fixed-window counter rather than a token bucket) moves rate limit state to a shared store visible to all instances. A single Redis instance can typically serve on the order of 100,000 or more simple operations per second at sub-millisecond latency, and more with pipelining, so it can sit on the hot path, although 1M RPS would need sharding across several Redis instances.

Load balancing — consistent hashing

With multiple server instances, rate limiting by IP requires routing the same IP to the same instance — otherwise the per-IP bucket is split across instances. Consistent hashing in the load balancer ensures requests from the same IP always reach the same server, making in-process rate limiting viable at scale without Redis.
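
A minimal hash ring illustrates the routing rule. Ring, NewRing, Pick, and movedKeys are names made up for this sketch, FNV-1a is an arbitrary choice of hash, and the node names are placeholders; a production load balancer would use its own implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strconv"
)

// Ring maps keys to nodes by walking clockwise to the next virtual point.
type Ring struct {
	points []uint32
	owner  map[uint32]string
}

func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing places `replicas` virtual points per node on the ring.
func NewRing(nodes []string, replicas int) *Ring {
	r := &Ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		for i := 0; i < replicas; i++ {
			p := hashKey(n + "#" + strconv.Itoa(i))
			if _, dup := r.owner[p]; dup {
				continue // hash collision: keep the earlier point
			}
			r.owner[p] = n
			r.points = append(r.points, p)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Pick returns the node that owns key, or "" if the ring is empty.
func (r *Ring) Pick(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := hashKey(key)
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around
	}
	return r.owner[r.points[i]]
}

// movedKeys counts sample IPs whose owner differs between two rings,
// ignoring IPs that belonged to the removed node.
func movedKeys(before, after *Ring, removed string, samples int) int {
	moved := 0
	for i := 0; i < samples; i++ {
		ip := "10.0." + strconv.Itoa(i/256) + "." + strconv.Itoa(i%256)
		if was := before.Pick(ip); was != removed && was != after.Pick(ip) {
			moved++
		}
	}
	return moved
}

func main() {
	before := NewRing([]string{"flint-1", "flint-2", "flint-3"}, 50)
	after := NewRing([]string{"flint-1", "flint-2"}, 50)
	fmt.Println("10.0.0.7 ->", before.Pick("10.0.0.7"))
	fmt.Println("keys moved unnecessarily:", movedKeys(before, after, "flint-3", 1000))
}
```

When an instance is removed, only the IPs it owned are reassigned, so the rate-limit buckets held by the surviving instances remain valid.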

Connection — HTTP/2 multiplexing

HTTP/1.1 Keep-Alive reuses the TCP connection but requests are still sequential — the next request cannot start until the previous response is complete (head-of-line blocking). HTTP/2 multiplexes multiple requests over a single TCP connection simultaneously. A client loading a page with 50 assets sends all 50 requests at once instead of sequentially. This reduces latency significantly without increasing server-side concurrency.


Known limitations

  • No Transfer-Encoding: chunked support — only Content-Length bodies
  • No HTTP/2 — HTTP/1.1 only
  • No Allow header on 405 responses
  • Auth uses static Bearer token — no JWT signature verification
  • Rate limiter buckets are never evicted from memory, so a long-running server that sees many unique IPs will use an unbounded amount of memory
  • No request size limit — a malicious client can send an arbitrarily large Content-Length
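
As one possible way to close the last gap, the parser could validate Content-Length before reading the body. The sketch below is an assumption about how that might look, not code from Flint: checkContentLength and the 1 MiB maxBodyBytes cap are hypothetical, and the status codes follow the usual convention of 400 for a malformed value and 413 for an oversized one.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxBodyBytes is a hypothetical cap; pick a value that suits the service.
const maxBodyBytes = 1 << 20 // 1 MiB

// checkContentLength validates a raw Content-Length header value.
// It returns the body length and an HTTP status: 0 means "accept",
// 400 means the value is malformed, 413 means the body is too large.
func checkContentLength(raw string, limit int64) (int64, int) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return 0, 0 // header absent: no body to read
	}
	// ParseUint rejects signs; bitSize 63 keeps the value within int64.
	n, err := strconv.ParseUint(raw, 10, 63)
	if err != nil {
		return 0, 400
	}
	if int64(n) > limit {
		return 0, 413
	}
	return int64(n), 0
}

func main() {
	for _, raw := range []string{"11", "", "-1", "abc", "99999999999"} {
		n, status := checkContentLength(raw, maxBodyBytes)
		fmt.Printf("%q -> length=%d status=%d\n", raw, n, status)
	}
}
```

Rejecting the request before allocating a body buffer is what prevents a single oversized Content-Length from exhausting memory.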

What I learned

Building from TCP up made abstract concepts concrete. HTTP is just text — \r\n delimited lines over a socket. A router is a Trie traversal. Middleware is the decorator pattern. Keep-Alive is a for loop that blocks on bufio.Reader.ReadString. Rate limiting is a mutex-protected counter. Graceful shutdown is a sync.WaitGroup.

The most valuable insight came from bugs. Recreating bufio.NewReader on every request in the Keep-Alive loop silently dropped bytes from the next request — the kind of bug that only surfaces under load. Calling wg.Add(1) per worker instead of per connection caused a WaitGroup panic at shutdown. Placing allowRequest outside the mutex lock created a race condition that allowed double-spending tokens under concurrent load. Each bug made an abstract concept permanently concrete.

Every abstraction in net/http exists for a specific reason discovered through a specific bug or performance problem. Rebuilding those abstractions from scratch makes you a better user of them.


License

MIT
