dht-grpc

A peer-to-peer distributed hash table built on the Chord protocol and exposed over gRPC, with an embedded HTTP front-end and a small REPL client. Designed to run as multiple processes on a laptop or as separate JVMs on AWS EC2.

                 +-----------+
       client -> | WebClient |  (HTTP / JSON)
                 +-----+-----+
                       |
                 +-----v-----+         gRPC           +-------------+
                 |    Dht    | <------------------->  | NodeService |  (peer)
                 +-----+-----+                        +-------------+
                       |
                 +-----v-----+
                 |   State   |  routing table + bindings
                 +-----------+

Features

  • Chord ring with 7-bit identifier space (128 slots), tunable in IRouting.NBITS
  • O(log N) lookups via finger tables
  • Background stabilizer (stabilize / fixFinger / checkPredecessor)
  • Key handoff during a join (bindings in (pred, n] move to the new node)
  • gRPC channel pool with keepalive and a clean shutdown hook
  • HTTP/JSON API for inspection and integration with non-Java clients
  • Single-JAR deployment via the Maven shade plugin

Build

Requirements: Java 17+ and Maven 3.6+.

mvn clean package          # -> target/dht.jar (shaded fat-jar, main class: dht.App)
mvn test                   # 10 JUnit 5 tests; should report 0 failures

Run a local 3-node ring

In three terminals:

# Terminal 1 - first node
java -jar target/dht.jar --id 17 --http 8080

# Terminal 2 - join the ring
java -jar target/dht.jar --id 36 --http 8181
curl -X POST 'http://localhost:8181/join?id=17&host=localhost&port=9080'

# Terminal 3 - join the ring
java -jar target/dht.jar --id 52 --http 8282
curl -X POST 'http://localhost:8282/join?id=17&host=localhost&port=9080'

(The gRPC port defaults to --http + 1000, so node 17 above listens on gRPC port 9080.)

After ~5 seconds the stabilizer will have rebuilt the finger tables. Inspect with:

curl http://localhost:8080/routes | jq

REPL client

java -cp target/dht.jar dht.CliClient --target http://localhost:8080
dht> add foo bar
dht> get foo
dht> bindings
dht> routes
dht> join 17 localhost 9080

Local testing examples

End-to-end recipes you can copy-paste straight into a shell. They assume you've already run mvn clean package and have a 3-node ring running (see Run a local 3-node ring).

1. Inspect ring topology

for port in 8080 8181 8282; do
  curl -s "http://localhost:$port/routes" | python3 -c "
import json, sys
r = json.load(sys.stdin)
me   = r['self']['id']
pred = r['predecessor']['id'] if r['predecessor'] else None
succ = r['successor']['id']
print(f'  node {me:3d}: pred={pred}  succ={succ}')
"
done

Expected output for a healthy 17 → 36 → 52 → 17 ring:

  node  17: pred=52  succ=36
  node  36: pred=17  succ=52
  node  52: pred=36  succ=17

If two nodes both list themselves as their own successor, stabilization hasn't converged yet -- wait NBITS * 1.5s ≈ 10s and re-run.

2. Routing-on-write

Add 10 keys exclusively through node 36's HTTP endpoint. Each key should land on the owner node based on its SHA-1 hash, not on the node that received the request:

for kv in foo:bar baz:qux alpha:1 beta:2 gamma:3 \
         delta:4 apple:red banana:yellow cherry:dark hello:world; do
  curl -s -X POST "http://localhost:8181/add?key=${kv%:*}&value=${kv#*:}" > /dev/null
done

for port in 8080 8181 8282; do
  echo "--- node $(curl -s http://localhost:$port/info | python3 -c 'import json,sys; print(json.load(sys.stdin)["id"])') ---"
  curl -s "http://localhost:$port/bindings" | python3 -m json.tool
done

Expected distribution (with NBITS=7, so the ring has 128 slots):

Owner range   Node   Keys (hash)
(52, 17]      17     foo(53), beta(61), gamma(67), apple(68), cherry(72), banana(113)
(17, 36]      36     alpha(27), hello(29), baz(34)
(36, 52]      52     delta(52)
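
The slot numbers in parentheses are the key hashes. A minimal sketch of that mapping, assuming Ring.hash reduces a SHA-1 digest modulo 2^NBITS (the exact byte-to-integer reduction in Ring.java may differ, so the printed slots may not match the table):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashSketch {
    static final int NBITS = 7;   // matches IRouting.NBITS

    // Map a key into the ring's identifier space [0, 2^NBITS).
    static int hash(String key) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        // Treat the digest as a non-negative integer and fold it into 128 slots.
        return new BigInteger(1, digest).mod(BigInteger.valueOf(1 << NBITS)).intValue();
    }

    public static void main(String[] args) throws Exception {
        for (String k : new String[] {"foo", "alpha", "delta"})
            System.out.printf("%-6s -> slot %d%n", k, hash(k));
    }
}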

Read every key back through node 52 (forces multi-hop routing for most keys) -- all 10 should resolve:

for k in foo baz alpha beta gamma delta apple banana cherry hello; do
  printf "  %-8s -> %s\n" "$k" "$(curl -s "http://localhost:8282/get?key=$k")"
done

3. Cross-node delete

delete routes the same way add does. Issue it on node 17 against a key stored on node 36:

curl -s -X POST 'http://localhost:8080/delete?key=hello&value=world'
sleep 1
curl -s 'http://localhost:8282/get?key=hello'      # -> {"key":"hello","values":[]}
curl -s http://localhost:8181/bindings              # hello no longer there

4. Key handoff during a join

Start a 4th node (id 27, between 17 and 36) and verify that alpha (hash 27) migrates from node 36 to node 27 atomically as part of the notify handshake:

# In a 4th terminal:
java -jar target/dht.jar --id 27 --http 8383
curl -s -X POST 'http://localhost:8383/join?id=17&host=localhost&port=9080'
sleep 5

echo "--- alpha should be on node 27, not 36 ---"
curl -s http://localhost:8181/bindings   # node 36, should NOT contain alpha
curl -s http://localhost:8383/bindings   # node 27, should contain alpha
curl -s 'http://localhost:8080/get?key=alpha'   # lookup via node 17 still works

Ring after join: 17 → 27 → 36 → 52 → 17. Node 17 will list pred=52, succ=27; node 27 will list pred=17, succ=36.
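
The handoff itself is a small split of the successor's store at notify time. A standalone sketch of the idea, modeled on the takeRange helper that StateTest exercises (names, signatures, and the int-keyed store are assumptions, not the real State.java API):

import java.util.*;

public class HandoffSketch {
    // True iff id lies in the half-open ring interval (from, to], wrapping past 0.
    static boolean betweenInclusiveRight(int id, int from, int to) {
        return from < to ? (id > from && id <= to)
                         : (id > from || id <= to);
    }

    // Remove and return every binding whose key hash falls in (from, to].
    static Map<Integer, String> takeRange(Map<Integer, String> store, int from, int to) {
        Map<Integer, String> moved = new HashMap<>();
        store.entrySet().removeIf(e -> {
            boolean take = betweenInclusiveRight(e.getKey(), from, to);
            if (take) moved.put(e.getKey(), e.getValue());
            return take;
        });
        return moved;
    }

    public static void main(String[] args) {
        // Node 36's store before node 27 joins: alpha(27), hello(29), baz(34).
        Map<Integer, String> store = new HashMap<>(Map.of(27, "alpha", 29, "hello", 34, "baz"));
        // Old predecessor 17, new predecessor 27: only (17, 27] moves.
        System.out.println("moved to node 27: " + takeRange(store, 17, 27));   // {27=alpha}
        System.out.println("kept on node 36:  " + store);                      // hello, baz
    }
}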

5. Predecessor failure recovery

Kill one node and watch the survivors clear stale predecessor pointers. The checkPredecessor task runs every 5 seconds, so allow ~10s for the ring to notice:

# Find and kill the node 36 process
pkill -f "id 36"

sleep 10
curl -s http://localhost:8282/routes | python3 -c "
import json, sys
r = json.load(sys.stdin)
print('node 52 pred =', r['predecessor'])   # was 36, should now be null
print('node 52 succ =', r['successor']['id'])
"

Note: with the current implementation, bindings on the dead node are lost -- successor lists and replication are on the roadmap. Once node 17's successor (36) is unreachable, lookups for keys in (17, 36] will fail until you restart node 36 or add a new node in that range. This is expected behavior for vanilla Chord without replication.

6. Stress test the lookup path

Issue 1000 random gets through node 52 and time them:

time (for i in $(seq 1 1000); do
  k="key-$RANDOM"
  curl -s "http://localhost:8282/get?key=$k" > /dev/null
done)

On a laptop with the 3-node ring this typically completes in 8-12s (~10ms per lookup including HTTP overhead). Latency is dominated by the JSON encoding and curl process startup, not by Chord routing -- a direct gRPC stub would be ~10x faster.
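
To sanity-check that claim, time the same loop over a raw gRPC stub. A sketch that assumes the classes generated from dht.proto are named DhtServiceGrpc and GetRequest -- check target/generated-sources for the real names:

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class LookupBench {
    public static void main(String[] args) {
        // Node 52's gRPC port (HTTP 8282 + 1000).
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 9282).usePlaintext().build();
        // Stub and message names below are assumptions derived from dht.proto.
        var stub = DhtServiceGrpc.newBlockingStub(channel);
        long start = System.nanoTime();
        for (int i = 0; i < 1000; i++)
            stub.getBindings(GetRequest.newBuilder().setKey("key-" + i).build());
        System.out.printf("1000 lookups in %d ms%n",
                (System.nanoTime() - start) / 1_000_000);
        channel.shutdownNow();
    }
}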

7. gRPC introspection with grpcurl

The server registers proto reflection, so grpcurl works without a local copy of dht.proto:

grpcurl -plaintext localhost:9080 list
grpcurl -plaintext localhost:9080 dht.DhtService/getNodeInfo
grpcurl -plaintext -d '{"id": 27}' localhost:9080 dht.DhtService/findSuccessor
grpcurl -plaintext -d '{"key": "foo"}' localhost:9080 dht.DhtService/getBindings

8. Cleanup

pkill -f "dht.jar"      # all nodes shut down via the JVM hook

The shutdown hook drains the gRPC channel pool, stops the embedded HTTP server, and shuts down the background scheduler -- you should see Shutting down node <id> in each node's log before it exits.
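
The hook itself is plain JVM machinery. A runnable miniature of the pattern (the real wiring lives in App.java and DhtServer.java):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ShutdownSketch {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {}, 0, 1, TimeUnit.SECONDS);   // stand-in for the stabilizer
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            // In the real node this also drains the channel pool and stops HTTP + gRPC.
            System.out.println("Shutting down node 17");
            scheduler.shutdownNow();
        }));
        Thread.sleep(60_000);   // simulate a running node; Ctrl-C (or pkill) fires the hook
    }
}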

Troubleshooting

  • /get returns [] for a key you just added -- wait a few seconds; on a fresh ring the finger table is still being filled in by fixFinger, so a lookup may temporarily route to the wrong node and read an empty store. Stabilization fixes this within NBITS * 1.5s.
  • Address already in use -- another node (or stale JVM) is on the same port. Run pkill -f "dht.jar" and retry.
  • Node never finds its successor after /join -- the bootstrap node isn't reachable. Verify with curl http://localhost:<bootstrap-http>/info first. On EC2, double-check the security group allows the gRPC port (= HTTP port + 1000 by default).

HTTP API

Method  Path       Query params    Description
GET     /info      -               This node's identity
GET     /routes    -               Predecessor, successor, finger table
GET     /bindings  -               All bindings stored locally
GET     /get       key             DHT lookup (routes to owner node)
POST    /add       key, value      DHT add (multi-valued)
POST    /delete    key, value      DHT delete
POST    /join      id, host, port  Join existing ring through bootstrap

Responses are JSON. Errors return {"error": "..."} with 4xx.
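
From Java, the JDK's built-in HTTP client is enough to drive the API; no extra dependency is needed. A minimal sketch against a local node (endpoint shapes as documented above):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HttpApiExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        String base = "http://localhost:8080";

        // POST /add stores a (key, value) binding, routed to the owner node.
        HttpRequest add = HttpRequest.newBuilder(URI.create(base + "/add?key=foo&value=bar"))
                .POST(HttpRequest.BodyPublishers.noBody()).build();
        System.out.println(client.send(add, HttpResponse.BodyHandlers.ofString()).body());

        // GET /get looks the key up, hopping the ring if necessary.
        HttpRequest get = HttpRequest.newBuilder(URI.create(base + "/get?key=foo")).build();
        System.out.println(client.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}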

gRPC API

See src/main/proto/dht.proto. The service is registered with proto reflection, so grpcurl works out of the box:

grpcurl -plaintext localhost:9080 list
grpcurl -plaintext localhost:9080 dht.DhtService/getNodeInfo

Chord on one page

Each node holds a finger table of NBITS entries; entry i points to the successor of (n + 2^i) mod 2^m. To find the successor of an id, walk the finger table from high to low for the closest preceding node and recurse on that peer -- the distance to the target halves each hop, giving O(log N).
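
In code the lookup is only a few lines. The sketch below simulates the walk on the 3-node ring from the examples above; method names follow the Chord paper, and fingers are computed from a global view rather than per-node state, so it illustrates the routing logic rather than Dht.java's exact code:

import java.util.*;

public class LookupSketch {
    static final int NBITS = 7, SLOTS = 1 << NBITS;
    static final NavigableSet<Integer> nodes = new TreeSet<>(List.of(17, 36, 52));

    // id strictly between from and to on the ring, wrapping past slot 0.
    static boolean between(int id, int from, int to) {
        return from < to ? id > from && id < to : id > from || id < to;
    }

    // id in the half-open interval (from, to].
    static boolean betweenInclusiveRight(int id, int from, int to) {
        return from < to ? id > from && id <= to : id > from || id <= to;
    }

    // First live node at or after slot s (global view standing in for real fingers).
    static int successorOf(int s) {
        Integer n = nodes.ceiling(s % SLOTS);
        return n != null ? n : nodes.first();
    }

    // Lookup as run on node `self`: answer locally if id is in (self, successor],
    // otherwise forward to the closest preceding finger (one network hop each time).
    static int findSuccessor(int self, int id) {
        int succ = successorOf(self + 1);
        if (betweenInclusiveRight(id, self, succ)) return succ;
        return findSuccessor(closestPrecedingNode(self, id), id);
    }

    // Walk fingers from high to low; finger i points at the successor of (self + 2^i).
    static int closestPrecedingNode(int self, int id) {
        for (int i = NBITS - 1; i >= 0; i--) {
            int f = successorOf(self + (1 << i));
            if (between(f, self, id)) return f;
        }
        return self;
    }

    public static void main(String[] args) {
        System.out.println(findSuccessor(36, 53));   // foo(53) -> node 17, as in the table above
        System.out.println(findSuccessor(52, 27));   // alpha(27) -> node 36 (before node 27 joins)
    }
}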

The background stabilize task runs every second and:

  1. asks the successor for its predecessor,
  2. adopts that predecessor as our successor if it sits between us, and
  3. notifies the (possibly new) successor that we exist.

A separate fixFinger tick refreshes one finger entry at a time so the table converges over NBITS * 1.5s after any ring change. checkPredecessor clears the predecessor pointer if it stops responding so a future notify can repair it.
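
Put together, one tick of each task looks roughly like this (field and helper names are assumed, not the literal Background.java / Dht.java code):

void stabilize() {
    NodeInfo x = peer(successor).getPredecessor();              // step 1
    if (x != null && between(x.getId(), self.getId(), successor.getId()))
        successor = x;                                          // step 2: adopt the tighter successor
    peer(successor).notify(self);                               // step 3: announce ourselves
}

void fixFinger() {
    next = (next + 1) % NBITS;                                  // one entry per tick, round-robin
    finger[next] = findSuccessor((self.getId() + (1 << next)) % (1 << NBITS));
}

void checkPredecessor() {
    if (predecessor != null && !isReachable(predecessor))
        predecessor = null;                                     // the next notify repairs the pointer
}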

Deploy on AWS EC2

# On each EC2 instance:
sudo dnf install java-17-amazon-corretto -y      # Amazon Linux 2023
scp -i key.pem target/dht.jar ec2-user@$HOST:~/

# Open ports in your security group: HTTP port and HTTP+1000 (gRPC).

# First node:
java -jar dht.jar --id 17 --http 8080 \
  --host $(curl -s http://169.254.169.254/latest/meta-data/public-hostname)

# Subsequent nodes (then POST /join from any client):
java -jar dht.jar --id 36 --http 8080 \
  --host $(curl -s http://169.254.169.254/latest/meta-data/public-hostname)

Project layout

dht-grpc/
├── pom.xml
├── README.md
├── CLAUDE.md                     # guidance for AI assistants editing the repo
├── src/
│   ├── main/
│   │   ├── java/dht/
│   │   │   ├── App.java          # entry point
│   │   │   ├── Background.java   # stabilize / fixFinger / checkPred ticks
│   │   │   ├── Channels.java     # gRPC channel pool
│   │   │   ├── CliClient.java    # REPL over the HTTP API
│   │   │   ├── Dht.java          # Chord algorithms
│   │   │   ├── DhtBase.java      # self-vs-remote dispatch helpers
│   │   │   ├── DhtServer.java    # gRPC server lifecycle
│   │   │   ├── IRouting.java     # routing-table contract
│   │   │   ├── NodeKey.java      # (host, port) value type
│   │   │   ├── NodeService.java  # gRPC adapter -> Dht
│   │   │   ├── Ring.java         # modular arithmetic + key hashing
│   │   │   ├── State.java        # node state (predecessor, fingers, bindings)
│   │   │   └── WebClient.java    # HTTP/JSON front-end
│   │   ├── proto/
│   │   │   └── dht.proto
│   │   └── resources/
│   │       └── logback.xml
│   └── test/java/dht/
│       ├── RingTest.java         # ring arithmetic invariants
│       └── StateTest.java        # bindings store + handoff helpers

Tests

mvn test

Two JUnit 5 suites (10 tests):

  • RingTest -- modular-arithmetic predicates (between, betweenInclusiveRight, mod, fingerStart, hash). The wrap-around case is the easiest place for a Chord implementation to silently break; see the sketch after this list.
  • StateTest -- bindings store, multi-valued add semantics, and the takeRange / takeOutsideRange split that backs the notify handoff. The latter prevents the "give back the entire store" bug when a fresh node accepts its first predecessor.
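
A concrete picture of those wrap-around assertions (a sketch: it assumes Ring exposes static between and fingerStart helpers, which may not match the real signatures):

import static org.junit.jupiter.api.Assertions.*;

import org.junit.jupiter.api.Test;

class WrapAroundSketchTest {
    @Test
    void betweenWrapsPastZero() {
        // the interval (52, 17) crosses slot 0 on the 128-slot ring
        assertTrue(Ring.between(60, 52, 17));    // before the wrap
        assertTrue(Ring.between(5, 52, 17));     // after the wrap
        assertFalse(Ring.between(30, 52, 17));   // outside the interval
    }

    @Test
    void fingerStartWraps() {
        assertEquals(36, Ring.fingerStart(100, 6));   // (100 + 2^6) mod 128 = 36
    }
}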

For end-to-end tests against a live ring, see Local testing examples.

Roadmap

  • Successor list of length r for fault tolerance under simultaneous failures
  • Replication: store each binding on the next r successors
  • Persistent backend (RocksDB) instead of an in-memory map
  • Prometheus metrics for stabilize convergence and lookup latency
  • TLS between peers (gRPC has first-class support)
