Pylon

Distributed real-time chat system built with Go microservices, Connect-Go, WebSocket, Kafka, PostgreSQL, Redis, Kubernetes, Prometheus, Grafana, and OpenTelemetry.

Features

Feature	Description
Real-time messaging	Authenticated WebSocket gateway for room join, leave, message, typing, and ping events
Microservice architecture	API Gateway, Chat Service, Presence Service, Room Service, and Notification Service
Typed RPC contracts	Protobuf contracts generated with Buf and served through Connect-Go
Event streaming	Kafka message events from Chat Service to Notification Service
Durable storage	PostgreSQL for users, rooms, messages, refresh tokens, and notifications
Fast ephemeral state	Redis for presence, typing, rate limiting, and short-lived state
JWT authentication	Register, login, refresh token, and protected HTTP/WebSocket routes
Rate limiting	Redis-backed API Gateway rate limiter
Observability	Prometheus metrics, Grafana dashboards, OpenTelemetry tracing, and Jaeger
Kubernetes deployment	Kustomize-style base manifests with probes, resources, HPA, and security contexts
Load testing	HTTP load-test suite using clank-cli

Architecture

                                     ┌──────────────────────┐
                                     │      Clients         │
                                     │  HTTP + WebSocket    │
                                     └──────────┬───────────┘
                                                │
                                                ▼
                                  ┌─────────────────────────┐
                                  │      API Gateway        │
                                  │  REST, WebSocket, JWT   │
                                  │        :8080            │
                                  └───────┬─────────┬───────┘
                                          │         │
                         Connect RPC      │         │ WebSocket fan-out
                                          │         │
               ┌──────────────────────────┼─────────┼──────────────────────────┐
               │                          │         │                          │
               ▼                          ▼         ▼                          ▼
   ┌─────────────────────┐    ┌─────────────────────┐          ┌─────────────────────┐
   │    Chat Service     │    │    Room Service     │          │  Presence Service   │
   │       :9001         │    │       :9003         │          │       :9002         │
   │ PostgreSQL + Kafka  │    │    PostgreSQL       │          │        Redis        │
   └──────────┬──────────┘    └──────────┬──────────┘          └──────────┬──────────┘
              │                          │                                │
              │ message-events           │                                │
              ▼                          ▼                                ▼
        ┌──────────┐              ┌──────────────┐                 ┌──────────┐
        │  Kafka   │              │  PostgreSQL  │                 │  Redis   │
        └────┬─────┘              └──────────────┘                 └──────────┘
             │
             ▼
   ┌─────────────────────┐
   │ Notification Service│
   │       :9004         │
   │ PostgreSQL + Kafka  │
   └─────────────────────┘

Observability:
- `/metrics` endpoints -> Prometheus -> Grafana
- OpenTelemetry SDK -> OTel Collector -> Jaeger

Services

Service	Port	Purpose
API Gateway	`8080`	Public HTTP API, WebSocket gateway, JWT auth, rate limiting
Chat Service	`9001`	Message persistence, message history, Kafka message events
Presence Service	`9002`	Online, offline, typing, and room presence state
Room Service	`9003`	Room creation, listing, membership, join, and leave
Notification Service	`9004`	Notification persistence and Kafka-driven notification processing

Tech Stack

Area	Technology	Purpose
Language	Go `1.26.4`	Backend services
RPC	Connect-Go	Typed service-to-service communication
Protobuf	Buf	Proto linting and Go code generation
WebSocket	coder/websocket	Real-time client connections
Message broker	Kafka + segmentio/kafka-go	Event streaming
Database	PostgreSQL 17 + pgx	Durable relational data
Cache/state	Redis 7 + go-redis	Presence, typing, and rate limiting
Auth	JWT	API and WebSocket authentication
Metrics	Prometheus client	HTTP, RPC, Kafka, and business metrics
Dashboards	Grafana	Service, infrastructure, Kafka, and business dashboards
Tracing	OpenTelemetry + Jaeger	Distributed tracing
Local infra	Docker Compose	PostgreSQL, Redis, Kafka, Kafka UI, Jaeger, OTel Collector
Deployment	Kubernetes	Production-style service deployment
Load testing	clank-cli	HTTP load testing for API Gateway endpoints

Getting Started

Prerequisites

Required:

Go 1.26.4+
Docker
Docker Compose
Buf CLI
golang-migrate
golangci-lint

Optional:

kubectl
clank-cli
jq
curl

Clone

git clone https://github.com/kodokbakar/pylon.git
cd pylon

Configure Environment

Create a local .env file:

cp .env.example .env

When using the Docker Compose infrastructure from this repository, use the exposed host ports:

DATABASE_URL=postgres://pylon:pylon_dev@localhost:5433/pylon?sslmode=disable
REDIS_URL=redis://localhost:6380
KAFKA_BROKERS=localhost:9092

CHAT_SERVICE_URL=http://localhost:9001
PRESENCE_SERVICE_URL=http://localhost:9002
ROOM_SERVICE_URL=http://localhost:9003
NOTIFICATION_SERVICE_URL=http://localhost:9004

JWT_SECRET=change-this-local-secret
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317

Start Local Infrastructure

make dev

This starts:

Component	Local Address
PostgreSQL	`localhost:5433`
Redis	`localhost:6380`
Kafka	`localhost:9092`
Kafka UI	`http://localhost:8085`
Jaeger	`http://localhost:16686`
OTel Collector gRPC	`localhost:4317`
OTel Collector HTTP	`localhost:4318`
OTel Collector health	`http://localhost:13133`

Generate Protobuf Code

make proto

Run Database Migrations & Seed Demo Data

export DATABASE_URL="postgres://pylon:pylon_dev@localhost:5433/pylon?sslmode=disable"
make migrate-up
make seed

Run Services Locally

Run each service in a separate terminal:

go run ./cmd/chat-service

go run ./cmd/room-service

go run ./cmd/presence-service

go run ./cmd/notification-service

go run ./cmd/api-gateway

Then check the API Gateway health endpoint:

curl http://localhost:8080/health

Expected response:

{
  "data": {
    "service": "api-gateway",
    "status": "ok"
  }
}

API Reference

All public HTTP routes are exposed through the API Gateway.

Base URL:

http://localhost:8080

Auth

Method	Endpoint	Auth	Description
`POST`	`/api/v1/auth/register`	No	Register a user
`POST`	`/api/v1/auth/login`	No	Login and receive access + refresh tokens
`POST`	`/api/v1/auth/refresh`	No	Refresh access token

Register request:

{
  "username": "alice",
  "email": "alice@example.com",
  "password": "password123"
}

Login request:

{
  "email": "alice@example.com",
  "password": "password123"
}

Refresh request:

{
  "refresh_token": "REFRESH_TOKEN"
}

Auth response shape:

{
  "data": {
    "user": {
      "id": "user-id",
      "username": "alice",
      "email": "alice@example.com"
    },
    "token": "ACCESS_TOKEN",
    "refresh_token": "REFRESH_TOKEN",
    "expires_at": "2026-06-23T00:00:00Z",
    "refresh_expires_at": "2026-06-30T00:00:00Z"
  }
}

Rooms

Protected routes require:

Authorization: Bearer ACCESS_TOKEN

Method	Endpoint	Auth	Description
`GET`	`/api/v1/rooms`	Yes	List rooms for the authenticated user
`POST`	`/api/v1/rooms`	Yes	Create a room
`POST`	`/api/v1/rooms/{id}/join`	Yes	Join a room
`POST`	`/api/v1/rooms/{id}/leave`	Yes	Leave a room

Create room request:

{
  "name": "general",
  "type": "channel",
  "member_ids": []
}

Supported room types:

Type	Description
`direct`	1-on-1 room
`group`	Group chat
`channel`	Public channel-style room

Messages

Method	Endpoint	Auth	Description
`GET`	`/api/v1/rooms/{id}/messages`	Yes	List message history for a room

Query parameters:

Parameter	Default	Max	Description
`limit`	`50`	`100`	Number of messages to return
`before_id`	empty	-	Cursor for older messages

Example:

curl "http://localhost:8080/api/v1/rooms/ROOM_ID/messages?limit=50" \
  -H "Authorization: Bearer ACCESS_TOKEN"

Message sending is handled through WebSocket, not through an HTTP POST /api/v1/rooms/{id}/messages route.

WebSocket Protocol

Endpoint:

GET /ws

Authentication can be sent with either:

Authorization: Bearer ACCESS_TOKEN

or query string:

/ws?token=ACCESS_TOKEN
/ws?access_token=ACCESS_TOKEN

Client to Server

All client messages are JSON text frames.

Join Room

{
  "type": "join",
  "room_id": "ROOM_ID"
}

Leave Room

{
  "type": "leave",
  "room_id": "ROOM_ID"
}

Send Message

{
  "type": "message",
  "room_id": "ROOM_ID",
  "content": "hello world",
  "msg_type": "text"
}

Supported msg_type values:

Type	Description
`text`	Text message
`image`	Image message
`file`	File message
`system`	System message

Typing

{
  "type": "typing",
  "room_id": "ROOM_ID"
}

Ping

{
  "type": "ping"
}

Server to Client

Message

{
  "type": "message",
  "message_id": "MESSAGE_ID",
  "room_id": "ROOM_ID",
  "sender": {
    "id": "USER_ID",
    "username": "alice",
    "display_name": "Alice",
    "avatar_url": ""
  },
  "content": "hello world",
  "msg_type": "text",
  "created_at": "2026-06-23T00:00:00Z"
}

User Joined

{
  "type": "user_joined",
  "room_id": "ROOM_ID",
  "user": {
    "id": "USER_ID"
  }
}

User Left

{
  "type": "user_left",
  "room_id": "ROOM_ID",
  "user_id": "USER_ID"
}

Typing

{
  "type": "typing",
  "room_id": "ROOM_ID",
  "user_id": "USER_ID",
  "username": "alice"
}

Pong

{
  "type": "pong"
}

Error

{
  "type": "error",
  "code": "BAD_REQUEST",
  "message": "room_id is required"
}

Internal RPC Services

Pylon uses Connect-Go generated from Protobuf definitions in proto/.

Service	Proto	Main RPCs
Chat Service	`proto/pylon/chat/v1/chat_service.proto`	`SendMessage`, `StreamMessages`, `GetMessages`
Presence Service	`proto/pylon/presence/v1/presence_service.proto`	`SetOnline`, `SetOffline`, `SetTyping`, `GetPresence`, `GetRoomPresence`, `StreamPresence`
Room Service	`proto/pylon/room/v1/room_service.proto`	`CreateRoom`, `GetRoom`, `ListRooms`, `JoinRoom`, `LeaveRoom`, `GetRoomMembers`
Notification Service	`proto/pylon/notification/v1/notification_service.proto`	`SendNotification`, `GetNotifications`, `MarkAsRead`
Gateway Service	`proto/pylon/gateway/v1/gateway_service.proto`	`ValidateToken`, `RefreshToken`

Generate code:

make proto

Lint proto files:

make proto-lint

Check breaking changes:

make proto-breaking

Project Structure

pylon/
├── cmd/                         # Service entry points and Dockerfiles
│   ├── api-gateway/
│   ├── chat-service/
│   ├── presence-service/
│   ├── room-service/
│   └── notification-service/
├── deploy/                      # Kubernetes, Grafana, Jaeger, and OTel configs
│   ├── base/
│   ├── grafana/
│   └── jaeger/
├── docs/
│   └── ADR/                     # Architecture Decision Records
├── gen/                         # Generated Go protobuf/connect code
├── internal/                    # Shared packages
│   ├── config/
│   ├── database/
│   ├── metrics/
│   ├── middleware/
│   ├── observability/
│   ├── response/
│   └── tracing/
├── migrations/                  # PostgreSQL migrations
├── proto/                       # Protobuf definitions and Buf config
├── services/                    # Service implementations
│   ├── api-gateway/
│   ├── chat-service/
│   ├── presence-service/
│   ├── room-service/
│   └── notification-service/
├── tests/
│   └── load/                    # clank-cli HTTP load tests
├── docker-compose.yml
├── Makefile
└── README.md

Development

Commands

make help              # Show all available commands

make dev               # Start local infrastructure
make dev-down          # Stop local infrastructure
make dev-logs          # Follow local infrastructure logs

make proto             # Generate protobuf code
make proto-lint        # Lint protobuf definitions
make proto-breaking    # Check proto breaking changes against main

make fmt               # Format Go code
make vet               # Run go vet
make lint              # Run golangci-lint

make test              # Run all tests
make test-cover        # Generate coverage.out and coverage.html
make test-integration  # Run integration tests with integration build tag
make test-load         # Run HTTP load tests with clank-cli

make build             # Build all service binaries
make build-gateway     # Build API Gateway
make build-chat        # Build Chat Service
make build-presence    # Build Presence Service
make build-room        # Build Room Service
make build-notification # Build Notification Service

make docker-build      # Build all service Docker images

make migrate-up        # Run database migrations
make seed              # Seed demo users, rooms, messages, and notifications

make migrate-down      # Roll back one migration
make migrate-create name=add_users # Create a new migration

make deps              # Download and tidy Go dependencies
make clean             # Remove build and coverage artifacts

Quality Check

Run this before opening a PR:

go fmt ./...
go test ./...
make build
golangci-lint run ./...

Integration Tests

go test -tags=integration -v ./tests/...

Load Testing

HTTP load tests live in tests/load.

Run:

make test-load

The script uses clank-cli against API Gateway HTTP endpoints and writes JSON results to:

tests/load/results/

Covered HTTP scenarios:

Scenario	Endpoint	Method
List Rooms	`/api/v1/rooms`	`GET`
Get Messages	`/api/v1/rooms/{room_id}/messages`	`GET`
Create Room	`/api/v1/rooms`	`POST`
Join Room	`/api/v1/rooms/{room_id}/join`	`POST`
Leave Room	`/api/v1/rooms/{room_id}/leave`	`POST`

WebSocket message sending is not tested with clank-cli because clank-cli does not support WebSocket load testing.

See:

tests/load/README.md
tests/load/ws/README.md
tests/load/grpc/README.md

Monitoring and Tracing

Local URLs

Tool	URL
Kafka UI	`http://localhost:8085`
Jaeger	`http://localhost:16686`
OTel Collector health	`http://localhost:13133`

Prometheus and Grafana dashboards are provisioned under deploy/grafana, but a full local Prometheus/Grafana Compose stack is not currently included in docker-compose.yml.

Metrics

Pylon services expose:

GET /metrics

Application metrics include:

Metric	Source
`pylon_http_requests_total`	API Gateway HTTP middleware
`pylon_http_request_duration_seconds`	API Gateway HTTP middleware
`pylon_http_requests_in_flight`	API Gateway HTTP middleware
`pylon_grpc_requests_total`	Service Connect/RPC middleware
`pylon_grpc_request_duration_seconds`	Service Connect/RPC middleware
`pylon_kafka_messages_published_total`	Chat Service Kafka producer
`pylon_kafka_messages_consumed_total`	Notification Service Kafka consumer
`pylon_kafka_publish_duration_seconds`	Chat Service Kafka producer
`pylon_websocket_connections_active`	API Gateway WebSocket handler
`pylon_messages_sent_total`	Chat Service message flow
`pylon_rooms_created_total`	Room Service room creation flow
`pylon_users_online`	Presence Service online/offline flow

Grafana

Grafana provisioning files are available in:

deploy/grafana/

Dashboards:

Dashboard	Purpose
Pylon Overview	High-level HTTP, RPC, WebSocket, users, and messages overview
Pylon API Gateway	Request rate, latency, in-flight requests, WebSocket activity, and rate limiting
Pylon Microservices	RPC request rate, latency, errors, and pod resources
Pylon Kafka	Kafka publish/consume metrics, latency, consumer lag, and broker metrics
Pylon Infrastructure	PostgreSQL, Redis, and Kubernetes infrastructure metrics
Pylon Business Metrics	Users online, rooms created, messages sent, and active WebSocket connections

See deploy/grafana/README.md.

Tracing

OpenTelemetry tracing is initialized in every service and exported to the OTel Collector.

Local tracing flow:

Pylon services -> OTel Collector :4317 -> Jaeger :16686

Main environment variables:

OTEL_TRACING_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=localhost:4317
OTEL_SERVICE_VERSION=1.0.0
OTEL_TRACES_SAMPLE_RATIO=1

Deployment

Docker Images

Build all service images:

make docker-build

Build one image:

make docker-build-gateway
make docker-build-chat
make docker-build-presence
make docker-build-room
make docker-build-notification

Kubernetes

Render manifests:

kubectl kustomize deploy/base

Apply manifests:

kubectl apply -k deploy/base

Check resources:

kubectl get pods -n pylon
kubectl get svc -n pylon
kubectl get hpa -n pylon

The Kubernetes base includes:

Namespace
ConfigMap
Example Secret placeholders
Ingress
Deployments and Services for all Pylon services
HPA manifests
Prometheus scrape config
Jaeger
OpenTelemetry Collector

Important production note:

deploy/base/secrets.yaml contains placeholder values only. Replace it with External Secrets, Sealed Secrets, or generated Kubernetes Secrets before production deployment.

Minikube

Local Kubernetes deployment is available through:

make minikube-deploy

Useful commands:

make minikube-status
make minikube-clean

See deploy/minikube/README.md for the full Minikube workflow.

Architecture Decision Records

Architecture decisions are documented in:

docs/ADR/

Start here:

docs/ADR/README.md

Current ADRs cover:

Microservice architecture
Go as primary language
Connect-Go for RPC
coder/websocket
Kafka with segmentio/kafka-go
PostgreSQL with pgx
Redis with go-redis
Buf for Protobuf
Kubernetes deployment
Monorepo structure
Prometheus, Grafana, and OpenTelemetry

Performance

Load-test targets are documented in tests/load/README.md.

Current targets:

Metric	Target	Alert
Request Duration P50	`< 100ms`	`> 200ms`
Request Duration P95	`< 200ms`	`> 500ms`
Request Duration P99	`< 500ms`	`> 1s`
Error Rate	`< 0.1%`	`> 1%`
Throughput	`>= 500 req/s`	`< 200 req/s`

Baseline result JSON files should only be committed after running tests against a known environment. Do not commit fake load-test results.

Contributing

Create a feature branch from dev.
Keep changes small and focused.
Run verification before committing:

go fmt ./...
go test ./...
make build
golangci-lint run ./...

Update docs when behavior changes.
Add or update tests for service logic.
Open a PR with a clear summary and verification output.

Recommended commit style:

feat: add room listing endpoint
fix: preserve metrics middleware in service handlers
docs: update load testing guide
test: add kafka propagation coverage

License

MIT License. See LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 145 Commits
.github		.github
cmd		cmd
deploy		deploy
docs/ADR		docs/ADR
gen/pylon		gen/pylon
internal		internal
migrations		migrations
proto		proto
pylon-web		pylon-web
scripts		scripts
services		services
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.golangci.yml		.golangci.yml
Dockerfile		Dockerfile
Makefile		Makefile
README.md		README.md
docker-compose.test.yml		docker-compose.test.yml
docker-compose.yml		docker-compose.yml
go.mod		go.mod
go.sum		go.sum

Folders and files

Latest commit

History

Repository files navigation

Pylon

Features

Architecture

Services

Tech Stack

Getting Started

Prerequisites

Clone

Configure Environment

Start Local Infrastructure

Generate Protobuf Code

Run Database Migrations & Seed Demo Data

Run Services Locally

API Reference

Auth

Rooms

Messages

WebSocket Protocol

Client to Server

Join Room

Leave Room

Send Message

Typing

Ping

Server to Client

Message

User Joined

User Left

Typing

Pong

Error

Internal RPC Services

Project Structure

Development

Commands

Quality Check

Integration Tests

Load Testing

Monitoring and Tracing

Local URLs

Metrics

Grafana

Tracing

Deployment

Docker Images

Kubernetes

Minikube

Architecture Decision Records

Performance

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages