# Blobasaur

Blobasaur is a high-performance, sharded blob storage server written in Rust. It implements the Redis protocol for client compatibility and uses SQLite as the backend for each shard, providing a simple yet robust solution for storing and retrieving binary large objects (blobs).
## Table of Contents

- Features
- Installation
- Quick Start
- CLI Usage
- Metrics
- Configuration
- Redis Commands
- Shard Migration
- Performance Features
- Advanced Topics
- Development
- License
## Features

- 🚀 **High Performance Sharding**: Distributes data across multiple SQLite databases using multi-probe consistent hashing for optimal concurrency and scalability
- 🔄 **Shard Migration**: Built-in support for migrating data between different shard configurations with data integrity verification
- 🔌 **Redis Protocol Compatible**: Full compatibility with Redis clients using standard commands (`GET`, `SET`, `DEL`, `EXISTS`, `HGET`, `HSET`, `HDEL`, `HEXISTS`)
- ⚡ **Asynchronous Operations**: Built on Tokio for non-blocking I/O and efficient concurrent request handling
- 💾 **SQLite Backend**: Each shard uses its own SQLite database for simple deployment and reliable storage
- 🗜️ **Storage Compression**: Configurable compression with multiple algorithms (Gzip, Zstd, Lz4, Brotli)
- 📊 **Namespacing**: Hash-based namespacing using `HGET`/`HSET` commands for logical data organization
- 🏷️ **Metadata Tracking**: Automatic tracking of creation time, update time, expiration, and version numbers
- ⏰ **TTL Support**: Redis-compatible key expiration with `SET EX/PX`, `TTL`, and `EXPIRE` commands plus automatic background cleanup
- 🌐 **Redis Cluster Support**: Full cluster protocol support with automatic node discovery and client redirection
- ⚙️ **Highly Configurable**: Flexible configuration for shards, compression, batching, and performance tuning
## Installation

### Pre-compiled Binaries

You can download pre-compiled binaries for Linux from the GitHub Releases page.

1. Download the latest release. For example, to download version `v0.1.0`:

   ```bash
   wget https://github.com/iamd3vil/blobasaur/releases/download/v0.1.0/blobasaur-Linux-musl-x86_64.tar.gz
   ```

2. Extract the archive:

   ```bash
   tar -xvf blobasaur-Linux-musl-x86_64.tar.gz
   ```

3. Run the binary. The binary inside the archive is named `blobasaur`; you can run it directly:

   ```bash
   ./blobasaur
   ```
### Building from Source

If a pre-compiled binary isn't available for your platform, you'll need to build from source.

Prerequisites:

- Rust toolchain (latest stable recommended): [Install Rust](https://rustup.rs)
- `just` (optional but recommended): a command runner for development tasks

  ```bash
  cargo install just
  ```

- `cross` (optional): for cross-compilation to different targets

  ```bash
  cargo install cross
  ```
1. Clone the repository:

   ```bash
   git clone https://github.com/iamd3vil/blobasaur.git
   cd blobasaur
   ```

2. Build the project:

   ```bash
   # Debug build
   cargo build
   # or using just
   just build

   # Release build (recommended for production)
   cargo build --release
   # or using just
   just build-release
   ```

3. Install the binary:

   ```bash
   # Install to ~/.cargo/bin
   cargo install --path .

   # Or copy the binary manually
   cp target/release/blobasaur /usr/local/bin/
   ```
For Linux production deployments, you can create a static binary:

```bash
# Build a static Linux MUSL binary
just build-linux

# Binary will be at: target/x86_64-unknown-linux-musl/release/blobasaur
```
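You can check that the result is actually static with the standard `file` utility:

```bash
file target/x86_64-unknown-linux-musl/release/blobasaur
# a MUSL build should report something like "statically linked"
```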
## Quick Start

1. Create a configuration file (`config.toml`):

   ```toml
   data_dir = "./data"
   num_shards = 4
   addr = "0.0.0.0:6379"
   ```

2. Run the server:

   ```bash
   blobasaur
   # or with a custom config
   blobasaur --config /path/to/config.toml
   ```

3. Test with `redis-cli`:

   ```bash
   redis-cli -p 6379 SET mykey "Hello, World!"
   redis-cli -p 6379 GET mykey

   # Test TTL functionality
   redis-cli -p 6379 SET session:123 "user_data" EX 60   # Expires in 60 seconds
   redis-cli -p 6379 TTL session:123                     # Check remaining time
   redis-cli -p 6379 EXPIRE mykey 300                    # Set 5 minute expiration
   ```
## CLI Usage

Blobasaur supports multiple modes of operation:

```bash
# Run the server (default behavior)
blobasaur
# or explicitly
blobasaur serve

# Specify a custom config file
blobasaur --config /path/to/config.toml

# Shard migration commands
blobasaur shard migrate <old_shard_count> <new_shard_count>

# Get help
blobasaur --help
```

Available Commands:

- `serve` (default): Runs the Blobasaur server
- `shard migrate`: Migrates data between different shard configurations

Global Options:

- `--config, -c`: Path to configuration file (default: `config.toml`)
- `--help`: Display help information
## Metrics

Blobasaur exposes a wide range of metrics for monitoring, compatible with Prometheus. You can track command latency, connection counts, cache performance, and much more.

To enable the metrics server, add the following to your `config.toml`:

```toml
[metrics]
enabled = true
addr = "0.0.0.0:9090"
```
For a complete list of available metrics and example usage with Prometheus and Grafana, please see the METRICS.md documentation.
## Configuration

Blobasaur is configured via a `config.toml` file:

```toml
# Required settings
data_dir = "/var/data/blobasaur"  # Directory for SQLite databases
num_shards = 8                    # Number of shards (must be > 0)

# Optional settings
addr = "0.0.0.0:6379"             # Server bind address
async_write = false               # Enable async writes
batch_size = 1                    # Write batch size
batch_timeout_ms = 0              # Batch timeout in milliseconds
```
### Storage Compression

Configure compression for data at rest:

```toml
[storage_compression]
enabled = true
algorithm = "zstd"  # Options: "none", "gzip", "zstd", "lz4", "brotli"
level = 3           # Compression level (algorithm-specific)
```

Compression Options:

- **Zstd**: Best balance of speed and compression ratio (recommended)
- **Lz4**: Fastest compression, lower ratio
- **Gzip**: Good compatibility, moderate performance
- **Brotli**: Best compression ratio, slower
### Performance Tuning

For high-throughput scenarios:

```toml
# High-performance configuration
async_write = true
batch_size = 100
batch_timeout_ms = 10

[storage_compression]
enabled = true
algorithm = "lz4"
level = 1
```
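A quick way to gauge the effect of these settings is `redis-benchmark`, which ships with Redis. Treat this as a sanity check rather than a rigorous benchmark:

```bash
# Compare SET/GET throughput before and after tuning async_write
# and the batch settings (1 KB payloads, 100k requests)
redis-benchmark -p 6379 -t set,get -n 100000 -d 1024
```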
## Redis Commands

Blobasaur implements core Redis commands for blob operations:

- `SET key value [EX seconds] [PX milliseconds]`: Store or replace a blob with optional TTL

  ```bash
  # Basic SET
  redis-cli SET mykey "Hello, World!"

  # SET with TTL in seconds
  redis-cli SET mykey "Hello, World!" EX 60

  # SET with TTL in milliseconds
  redis-cli SET mykey "Hello, World!" PX 30000
  ```

- `GET key`: Retrieve a blob (excludes expired keys)

  ```bash
  redis-cli GET mykey
  ```

- `DEL key`: Delete a blob

  ```bash
  redis-cli DEL mykey
  ```

- `EXISTS key`: Check if a blob exists (excludes expired keys)

  ```bash
  redis-cli EXISTS mykey
  ```

- `TTL key`: Get the remaining time to live for a key

  ```bash
  redis-cli TTL mykey
  # Returns:
  #   -1 if the key exists but has no expiration
  #   -2 if the key does not exist or has expired
  #   a positive number: remaining TTL in seconds
  ```

- `EXPIRE key seconds`: Set an expiration time on an existing key

  ```bash
  redis-cli EXPIRE mykey 120  # Expire in 2 minutes
  # Returns:
  #   1 if the expiration was set successfully
  #   0 if the key does not exist
  ```
Use namespaces to organize data into logical groups:

- `HSET namespace key value`: Store in a namespace

  ```bash
  redis-cli HSET users:123 name "John Doe"
  redis-cli HSET users:123 email "[email protected]"
  ```

- `HGET namespace key`: Retrieve from a namespace

  ```bash
  redis-cli HGET users:123 name
  ```

- `HDEL namespace key`: Delete from a namespace

  ```bash
  redis-cli HDEL users:123 email
  ```

- `HEXISTS namespace key`: Check existence in a namespace

  ```bash
  redis-cli HEXISTS users:123 name
  ```
Any Redis client works with Blobasaur:

```python
# Python example
import redis

r = redis.Redis(host='localhost', port=6379)
r.set('mykey', 'myvalue')
value = r.get('mykey')

# Namespaced operations
r.hset('users:123', 'name', 'John Doe')
name = r.hget('users:123', 'name')
```
## Shard Migration

Shard migration allows you to change the number of shards in your deployment, enabling you to scale storage and optimize performance.
The migration process:
- Creates new SQLite databases for the target shard configuration
- Redistributes existing data according to the new consistent hashing ring
- Maintains data integrity throughout the process
- Optionally verifies migration success
Usage:

```bash
# Basic migration from 4 shards to 8 shards
blobasaur shard migrate 4 8

# Migration with custom data directory
blobasaur shard migrate 4 8 --data-dir /custom/path/to/data

# Migration with verification
blobasaur shard migrate 4 8 --verify

# Migration using a specific config file
blobasaur --config /path/to/config.toml shard migrate 4 8
```
Command Options:

- `old_shard_count`: Current number of shards
- `new_shard_count`: Target number of shards
- `--data-dir, -d`: Override the data directory
- `--verify`: Run verification after migration
The migration runs in four phases:

1. **Validation**: Checks that the source shard databases exist and are accessible
2. **Preparation**: Creates new SQLite databases for the target configuration
3. **Data Transfer**: Migrates data in batches using the new hash ring
4. **Verification** (optional): Compares data integrity between old and new shards
Before Migration:

```bash
# 1. Stop the server
pkill blobasaur

# 2. Backup your data
cp -r /var/data/blobasaur /var/data/blobasaur-backup

# 3. Test on a copy first
cp -r /var/data/blobasaur /tmp/test-migration
```

Migration Workflow:

```bash
# 4. Run migration with verification
blobasaur shard migrate 4 8 --verify

# 5. Update config.toml
#    Change: num_shards = 4
#    To:     num_shards = 8

# 6. Start server with new configuration
blobasaur
```
Important Considerations:

- **Downtime Required**: The server must be stopped during migration
- **Disk Space**: You need space for both the old and new databases
- **Testing**: Always test migrations on data copies first
- **Recovery**: Keep backups for rollback if needed (see the sketch below)
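If verification fails or the new layout misbehaves, rolling back means restoring the backup taken in step 2 above. A minimal sketch, reusing the paths from the workflow:

```bash
# Roll back to the pre-migration state
pkill blobasaur
rm -r /var/data/blobasaur
cp -r /var/data/blobasaur-backup /var/data/blobasaur
# Revert num_shards in config.toml to its previous value, then restart
blobasaur
```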
## Performance Features

### Write Batching

Improves throughput by batching multiple operations:

```toml
batch_size = 50        # Operations per batch
batch_timeout_ms = 10  # Max wait time for a batch
```

Benefits:

- Higher write throughput
- Reduced SQLite transaction overhead
- Better resource utilization
### Async Writes

Enables immediate responses while writes happen in the background:

```toml
async_write = true
```

Features:

- Immediate response to clients
- Inflight cache prevents race conditions
- Maintains consistency guarantees
### Storage Compression

Reduces storage requirements and can improve I/O performance:

```toml
[storage_compression]
enabled = true
algorithm = "zstd"
level = 3
```

Performance Impact:

- **Zstd**: 20-40% size reduction, minimal CPU overhead
- **Lz4**: 15-25% size reduction, fastest compression
- **Brotli**: 40-60% size reduction, higher CPU usage
## Advanced Topics

### Database Schema

Default Table (`blobs`):

```sql
CREATE TABLE blobs (
    key TEXT PRIMARY KEY,
    value BLOB NOT NULL,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL,
    expires_at INTEGER,
    version INTEGER NOT NULL DEFAULT 0
);
```

Namespaced Tables (`blobs_{namespace}`):

- Created automatically on first access (see the example below)
- Same schema as the default table
- Isolated from other namespaces
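You can watch a namespaced table appear by writing through `HSET` and then listing tables with the `sqlite3` CLI. The shard database filename below is hypothetical; check your `data_dir` for the actual names:

```bash
redis-cli -p 6379 HSET myspace mykey "value"
# List tables in one shard database (filename is a guess; the key may
# have hashed to a different shard, so check each database)
sqlite3 ./data/shard_0.db '.tables'
```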
### TTL and Key Expiration

Features:

- **Redis-Compatible TTL**: Full support for the `SET EX/PX`, `TTL`, and `EXPIRE` commands
- **Automatic Expiry**: All read operations (`GET`, `EXISTS`, `TTL`) automatically filter out expired keys
- **Background Cleanup**: Per-shard cleanup tasks run every 60 seconds to remove expired keys
- **Efficient Storage**: Uses indexed `expires_at` timestamps for fast expiry queries

Implementation Details:

- Expiration timestamps are stored as Unix epoch seconds in the `expires_at` column
- Database indexes on `expires_at` enable efficient cleanup queries
- Background cleanup processes both the main `blobs` table and namespaced tables
- Race-free expiry checking: keys are considered expired at query time (illustrated below)
Usage Examples:

```bash
# Set with 60 second expiration
redis-cli SET session:abc123 "user_data" EX 60

# Check remaining TTL
redis-cli TTL session:abc123

# Add expiration to an existing key
redis-cli EXPIRE permanent_key 3600
```
### Inflight Cache for Async Writes

The Problem: With async writes enabled, a `GET` issued right after a `SET` could miss the recently written data.

The Solution: An inflight cache system that:

- Tracks pending write operations
- Serves data from the cache while async writes are in flight
- Cleans up automatically after the database commit
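With `async_write = true`, the practical effect is read-your-writes behavior:

```bash
# If the database write hasn't committed yet, the GET is served from
# the inflight cache, so the new value is never observed as missing
redis-cli -p 6379 SET mykey "new_value"
redis-cli -p 6379 GET mykey
```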
### Redis Cluster Support

Full Redis cluster protocol support with:

- **Node Discovery**: Automatic gossip protocol
- **Hash Slots**: Distribution across 16384 slots
- **Client Redirection**: Automatic `MOVED` responses
- **Cluster Commands**: `CLUSTER NODES`, `CLUSTER INFO`, etc.
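From a client's perspective, the cluster can be inspected with the standard commands; passing `-c` makes `redis-cli` follow `MOVED` redirections automatically:

```bash
redis-cli -c -p 6379 CLUSTER INFO
redis-cli -c -p 6379 CLUSTER NODES
# Keys are routed by hash slot; -c follows any MOVED redirection
redis-cli -c -p 6379 SET somekey "value"
```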
Basic Cluster Configuration:

```toml
[cluster]
enabled = true
node_id = "node1"
seeds = ["127.0.0.1:6380"]
advertise_addr = "127.0.0.1"
port = 6379

[[cluster.slot_ranges]]
start = 0
end = 8191
```
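A second node would mirror this configuration with its own `node_id`, address, and the remaining slot range. The values below are illustrative; together the two ranges cover all 16384 slots:

```toml
[cluster]
enabled = true
node_id = "node2"            # hypothetical second node
seeds = ["127.0.0.1:6379"]   # seed with the first node's address
advertise_addr = "127.0.0.1"
port = 6380

[[cluster.slot_ranges]]
start = 8192
end = 16383
```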
## Development

### Project Structure

```
src/
├── main.rs               # Entry point and CLI handling
├── config.rs             # Configuration management
├── app_state.rs          # Application state and shard routing
├── server.rs             # Redis protocol server
├── shard_manager.rs      # Shard write operations and batching
├── migration.rs          # Shard migration functionality
├── compression.rs        # Storage compression
├── metrics.rs            # Performance metrics
├── http_server.rs        # HTTP metrics endpoint
├── cluster.rs            # Redis cluster protocol
└── redis/                # Redis protocol implementation
    ├── protocol.rs       # RESP parser
    └── integration_tests.rs
```
### Testing

Comprehensive test suite covering:

- **Unit Tests**: RESP protocol parsing and serialization
- **Integration Tests**: Command handling and binary data
- **Protocol Compliance**: Redis compatibility verification

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture
```
### Dependencies

- **Tokio**: Async runtime
- **SQLx**: Async SQL toolkit
- **Nom**: Parser combinator for the RESP protocol
- **Serde**: Serialization framework
- **Miette**: Error handling
- **Tracing**: Structured logging
- **Gumdrop**: Command-line parsing
## License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.
The AGPL-3.0 is a copyleft license that requires anyone who distributes the code or runs it on a server to provide the source code to users, including any modifications.