Blobasaur

Blobasaur is a high-performance, sharded blob storage server written in Rust. It implements the Redis protocol for client compatibility and uses SQLite as the backend for each shard, providing a simple yet robust solution for storing and retrieving binary large objects (blobs).

Features

  • 🚀 High Performance Sharding: Distributes data across multiple SQLite databases using multi-probe consistent hashing for optimal concurrency and scalability
  • 🔄 Shard Migration: Built-in support for migrating data between different shard configurations with data integrity verification
  • 🔌 Redis Protocol Compatible: Full compatibility with Redis clients using standard commands (GET, SET, DEL, EXISTS, HGET, HSET, HDEL, HEXISTS)
  • ⚡ Asynchronous Operations: Built on Tokio for non-blocking I/O and efficient concurrent request handling
  • 💾 SQLite Backend: Each shard uses its own SQLite database for simple deployment and reliable storage
  • 🗜️ Storage Compression: Configurable compression with multiple algorithms (Gzip, Zstd, Lz4, Brotli)
  • 📊 Namespacing: Hash-based namespacing using HGET/HSET commands for logical data organization
  • 🏷️ Metadata Tracking: Automatic tracking of creation time, update time, expiration, and version numbers
  • ⏰ TTL Support: Redis-compatible key expiration with SET EX/PX, TTL, and EXPIRE commands plus automatic background cleanup
  • 🌐 Redis Cluster Support: Full cluster protocol support with automatic node discovery and client redirection
  • ⚙️ Highly Configurable: Flexible configuration for shards, compression, batching, and performance tuning

Installation

From GitHub Releases

You can download pre-compiled binaries for Linux from the GitHub Releases page.

  1. Download the latest release:

    For example, to download version v0.1.0:

    wget https://github.com/iamd3vil/blobasaur/releases/download/v0.1.0/blobasaur-Linux-musl-x86_64.tar.gz
  2. Extract the archive:

    tar -xvf blobasaur-Linux-musl-x86_64.tar.gz
  3. Run the binary: The binary inside the archive is named blobasaur. You can run it directly:

    ./blobasaur

Prerequisites

  • Rust toolchain (latest stable recommended) - Install Rust
  • just (optional but recommended) - A command runner for development tasks
    cargo install just
  • cross (optional) - For cross-compilation to different targets
    cargo install cross

From Source

With the prerequisites above installed, you can build from source:

  1. Clone the repository:

    git clone https://github.com/iamd3vil/blobasaur.git
    cd blobasaur
  2. Build the project:

    # Debug build
    cargo build
    # or using just
    just build
    
    # Release build (recommended for production)
    cargo build --release
    # or using just
    just build-release
  3. Install the binary:

    # Install to ~/.cargo/bin
    cargo install --path .
    
    # Or copy the binary manually
    cp target/release/blobasaur /usr/local/bin/

Cross-compilation

For Linux production deployments, you can create a static binary:

# Build a static Linux MUSL binary
just build-linux

# Binary will be at: target/x86_64-unknown-linux-musl/release/blobasaur

Quick Start

  1. Create a configuration file (config.toml):

    data_dir = "./data"
    num_shards = 4
    addr = "0.0.0.0:6379"
  2. Run the server:

    blobasaur
    # or with custom config
    blobasaur --config /path/to/config.toml
  3. Test with redis-cli:

    redis-cli -p 6379 SET mykey "Hello, World!"
    redis-cli -p 6379 GET mykey
    
    # Test TTL functionality
    redis-cli -p 6379 SET session:123 "user_data" EX 60  # Expires in 60 seconds
    redis-cli -p 6379 TTL session:123                    # Check remaining time
    redis-cli -p 6379 EXPIRE mykey 300                   # Set 5 minute expiration

CLI Usage

Blobasaur supports multiple modes of operation:

# Run the server (default behavior)
blobasaur
# or explicitly
blobasaur serve

# Specify a custom config file
blobasaur --config /path/to/config.toml

# Shard migration commands
blobasaur shard migrate <old_shard_count> <new_shard_count>

# Get help
blobasaur --help

Available Commands:

  • serve (default): Runs the Blobasaur server
  • shard migrate: Migrates data between different shard configurations

Global Options:

  • --config, -c: Path to configuration file (default: config.toml)
  • --help: Display help information

Metrics

Blobasaur exposes a wide range of metrics for monitoring, compatible with Prometheus. You can track command latency, connection counts, cache performance, and much more.

To enable the metrics server, add the following to your config.toml:

[metrics]
enabled = true
addr = "0.0.0.0:9090"

For a complete list of available metrics and example usage with Prometheus and Grafana, please see the METRICS.md documentation.

Configuration

Basic Configuration

Blobasaur is configured via a config.toml file:

# Required settings
data_dir = "/var/data/blobasaur"    # Directory for SQLite databases
num_shards = 8                      # Number of shards (must be > 0)

# Optional settings
addr = "0.0.0.0:6379"              # Server bind address
async_write = false                 # Enable async writes
batch_size = 1                      # Write batch size
batch_timeout_ms = 0               # Batch timeout in milliseconds

Storage Compression

Configure compression for data at rest:

[storage_compression]
enabled = true
algorithm = "zstd"  # Options: "none", "gzip", "zstd", "lz4", "brotli"
level = 3           # Compression level (algorithm-specific)

Compression Options:

  • Zstd: Best balance of speed and compression ratio (recommended)
  • Lz4: Fastest compression, lower ratio
  • Gzip: Good compatibility, moderate performance
  • Brotli: Best compression ratio, slower

Performance Tuning

For high-throughput scenarios:

# High-performance configuration
async_write = true
batch_size = 100
batch_timeout_ms = 10

[storage_compression]
enabled = true
algorithm = "lz4"
level = 1

Redis Commands

Basic Commands

Blobasaur implements core Redis commands for blob operations:

  • SET key value [EX seconds] [PX milliseconds]: Store or replace a blob with optional TTL

    # Basic SET
    redis-cli SET mykey "Hello, World!"
    
    # SET with TTL in seconds
    redis-cli SET mykey "Hello, World!" EX 60
    
    # SET with TTL in milliseconds  
    redis-cli SET mykey "Hello, World!" PX 30000
  • GET key: Retrieve a blob (excludes expired keys)

    redis-cli GET mykey
  • DEL key: Delete a blob

    redis-cli DEL mykey
  • EXISTS key: Check if a blob exists (excludes expired keys)

    redis-cli EXISTS mykey
  • TTL key: Get the remaining time to live for a key

    redis-cli TTL mykey
    # Returns:
    # -1 if key exists but has no expiration
    # -2 if key does not exist or has expired
    # positive number: remaining TTL in seconds
  • EXPIRE key seconds: Set expiration time for an existing key

    redis-cli EXPIRE mykey 120  # Expire in 2 minutes
    # Returns:
    # 1 if expiration was set successfully
    # 0 if key does not exist

Namespaced Commands

Use namespaces to organize data into logical groups:

  • HSET namespace key value: Store in namespace

    redis-cli HSET users:123 name "John Doe"
    redis-cli HSET users:123 email "john@example.com"
  • HGET namespace key: Retrieve from namespace

    redis-cli HGET users:123 name
  • HDEL namespace key: Delete from namespace

    redis-cli HDEL users:123 email
  • HEXISTS namespace key: Check existence in namespace

    redis-cli HEXISTS users:123 name

Using Redis Clients

Any Redis client works with Blobasaur:

# Python example
import redis
r = redis.Redis(host='localhost', port=6379)
r.set('mykey', 'myvalue')
value = r.get('mykey')

# Namespaced operations
r.hset('users:123', 'name', 'John Doe')
name = r.hget('users:123', 'name')
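
Because values are blobs, binary payloads work the same way. Continuing the example above (the file name and keys are placeholders):

# Store and retrieve raw binary data, e.g. an image
with open("photo.jpg", "rb") as f:
    r.set("images:photo1", f.read())

data = r.get("images:photo1")  # returns bytes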

Shard Migration

Overview

Shard migration allows you to change the number of shards in your deployment, enabling you to scale storage and optimize performance.

The migration process:

  1. Creates new SQLite databases for the target shard configuration
  2. Redistributes existing data according to the new consistent hashing ring (see the sketch after this list)
  3. Maintains data integrity throughout the process
  4. Optionally verifies migration success
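
For intuition, here is a minimal Python sketch of multi-probe consistent hashing. It is illustrative only: the hash function, probe count, and shard naming are assumptions, not Blobasaur's actual implementation. Each shard gets one point on the ring; a key is hashed with several probes and lands on the shard closest to its best probe, which keeps the distribution even without virtual nodes and means only a bounded fraction of keys move when the shard count changes.

import hashlib
from bisect import bisect_left

def h(data: bytes) -> int:
    # Stable 64-bit hash; any well-distributed hash works here
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

class MultiProbeRing:
    def __init__(self, num_shards: int, probes: int = 21):
        self.probes = probes
        # One point per shard on the 64-bit ring
        self.points = sorted((h(f"shard:{i}".encode()), i) for i in range(num_shards))

    def shard_for(self, key: str) -> int:
        best_dist, best_shard = None, None
        for p in range(self.probes):
            kp = h(f"{key}:{p}".encode())
            # Nearest shard point clockwise from this probe
            idx = bisect_left(self.points, (kp,)) % len(self.points)
            point, shard = self.points[idx]
            dist = (point - kp) % (1 << 64)
            if best_dist is None or dist < best_dist:
                best_dist, best_shard = dist, shard
        return best_shard

ring = MultiProbeRing(num_shards=8)
print(ring.shard_for("user:123"))  # deterministic shard index in [0, 8)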

Usage

# Basic migration from 4 shards to 8 shards
blobasaur shard migrate 4 8

# Migration with custom data directory
blobasaur shard migrate 4 8 --data-dir /custom/path/to/data

# Migration with verification
blobasaur shard migrate 4 8 --verify

# Migration using specific config file
blobasaur --config /path/to/config.toml shard migrate 4 8

Command Options:

  • old_shard_count: Current number of shards
  • new_shard_count: Target number of shards
  • --data-dir, -d: Override data directory
  • --verify: Run verification after migration

Migration Process

  1. Validation: Checks source shard databases exist and are accessible
  2. Preparation: Creates new SQLite databases for target configuration
  3. Data Transfer: Migrates data in batches using new hash ring
  4. Verification (optional): Compares data integrity between old and new shards

Best Practices

Before Migration:

# 1. Stop the server
pkill blobasaur

# 2. Backup your data
cp -r /var/data/blobasaur /var/data/blobasaur-backup

# 3. Test on a copy first
cp -r /var/data/blobasaur /tmp/test-migration

Migration Workflow:

# 4. Run migration with verification
blobasaur shard migrate 4 8 --verify

# 5. Update config.toml
# Change: num_shards = 4
# To:     num_shards = 8

# 6. Start server with new configuration
blobasaur

Important Considerations:

  • Downtime Required: Server must be stopped during migration
  • Disk Space: Need space for both old and new databases
  • Testing: Always test migrations on data copies first
  • Recovery: Keep backups for rollback if needed

Performance Features

Write Batching

Improves throughput by batching multiple operations:

batch_size = 50          # Operations per batch
batch_timeout_ms = 10    # Max wait time for batch

Benefits:

  • Higher write throughput
  • Reduced SQLite transaction overhead
  • Better resource utilization
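
Write batching pairs naturally with client-side pipelining: sending many commands in one round trip gives the server more operations to coalesce per SQLite transaction. A minimal sketch using the redis-py client (client-side only; it assumes a Blobasaur instance on localhost:6379):

import redis

r = redis.Redis(host="localhost", port=6379)

# Queue many SETs and send them in a single round trip;
# transaction=False avoids wrapping them in MULTI/EXEC.
pipe = r.pipeline(transaction=False)
for i in range(1000):
    pipe.set(f"blob:{i}", b"payload")
pipe.execute()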

Asynchronous Writes

Enables immediate responses while writes happen in the background:

async_write = true

Features:

  • Immediate response to clients
  • Inflight cache prevents race conditions
  • Maintains consistency guarantees

Storage Compression

Reduces storage requirements and can improve I/O performance:

[storage_compression]
enabled = true
algorithm = "zstd"
level = 3

Performance Impact:

  • Zstd: 20-40% size reduction, minimal CPU overhead
  • Lz4: 15-25% size reduction, fastest compression
  • Brotli: 40-60% size reduction, higher CPU usage

Advanced Topics

Database Schema

Default Table (blobs):

CREATE TABLE blobs (
    key TEXT PRIMARY KEY,
    value BLOB NOT NULL,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL,
    expires_at INTEGER,
    version INTEGER NOT NULL DEFAULT 0
);

Namespaced Tables (blobs_{namespace}):

  • Created automatically on first access
  • Same schema as default table
  • Isolated from other namespaces

TTL and Key Expiration

Features:

  • Redis-Compatible TTL: Full support for SET EX/PX, TTL, and EXPIRE commands
  • Automatic Expiry: All read operations (GET, EXISTS, TTL) automatically filter expired keys
  • Background Cleanup: Per-shard cleanup tasks run every 60 seconds to remove expired keys
  • Efficient Storage: Uses indexed expires_at timestamps for fast expiry queries

Implementation Details:

  • Expiration timestamps stored as Unix epoch seconds in expires_at column
  • Database indexes on expires_at for efficient cleanup queries
  • Background cleanup processes both the main blobs table and namespaced tables (the query shape is sketched below)
  • Race-free expiry checking: keys are considered expired at query time
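
To make the cleanup concrete, here is roughly the shape of the expiry query, sketched with Python's sqlite3 module against a hypothetical shard file (not Blobasaur's actual code):

import sqlite3
import time

conn = sqlite3.connect("data/shard_0.db")  # hypothetical shard database path
now = int(time.time())

# Remove rows whose Unix-epoch expiry has passed; the index on
# expires_at keeps this scan cheap even on large tables.
conn.execute(
    "DELETE FROM blobs WHERE expires_at IS NOT NULL AND expires_at <= ?",
    (now,),
)
conn.commit()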

Usage Examples:

# Set with 60 second expiration
redis-cli SET session:abc123 "user_data" EX 60

# Check remaining TTL  
redis-cli TTL session:abc123

# Add expiration to existing key
redis-cli EXPIRE permanent_key 3600

Race Condition Handling

The Problem: With async writes enabled, a GET can arrive before a recent SET has been committed to SQLite, so the read would miss data the client was already told was stored.

The Solution: An inflight cache (sketched after this list):

  • Tracks pending write operations
  • Serves data from cache during async writes
  • Automatic cleanup after database commits
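
Conceptually, the inflight cache behaves like this Python sketch (illustrative only; the real implementation is Rust inside the server):

class InflightCache:
    """Serve pending writes to readers until the database commit lands."""

    def __init__(self):
        self.pending = {}  # key -> value awaiting its SQLite commit

    def on_set(self, key, value):
        self.pending[key] = value  # visible to readers immediately

    def on_get(self, key, db_lookup):
        if key in self.pending:    # write still in flight: serve from cache
            return self.pending[key]
        return db_lookup(key)      # otherwise fall through to the database

    def on_commit(self, key, value):
        # Clean up only if the entry still matches the committed value,
        # so a newer in-flight write for the same key is not discarded.
        if self.pending.get(key) == value:
            del self.pending[key]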

Redis Cluster Compatibility

Full Redis cluster protocol support with:

  • Node Discovery: Automatic gossip protocol
  • Hash Slots: 16384 slot distribution (slot computation sketched below)
  • Client Redirection: Automatic MOVED responses
  • Cluster Commands: CLUSTER NODES, CLUSTER INFO, etc.

Basic Cluster Configuration:

[cluster]
enabled = true
node_id = "node1"
seeds = ["127.0.0.1:6380"]
advertise_addr = "127.0.0.1"
port = 6379

[[cluster.slot_ranges]]
start = 0
end = 8191
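
Slot assignment follows the standard Redis cluster scheme: a key maps to CRC16(key) mod 16384, honoring {hash tags}. A self-contained sketch of just the slot math (the cluster wiring itself is handled by your Redis client):

def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM), the checksum Redis cluster uses for key hashing
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Only the substring inside the first {...}, if non-empty, is hashed,
    # so related keys can be pinned to the same slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1 : end]
    return crc16_xmodem(key) % 16384

print(key_slot(b"user:123"))    # 0..16383
print(key_slot(b"{user}:123"))  # same slot as every other {user}:* key

Under the slot_ranges configured above (0 through 8191), any key whose slot falls in that range is served by this node; keys outside it receive a MOVED redirect.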

Development

Project Structure

src/
├── main.rs              # Entry point and CLI handling
├── config.rs            # Configuration management
├── app_state.rs         # Application state and shard routing
├── server.rs            # Redis protocol server
├── shard_manager.rs     # Shard write operations and batching
├── migration.rs         # Shard migration functionality
├── compression.rs       # Storage compression
├── metrics.rs           # Performance metrics
├── http_server.rs       # HTTP metrics endpoint
├── cluster.rs           # Redis cluster protocol
└── redis/               # Redis protocol implementation
    ├── protocol.rs      # RESP parser
    └── integration_tests.rs

Testing

Comprehensive test suite covering:

  • Unit Tests: RESP protocol parsing and serialization
  • Integration Tests: Command handling and binary data
  • Protocol Compliance: Redis compatibility verification

# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

Key Dependencies

  • Tokio: Async runtime
  • SQLx: Async SQL toolkit
  • Nom: Parser combinator for RESP protocol
  • Serde: Serialization framework
  • Miette: Error handling
  • Tracing: Structured logging
  • Gumdrop: Command-line parsing

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.

The AGPL-3.0 is a copyleft license that requires anyone who distributes the code or runs it on a server to provide the source code to users, including any modifications.
