Blobasaur

Blobasaur is a high-performance, sharded blob storage server written in Rust. It implements the Redis protocol for client compatibility and uses SQLite as the backend for each shard, providing a simple yet robust solution for storing and retrieving binary large objects (blobs).

Features

  • 🚀 High Performance Sharding: Distributes data across multiple SQLite databases using multi-probe consistent hashing for optimal concurrency and scalability
  • 🔄 Shard Migration: Built-in support for migrating data between different shard configurations with data integrity verification
  • 🔌 Redis Protocol Compatible: Full compatibility with Redis clients using standard commands (GET, SET, DEL, EXISTS, HGET, HSET, HDEL, HEXISTS)
  • ⚡ Asynchronous Operations: Built on Tokio for non-blocking I/O and efficient concurrent request handling
  • 💾 SQLite Backend: Each shard uses its own SQLite database for simple deployment and reliable storage
  • 🗜️ Storage Compression: Configurable compression with multiple algorithms (Gzip, Zstd, Lz4, Brotli)
  • 📊 Namespacing: Hash-based namespacing using HGET/HSET commands for logical data organization
  • 🏷️ Metadata Tracking: Automatic tracking of creation time, update time, expiration, and version numbers
  • ⏰ TTL Support: Redis-compatible key expiration with SET EX/PX, TTL, and EXPIRE commands plus automatic background cleanup
  • 🌐 Redis Cluster Support: Full cluster protocol support with automatic node discovery and client redirection
  • ⚙️ Highly Configurable: Flexible configuration for shards, compression, batching, and performance tuning

Installation

From GitHub Releases

You can download pre-compiled binaries for Linux from the GitHub Releases page.

  1. Download the latest release:

    For example, to download version v0.1.0:

    wget https://github.com/iamd3vil/blobasaur/releases/download/v0.1.0/blobasaur-Linux-musl-x86_64.tar.gz
  2. Extract the archive:

    tar -xvf blobasaur-Linux-musl-x86_64.tar.gz
  3. Run the binary: The binary inside the archive is named blobasaur. You can run it directly:

    ./blobasaur

Prerequisites

  • Rust toolchain (latest stable recommended) - Install Rust
  • just (optional but recommended) - A command runner for development tasks
    cargo install just
  • cross (optional) - For cross-compilation to different targets
    cargo install cross

From Source

With the prerequisites above installed, you can build from source:

  1. Clone the repository:

    git clone https://github.com/iamd3vil/blobasaur.git
    cd blobasaur
  2. Build the project:

    # Debug build
    cargo build
    # or using just
    just build
    
    # Release build (recommended for production)
    cargo build --release
    # or using just
    just build-release
  3. Install the binary:

    # Install to ~/.cargo/bin
    cargo install --path .
    
    # Or copy the binary manually
    cp target/release/blobasaur /usr/local/bin/

Cross-compilation

For Linux production deployments, you can create a static binary:

# Build a static Linux MUSL binary
just build-linux

# Binary will be at: target/x86_64-unknown-linux-musl/release/blobasaur

Quick Start

  1. Create a configuration file (config.toml):

    data_dir = "./data"
    num_shards = 4
    addr = "0.0.0.0:6379"
  2. Run the server:

    blobasaur
    # or with custom config
    blobasaur --config /path/to/config.toml
  3. Test with redis-cli:

    redis-cli -p 6379 SET mykey "Hello, World!"
    redis-cli -p 6379 GET mykey
    
    # Test TTL functionality
    redis-cli -p 6379 SET session:123 "user_data" EX 60  # Expires in 60 seconds
    redis-cli -p 6379 TTL session:123                    # Check remaining time
    redis-cli -p 6379 EXPIRE mykey 300                   # Set 5 minute expiration

CLI Usage

Blobasaur supports multiple modes of operation:

# Run the server (default behavior)
blobasaur
# or explicitly
blobasaur serve

# Specify a custom config file
blobasaur --config /path/to/config.toml

# Shard migration commands
blobasaur shard migrate <old_shard_count> <new_shard_count>

# Get help
blobasaur --help

Available Commands:

  • serve (default): Runs the Blobasaur server
  • shard migrate: Migrates data between different shard configurations

Global Options:

  • --config, -c: Path to configuration file (default: config.toml)
  • --help: Display help information

Metrics

Blobasaur exposes a wide range of metrics for monitoring, compatible with Prometheus. You can track command latency, connection counts, cache performance, and much more.

To enable the metrics server, add the following to your config.toml:

[metrics]
enabled = true
addr = "0.0.0.0:9090"

For a complete list of available metrics and example usage with Prometheus and Grafana, please see the METRICS.md documentation.

Configuration

Basic Configuration

Blobasaur is configured via a config.toml file:

# Required settings
data_dir = "/var/data/blobasaur"    # Directory for SQLite databases
num_shards = 8                      # Number of shards (must be > 0)

# Optional settings
addr = "0.0.0.0:6379"              # Server bind address
async_write = false                 # Enable async writes
batch_size = 1                      # Write batch size
batch_timeout_ms = 0               # Batch timeout in milliseconds

Storage Compression

Configure compression for data at rest:

[storage_compression]
enabled = true
algorithm = "zstd"  # Options: "none", "gzip", "zstd", "lz4", "brotli"
level = 3           # Compression level (algorithm-specific)

Compression Options:

  • Zstd: Best balance of speed and compression ratio (recommended)
  • Lz4: Fastest compression, lower ratio
  • Gzip: Good compatibility, moderate performance
  • Brotli: Best compression ratio, slower

Performance Tuning

For high-throughput scenarios:

# High-performance configuration
async_write = true
batch_size = 100
batch_timeout_ms = 10

[storage_compression]
enabled = true
algorithm = "lz4"
level = 1

Redis Commands

Basic Commands

Blobasaur implements core Redis commands for blob operations:

  • SET key value [EX seconds] [PX milliseconds]: Store or replace a blob with optional TTL

    # Basic SET
    redis-cli SET mykey "Hello, World!"
    
    # SET with TTL in seconds
    redis-cli SET mykey "Hello, World!" EX 60
    
    # SET with TTL in milliseconds  
    redis-cli SET mykey "Hello, World!" PX 30000
  • GET key: Retrieve a blob (excludes expired keys)

    redis-cli GET mykey
  • DEL key: Delete a blob

    redis-cli DEL mykey
  • EXISTS key: Check if a blob exists (excludes expired keys)

    redis-cli EXISTS mykey
  • TTL key: Get the remaining time to live for a key

    redis-cli TTL mykey
    # Returns:
    # -1 if key exists but has no expiration
    # -2 if key does not exist or has expired
    # positive number: remaining TTL in seconds
  • EXPIRE key seconds: Set expiration time for an existing key

    redis-cli EXPIRE mykey 120  # Expire in 2 minutes
    # Returns:
    # 1 if expiration was set successfully
    # 0 if key does not exist

Namespaced Commands

Use namespaces to organize data into logical groups:

  • HSET namespace key value: Store in namespace

    redis-cli HSET users:123 name "John Doe"
    redis-cli HSET users:123 email "john@example.com"
  • HGET namespace key: Retrieve from namespace

    redis-cli HGET users:123 name
  • HDEL namespace key: Delete from namespace

    redis-cli HDEL users:123 email
  • HEXISTS namespace key: Check existence in namespace

    redis-cli HEXISTS users:123 name

Using Redis Clients

Any Redis client works with Blobasaur:

# Python example
import redis
r = redis.Redis(host='localhost', port=6379)
r.set('mykey', 'myvalue')
value = r.get('mykey')

# Namespaced operations
r.hset('users:123', 'name', 'John Doe')
name = r.hget('users:123', 'name')
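
Because values are blobs, binary payloads work the same way. Continuing the example above (the file name and keys are placeholders):

# Store and retrieve raw binary data, e.g. an image
with open("photo.jpg", "rb") as f:
    r.set("images:photo1", f.read())

data = r.get("images:photo1")  # returns bytes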

Shard Migration

Overview

Shard migration allows you to change the number of shards in your deployment, enabling you to scale storage and optimize performance.

The migration process:

  1. Creates new SQLite databases for the target shard configuration
  2. Redistributes existing data according to the new consistent hashing ring (see the sketch after this list)
  3. Maintains data integrity throughout the process
  4. Optionally verifies migration success
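
For intuition, here is a minimal Python sketch of multi-probe consistent hashing. It is illustrative only: the hash function, probe count, and shard naming are assumptions, not Blobasaur's actual implementation. Each shard gets one point on the ring; a key is hashed with several probes and lands on the shard closest to its best probe, which keeps the distribution even without virtual nodes and means only a bounded fraction of keys move when the shard count changes.

import hashlib
from bisect import bisect_left

def h(data: bytes) -> int:
    # Stable 64-bit hash; any well-distributed hash works here
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

class MultiProbeRing:
    def __init__(self, num_shards: int, probes: int = 21):
        self.probes = probes
        # One point per shard on the 64-bit ring
        self.points = sorted((h(f"shard:{i}".encode()), i) for i in range(num_shards))

    def shard_for(self, key: str) -> int:
        best_dist, best_shard = None, None
        for p in range(self.probes):
            kp = h(f"{key}:{p}".encode())
            # Nearest shard point clockwise from this probe
            idx = bisect_left(self.points, (kp,)) % len(self.points)
            point, shard = self.points[idx]
            dist = (point - kp) % (1 << 64)
            if best_dist is None or dist < best_dist:
                best_dist, best_shard = dist, shard
        return best_shard

ring = MultiProbeRing(num_shards=8)
print(ring.shard_for("user:123"))  # deterministic shard index in [0, 8)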

Usage

# Basic migration from 4 shards to 8 shards
blobasaur shard migrate 4 8

# Migration with custom data directory
blobasaur shard migrate 4 8 --data-dir /custom/path/to/data

# Migration with verification
blobasaur shard migrate 4 8 --verify

# Migration using specific config file
blobasaur --config /path/to/config.toml shard migrate 4 8

Command Options:

  • old_shard_count: Current number of shards
  • new_shard_count: Target number of shards
  • --data-dir, -d: Override data directory
  • --verify: Run verification after migration

Migration Process

  1. Validation: Checks source shard databases exist and are accessible
  2. Preparation: Creates new SQLite databases for target configuration
  3. Data Transfer: Migrates data in batches using new hash ring
  4. Verification (optional): Compares data integrity between old and new shards

Best Practices

Before Migration:

# 1. Stop the server
pkill blobasaur

# 2. Backup your data
cp -r /var/data/blobasaur /var/data/blobasaur-backup

# 3. Test on a copy first
cp -r /var/data/blobasaur /tmp/test-migration

Migration Workflow:

# 4. Run migration with verification
blobasaur shard migrate 4 8 --verify

# 5. Update config.toml
# Change: num_shards = 4
# To:     num_shards = 8

# 6. Start server with new configuration
blobasaur

Important Considerations:

  • Downtime Required: Server must be stopped during migration
  • Disk Space: Need space for both old and new databases
  • Testing: Always test migrations on data copies first
  • Recovery: Keep backups for rollback if needed

Performance Features

Write Batching

Improves throughput by batching multiple operations:

batch_size = 50          # Operations per batch
batch_timeout_ms = 10    # Max wait time for batch

Benefits:

  • Higher write throughput
  • Reduced SQLite transaction overhead
  • Better resource utilization
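
Write batching pairs naturally with client-side pipelining: sending many commands in one round trip gives the server more operations to coalesce per SQLite transaction. A minimal sketch using the redis-py client (client-side only; it assumes a Blobasaur instance on localhost:6379):

import redis

r = redis.Redis(host="localhost", port=6379)

# Queue many SETs and send them in a single round trip;
# transaction=False avoids wrapping them in MULTI/EXEC.
pipe = r.pipeline(transaction=False)
for i in range(1000):
    pipe.set(f"blob:{i}", b"payload")
pipe.execute()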

Asynchronous Writes

Enables immediate responses while writes happen in the background:

async_write = true

Features:

  • Immediate response to clients
  • Inflight cache prevents race conditions
  • Maintains consistency guarantees

Storage Compression

Reduces storage requirements and can improve I/O performance:

[storage_compression]
enabled = true
algorithm = "zstd"
level = 3

Performance Impact:

  • Zstd: 20-40% size reduction, minimal CPU overhead
  • Lz4: 15-25% size reduction, fastest compression
  • Brotli: 40-60% size reduction, higher CPU usage

Advanced Topics

Database Schema

Default Table (blobs):

CREATE TABLE blobs (
    key TEXT PRIMARY KEY,
    value BLOB NOT NULL,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL,
    expires_at INTEGER,
    version INTEGER NOT NULL DEFAULT 0
);

Namespaced Tables (blobs_{namespace}):

  • Created automatically on first access
  • Same schema as default table
  • Isolated from other namespaces

TTL and Key Expiration

Features:

  • Redis-Compatible TTL: Full support for SET EX/PX, TTL, and EXPIRE commands
  • Automatic Expiry: All read operations (GET, EXISTS, TTL) automatically filter expired keys
  • Background Cleanup: Per-shard cleanup tasks run every 60 seconds to remove expired keys
  • Efficient Storage: Uses indexed expires_at timestamps for fast expiry queries

Implementation Details:

  • Expiration timestamps stored as Unix epoch seconds in expires_at column
  • Database indexes on expires_at for efficient cleanup queries
  • Background cleanup processes both the main blobs table and namespaced tables (the query shape is sketched below)
  • Race-free expiry checking: keys are considered expired at query time
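
To make the cleanup concrete, here is roughly the shape of the expiry query, sketched with Python's sqlite3 module against a hypothetical shard file (not Blobasaur's actual code):

import sqlite3
import time

conn = sqlite3.connect("data/shard_0.db")  # hypothetical shard database path
now = int(time.time())

# Remove rows whose Unix-epoch expiry has passed; the index on
# expires_at keeps this scan cheap even on large tables.
conn.execute(
    "DELETE FROM blobs WHERE expires_at IS NOT NULL AND expires_at <= ?",
    (now,),
)
conn.commit()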

Usage Examples:

# Set with 60 second expiration
redis-cli SET session:abc123 "user_data" EX 60

# Check remaining TTL  
redis-cli TTL session:abc123

# Add expiration to existing key
redis-cli EXPIRE permanent_key 3600

Race Condition Handling

The Problem: With async writes enabled, a GET can arrive before a recent SET has been committed to SQLite, so the read would miss data the client was already told was stored.

The Solution: An inflight cache (sketched after this list):

  • Tracks pending write operations
  • Serves data from cache during async writes
  • Automatic cleanup after database commits
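
Conceptually, the inflight cache behaves like this Python sketch (illustrative only; the real implementation is Rust inside the server):

class InflightCache:
    """Serve pending writes to readers until the database commit lands."""

    def __init__(self):
        self.pending = {}  # key -> value awaiting its SQLite commit

    def on_set(self, key, value):
        self.pending[key] = value  # visible to readers immediately

    def on_get(self, key, db_lookup):
        if key in self.pending:    # write still in flight: serve from cache
            return self.pending[key]
        return db_lookup(key)      # otherwise fall through to the database

    def on_commit(self, key, value):
        # Clean up only if the entry still matches the committed value,
        # so a newer in-flight write for the same key is not discarded.
        if self.pending.get(key) == value:
            del self.pending[key]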

Redis Cluster Compatibility

Full Redis cluster protocol support with:

  • Node Discovery: Automatic gossip protocol
  • Hash Slots: 16384 slot distribution (slot computation sketched below)
  • Client Redirection: Automatic MOVED responses
  • Cluster Commands: CLUSTER NODES, CLUSTER INFO, etc.

Basic Cluster Configuration:

[cluster]
enabled = true
node_id = "node1"
seeds = ["127.0.0.1:6380"]
advertise_addr = "127.0.0.1"
port = 6379

[[cluster.slot_ranges]]
start = 0
end = 8191
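
Slot assignment follows the standard Redis cluster scheme: a key maps to CRC16(key) mod 16384, honoring {hash tags}. A self-contained sketch of just the slot math (the cluster wiring itself is handled by your Redis client):

def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT (XMODEM), the checksum Redis cluster uses for key hashing
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: bytes) -> int:
    # Only the substring inside the first {...}, if non-empty, is hashed,
    # so related keys can be pinned to the same slot.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:
            key = key[start + 1 : end]
    return crc16_xmodem(key) % 16384

print(key_slot(b"user:123"))    # 0..16383
print(key_slot(b"{user}:123"))  # same slot as every other {user}:* key

Under the slot_ranges configured above (0 through 8191), any key whose slot falls in that range is served by this node; keys outside it receive a MOVED redirect.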

Development

Project Structure

src/
├── main.rs              # Entry point and CLI handling
├── config.rs            # Configuration management
├── app_state.rs         # Application state and shard routing
├── server.rs            # Redis protocol server
├── shard_manager.rs     # Shard write operations and batching
├── migration.rs         # Shard migration functionality
├── compression.rs       # Storage compression
├── metrics.rs           # Performance metrics
├── http_server.rs       # HTTP metrics endpoint
├── cluster.rs           # Redis cluster protocol
└── redis/               # Redis protocol implementation
    ├── protocol.rs      # RESP parser
    └── integration_tests.rs

Testing

Comprehensive test suite covering:

  • Unit Tests: RESP protocol parsing and serialization
  • Integration Tests: Command handling and binary data
  • Protocol Compliance: Redis compatibility verification

# Run all tests
cargo test

# Run with output
cargo test -- --nocapture

Key Dependencies

  • Tokio: Async runtime
  • SQLx: Async SQL toolkit
  • Nom: Parser combinator for RESP protocol
  • Serde: Serialization framework
  • Miette: Error handling
  • Tracing: Structured logging
  • Gumdrop: Command-line parsing

License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) - see the LICENSE file for details.

The AGPL-3.0 is a copyleft license that requires anyone who distributes the code or runs it on a server to provide the source code to users, including any modifications.
