keshramamurthy/hyperliquid-market-data

Hyperliquid Market Data Collector

A high-performance market data collection system for Hyperliquid DEX that streams trade data via WebSocket and compresses it using ZSTD for efficient storage.

About

This project grew out of my class project for High Frequency Trading Technology (IE 421) at the University of Illinois. The original assignment required implementing a high-performance compressor for cryptocurrency market data from major centralized exchanges (see group project repository). I've since expanded the concept into this market data collection system for the Hyperliquid DEX.

About the Author: Keshav Ramamurthy, rising junior at the University of Illinois majoring in Computer Science + Advertising.

Architecture

  • Python WebSocket Client: Connects to Hyperliquid API and streams trade data
  • C++ Compressor: High-performance ZSTD compression with custom binary format
  • Daily File Rotation: Automatically creates new files each day
  • Multi-threaded: Separate threads for WebSocket handling and compression
  • Multi-currency Support: Concurrent collection for multiple cryptocurrencies
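To illustrate the client side of this architecture, here is a minimal sketch of building a trades subscription and normalizing an incoming message. The endpoint, the subscription shape, and the field names (`px`, `sz`, `time`, `tid`) are assumptions based on Hyperliquid's public WebSocket API, and `build_subscription` and `extract_trades` are hypothetical names rather than functions taken from hyperliquid_client.py.

```python
import json

# Assumed mainnet WebSocket endpoint; check the Hyperliquid docs before relying on it.
WS_URL = "wss://api.hyperliquid.xyz/ws"


def build_subscription(coin: str) -> str:
    """Return the JSON text that subscribes to the trades feed for one coin."""
    return json.dumps(
        {"method": "subscribe", "subscription": {"type": "trades", "coin": coin}}
    )


def extract_trades(raw: str) -> list:
    """Turn one raw WebSocket message into a list of normalized trade dicts.

    Messages on other channels (for example subscription acknowledgements)
    yield an empty list.
    """
    msg = json.loads(raw)
    if msg.get("channel") != "trades":
        return []
    return [
        {
            "coin": t["coin"],
            "price": float(t["px"]),    # prices arrive as strings (assumed)
            "size": float(t["sz"]),
            "time_ms": int(t["time"]),  # exchange timestamp in milliseconds
            "trade_id": int(t["tid"]),
        }
        for t in msg.get("data", [])
    ]
```

In a real client, each coin would send `build_subscription(coin)` over its own connection and pass every received message through `extract_trades` before handing the result to the compressor.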

Features

Core Features

  • Real-time trade data collection from Hyperliquid
  • Multi-currency support: Collect data for multiple coins simultaneously
  • Custom binary format with delta encoding for optimal compression
  • ZSTD compression for efficient storage
  • Automatic reconnection with exponential backoff
  • Graceful shutdown handling
  • Daily file rotation
  • Configurable compression threads

Enhanced Monitoring & Logging

  • Real-time latency monitoring: Track end-to-end latency from trade to processing
  • Detailed performance metrics: Trade rates, data throughput, compression efficiency
  • Per-currency statistics: Individual monitoring for each cryptocurrency
  • Comprehensive logging: Multi-level logging with currency-specific context
  • Performance analysis tools: Built-in analysis scripts for performance assessment

Real-time Performance

  • Sub-500ms latency in optimal conditions
  • Concurrent WebSocket connections for multiple currencies
  • Efficient compression, typically 25-50 bytes per trade
  • Live performance reporting every minute

Quick Start

  1. Build the project:

    make
  2. Install Python dependencies:

    pip3 install websockets
  3. Run data collection:

    # Single currency
    ./run.sh BTC 4
    
    # Multiple currencies (recommended)
    ./run.sh "BTC,ETH,SOL" 4

Usage

Basic Usage

./run.sh <coin_symbols> [num_zstd_threads]

Arguments:

  • coin_symbols: Single currency (e.g., BTC) or comma-separated list (e.g., "BTC,ETH,SOL")
  • num_zstd_threads: Number of compression threads per currency (default: 1)
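The argument handling described above might look like the following. `parse_cli` is a hypothetical helper; how run.sh and hyperliquid_client.py actually validate their input is not shown in this README.

```python
def parse_cli(argv):
    """Parse [coin_symbols, num_zstd_threads] into (coins, threads).

    `argv` is the argument list without the program name, for example
    sys.argv[1:]. The thread count defaults to 1 when omitted.
    """
    if not argv:
        raise ValueError("usage: <coin_symbols> [num_zstd_threads]")
    # Accept "BTC" as well as "BTC,ETH,SOL"; ignore stray spaces and empty items.
    coins = [c.strip() for c in argv[0].split(",") if c.strip()]
    if not coins:
        raise ValueError("at least one coin symbol is required")
    threads = int(argv[1]) if len(argv) > 1 else 1
    if threads < 1:
        raise ValueError("num_zstd_threads must be at least 1")
    return coins, threads
```

Symbols are passed through unchanged rather than upper-cased, since exchange symbols can be case-sensitive.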

Examples

# Single currency
./run.sh BTC           # Collect BTC trades (1 thread)
./run.sh ETH 4         # Collect ETH trades (4 threads)

# Multiple currencies
./run.sh "BTC,ETH"     # Collect BTC and ETH (1 thread each)
./run.sh "BTC,ETH,SOL,DOGE" 2  # Collect 4 currencies (2 threads each)

Direct Python Usage

# Multi-currency with enhanced logging
python3 hyperliquid_client.py "BTC,ETH,SOL" 4

# Single currency
python3 hyperliquid_client.py BTC 2

Performance Analysis

Real-time Monitoring

The system provides real-time performance metrics in the logs:

2025-05-29 00:32:38,299 - [BTC] Processed 6 trades | Latency: 360.5ms | Total trades: 128 | Data: 35.1KB
2025-05-29 00:32:39,101 - [ETH] Processed 8 trades | Latency: 1233.1ms | Total trades: 90 | Data: 24.6KB
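Log lines in this shape are easy to post-process. The regular expression below is inferred only from the two sample lines above, so treat it as an assumption about the format rather than a specification; `parse_log_line` is a hypothetical helper and not part of analyze_performance.py.

```python
import re
from datetime import datetime

# Pattern inferred from the sample log lines; adjust it if the real format differs.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"\[(?P<coin>[^\]]+)\] Processed (?P<batch>\d+) trades \| "
    r"Latency: (?P<latency_ms>\d+(?:\.\d+)?)ms \| "
    r"Total trades: (?P<total>\d+) \| "
    r"Data: (?P<data_kb>\d+(?:\.\d+)?)KB$"
)


def parse_log_line(line):
    """Return a dict of fields for a per-currency metrics line, else None."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return {
        "timestamp": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f"),
        "coin": m["coin"],
        "batch": int(m["batch"]),              # trades in this message
        "latency_ms": float(m["latency_ms"]),
        "total": int(m["total"]),              # running trade count
        "data_kb": float(m["data_kb"]),
    }
```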

Performance Analysis Tool

Run the built-in performance analyzer:

python3 analyze_performance.py [log_file]

This provides:

  • Real-time latency analysis
  • Per-currency performance statistics
  • Compression efficiency metrics
  • System recommendations

Sample Performance Report

📊 OVERALL STATISTICS
   Collection period: 2025-05-29 00:32:29 to 2025-05-29 00:32:45
   Duration: 15.6 seconds (0.3 minutes)
   Total currencies monitored: 2
   Total trade messages processed: 58

📈 PER-CURRENCY PERFORMANCE
   BTC: 138 trades | 8.83 trades/sec | Avg latency: 367.2ms
   ETH: 103 trades | 6.59 trades/sec | Avg latency: 503.2ms

⚡ REAL-TIME ASSESSMENT: ⚠️ ACCEPTABLE (1233.1ms 95th percentile)
💾 COMPRESSION: ✅ EXCELLENT (25.9 bytes/trade)

Output Format

Data is stored in: data/YYYY-MM-DD/COIN/trades.zst

Example structure:

data/
├── 2025-05-29/
│   ├── BTC/
│   │   └── trades.zst
│   ├── ETH/
│   │   └── trades.zst
│   └── SOL/
│       └── trades.zst
└── 2025-05-28/
    └── BTC/
        └── trades.zst
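The layout above, combined with daily rotation, amounts to deriving the output path from a timestamp. The sketch below assumes the date directory follows the UTC day of the trade; whether the collector rotates on UTC or local time is not stated here, and `trades_path` is a hypothetical helper.

```python
from datetime import datetime, timezone
from pathlib import Path


def trades_path(root, coin, time_ms):
    """Return root/YYYY-MM-DD/COIN/trades.zst for a millisecond timestamp.

    The day boundary is taken in UTC (an assumption).
    """
    day = datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d")
    return Path(root) / day / coin / "trades.zst"


def needs_rotation(current_path, root, coin, time_ms):
    """True when a new trade belongs in a different daily file."""
    return trades_path(root, coin, time_ms) != Path(current_path)
```

A writer would call `needs_rotation` for each incoming trade, then finalize the current stream and open the new path whenever it returns True.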

The binary format includes:

  • File header with magic bytes "CDT2"
  • Ticker symbol and exchange identifier
  • Compressed trade data with delta encoding

Dependencies

System Requirements:

  • macOS or Linux
  • Python 3.7+
  • C++17 compatible compiler

Libraries:

  • ZSTD (compression)
  • simdjson (JSON parsing)
  • websockets (Python WebSocket client)

Building from Source

# Install dependencies (macOS with Homebrew)
brew install zstd simdjson

# Install Python dependencies
pip3 install websockets

# Build
make clean && make

Monitoring & Logging

Log Files

  • hyperliquid.log: Detailed application logs with performance metrics, including real-time latency measurements, per-currency trade statistics, and connection status and errors

Performance Metrics

  • Latency tracking: End-to-end latency from trade timestamp to processing
  • Throughput monitoring: Trades per second per currency
  • Compression efficiency: Bytes per trade analysis
  • Connection health: Reconnection events and network status
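As a rough illustration of how the first three metrics could be computed, the helpers below use hypothetical names and assume trade timestamps are in milliseconds since the Unix epoch. Latency measured this way also includes any clock skew between the exchange and the local machine.

```python
import time


def latency_ms(trade_time_ms, now_ms=None):
    """Milliseconds between the trade timestamp and local processing time."""
    if now_ms is None:
        now_ms = time.time() * 1000
    return now_ms - trade_time_ms


def trades_per_second(trade_count, duration_s):
    """Average throughput over a collection window."""
    return trade_count / duration_s if duration_s > 0 else 0.0


def bytes_per_trade(compressed_bytes, trade_count):
    """Compression efficiency: compressed output size divided by trade count."""
    return compressed_bytes / trade_count if trade_count > 0 else 0.0
```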

Stopping the Service

Press Ctrl+C to stop gracefully. The system will:

  1. Close all WebSocket connections
  2. Flush all buffers for each currency
  3. Finalize compression streams
  4. Print final statistics
  5. Clean up resources
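The shutdown sequence above can be sketched as a signal handler that only sets a flag, plus a function that performs the steps in order. The `close`, `flush`, and `finalize` methods are hypothetical stand-ins for whatever the real connection and writer objects provide, and `finalize` is assumed to return the number of trades written.

```python
import signal
import threading

stop_event = threading.Event()


def request_stop(signum=None, frame=None):
    """Signal handler: set a flag and let the main loop do the real work."""
    stop_event.set()


def install_signal_handlers():
    """Route Ctrl+C (SIGINT) and SIGTERM to request_stop; call from the main thread."""
    signal.signal(signal.SIGINT, request_stop)
    signal.signal(signal.SIGTERM, request_stop)


def shutdown(connections, writers, log=print):
    """Run the shutdown steps in the documented order."""
    for conn in connections:                            # 1. close WebSocket connections
        conn.close()
    for writer in writers:                              # 2. flush buffers per currency
        writer.flush()
    totals = [writer.finalize() for writer in writers]  # 3. finalize compression streams
    log(f"Final: {sum(totals)} trades written")         # 4. print final statistics
    connections.clear()                                 # 5. release remaining references
    writers.clear()
```

Keeping the handler trivial avoids doing I/O inside a signal context; the main loop checks `stop_event` and calls `shutdown` once it is set.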

Real-time Performance Characteristics

Based on testing, 95th-percentile latency is rated as follows:

  • Excellent: under 500ms
  • Good: 500ms to 1000ms
  • Acceptable: 1 to 2 seconds

Compression efficiency is typically 25-50 bytes per trade.

The multi-currency architecture allows monitoring multiple assets simultaneously with minimal impact on individual currency performance.
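The latency tiers above can be expressed as a small classifier over the 95th percentile. This is a sketch under assumptions: the nearest-rank percentile method, the handling of exact boundary values, and the "POOR" label for anything above 2 seconds are choices made here, not details taken from analyze_performance.py.

```python
def percentile_95(latencies_ms):
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]


def rate_latency(p95_ms):
    """Map a 95th-percentile latency in milliseconds to a tier label."""
    if p95_ms < 500:
        return "EXCELLENT"
    if p95_ms <= 1000:
        return "GOOD"
    if p95_ms <= 2000:
        return "ACCEPTABLE"
    return "POOR"  # hypothetical label; the README defines no tier above 2 seconds
```

For example, `rate_latency(percentile_95([360.5, 1233.1]))` evaluates to "ACCEPTABLE", since with only two samples the nearest-rank 95th percentile is the larger one.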

File Format Details

The custom binary format uses:

  • Delta encoding for timestamps and trade IDs
  • Variable-length integers (varints) for space efficiency
  • Raw IEEE 754 floats for prices and quantities
  • ZSTD compression for the entire stream

This keeps files small (typically 25-50 bytes per trade) while still decompressing quickly.
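To show why delta encoding and varints pair well, here is a small Python illustration. The real compressor is written in C++ and its exact byte layout is not documented here, so the zigzag mapping for negative deltas and the unsigned LEB128-style varint below are assumptions, and all function names are hypothetical.

```python
def zigzag(n):
    """Map signed ints to unsigned so small negative deltas stay small."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def unzigzag(z):
    return (z >> 1) if z & 1 == 0 else -((z + 1) >> 1)


def encode_varint(n):
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_deltas(values):
    """Store each value as the varint of its difference from the previous one."""
    out, prev = bytearray(), 0
    for v in values:
        out += encode_varint(zigzag(v - prev))
        prev = v
    return bytes(out)


def decode_deltas(data):
    values, prev, pos = [], 0, 0
    while pos < len(data):
        z, pos = decode_varint(data, pos)
        prev += unzigzag(z)
        values.append(prev)
    return values
```

Consecutive millisecond timestamps differ by small amounts, so after the first value most deltas fit in one or two bytes instead of eight. ZSTD then compresses the resulting byte stream further.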
