A high-performance market data collection system for Hyperliquid DEX that streams trade data via WebSocket and compresses it using ZSTD for efficient storage.
This project was inspired by my class project in the High Frequency Trading Technology class (IE 421) at the University of Illinois. The original assignment required implementing a high-performance compressor for cryptocurrency market data from major Centralized Exchanges (see group project repository). I've since expanded the concept to create this comprehensive market data collection system for Hyperliquid DEX.
About the Author: Keshav Ramamurthy, rising junior at the University of Illinois majoring in Computer Science + Advertising.
- Python WebSocket Client: Connects to Hyperliquid API and streams trade data
- C++ Compressor: High-performance ZSTD compression with custom binary format
- Daily File Rotation: Automatically creates new files each day
- Multi-threaded: Separate threads for WebSocket handling and compression
- Multi-currency Support: Concurrent collection for multiple cryptocurrencies
- Real-time trade data collection from Hyperliquid
- Multi-currency support: Collect data for multiple coins simultaneously
- Custom binary format with delta encoding for optimal compression
- ZSTD compression for efficient storage
- Automatic reconnection with exponential backoff
- Graceful shutdown handling
- Daily file rotation
- Configurable compression threads
- Real-time latency monitoring: Track end-to-end latency from trade to processing
- Detailed performance metrics: Trade rates, data throughput, compression efficiency
- Per-currency statistics: Individual monitoring for each cryptocurrency
- Comprehensive logging: Multi-level logging with currency-specific context
- Performance analysis tools: Built-in analysis scripts for performance assessment
- Sub-500ms latency in optimal conditions
- Concurrent WebSocket connections for multiple currencies
- Efficient compression: typically 25-50 bytes per trade
- Live performance reporting every minute
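As an illustration of the "automatic reconnection with exponential backoff" feature listed above, here is a minimal sketch of how the retry delays could be computed. The function name and the `base`, `factor`, `cap`, and `jitter` parameters are assumptions chosen for the example and are not taken from `hyperliquid_client.py`.

```python
import random


def backoff_delays(base=1.0, factor=2.0, cap=60.0, jitter=0.0,
                   attempts=6, rng=random.random):
    """Yield the wait in seconds before each reconnection attempt.

    All parameter names and defaults are hypothetical.
    """
    for attempt in range(attempts):
        # Grow the delay geometrically, but never beyond the cap.
        delay = min(cap, base * factor ** attempt)
        # Optional jitter (a fraction of the delay) keeps several
        # per-currency connections from retrying at the same instant.
        yield delay + delay * jitter * rng()


if __name__ == "__main__":
    for attempt, delay in enumerate(backoff_delays(attempts=8), start=1):
        print(f"attempt {attempt}: wait {delay:.1f}s")
```

In a real client the loop would sleep for each delay before retrying the WebSocket connection, and restart the sequence after a successful connection.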
- Build the project:
  make
- Install Python dependencies:
  pip3 install websockets
- Run data collection:
  # Single currency
  ./run.sh BTC 4
  # Multiple currencies (recommended)
  ./run.sh "BTC,ETH,SOL" 4
./run.sh <coin_symbols> [num_zstd_threads]
Arguments:
- coin_symbols: Single currency (e.g., BTC) or comma-separated list (e.g., "BTC,ETH,SOL")
- num_zstd_threads: Number of compression threads per currency (default: 1)
# Single currency
./run.sh BTC # Collect BTC trades (1 thread)
./run.sh ETH 4 # Collect ETH trades (4 threads)
# Multiple currencies
./run.sh "BTC,ETH" # Collect BTC and ETH (1 thread each)
./run.sh "BTC,ETH,SOL,DOGE" 2 # Collect 4 currencies (2 threads each)
# Multi-currency with enhanced logging
python3 hyperliquid_client.py "BTC,ETH,SOL" 4
# Single currency
python3 hyperliquid_client.py BTC 2
The system provides real-time performance metrics in the logs:
2025-05-29 00:32:38,299 - [BTC] Processed 6 trades | Latency: 360.5ms | Total trades: 128 | Data: 35.1KB
2025-05-29 00:32:39,101 - [ETH] Processed 8 trades | Latency: 1233.1ms | Total trades: 90 | Data: 24.6KB
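If you want to post-process these lines yourself, the sketch below parses one log line into a dictionary. The regular expression is derived only from the two sample lines above; the function name and dictionary keys are assumptions, and this is not necessarily how `analyze_performance.py` does it.

```python
import re
from datetime import datetime

# Pattern inferred from the sample log lines; adjust if the format differs.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"\[(?P<coin>[^\]]+)\] Processed (?P<batch>\d+) trades \| "
    r"Latency: (?P<latency_ms>[\d.]+)ms \| "
    r"Total trades: (?P<total>\d+) \| "
    r"Data: (?P<data_kb>[\d.]+)KB$"
)


def parse_log_line(line):
    """Return the fields of one metrics line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S,%f"),
        "coin": match["coin"],
        "batch_trades": int(match["batch"]),
        "latency_ms": float(match["latency_ms"]),
        "total_trades": int(match["total"]),
        "data_kb": float(match["data_kb"]),
    }


if __name__ == "__main__":
    sample = ("2025-05-29 00:32:38,299 - [BTC] Processed 6 trades | "
              "Latency: 360.5ms | Total trades: 128 | Data: 35.1KB")
    print(parse_log_line(sample))
```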
Run the built-in performance analyzer:
python3 analyze_performance.py [log_file]
This provides:
- Real-time latency analysis
- Per-currency performance statistics
- Compression efficiency metrics
- System recommendations
OVERALL STATISTICS
Collection period: 2025-05-29 00:32:29 to 2025-05-29 00:32:45
Duration: 15.6 seconds (0.3 minutes)
Total currencies monitored: 2
Total trade messages processed: 58
PER-CURRENCY PERFORMANCE
BTC: 138 trades | 8.83 trades/sec | Avg latency: 367.2ms
ETH: 103 trades | 6.59 trades/sec | Avg latency: 503.2ms
⚡ REAL-TIME ASSESSMENT: ⚠️ ACCEPTABLE (1233.1ms 95th percentile)
💾 COMPRESSION: ✅ EXCELLENT (25.9 bytes/trade)
Data is stored in: data/YYYY-MM-DD/COIN/trades.zst
Example structure:
data/
├── 2025-05-29/
│   ├── BTC/
│   │   └── trades.zst
│   ├── ETH/
│   │   └── trades.zst
│   └── SOL/
│       └── trades.zst
└── 2025-05-28/
    └── BTC/
        └── trades.zst
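The layout above maps directly to a small path helper. The following is a sketch only: the function names are hypothetical, and treating the day boundary as UTC is an assumption, since the README does not say which timezone drives the daily rotation.

```python
from datetime import date, datetime, timezone
from pathlib import Path


def trade_file_path(coin, day, root="data"):
    """Build data/YYYY-MM-DD/COIN/trades.zst for one coin and one day."""
    return Path(root) / day.isoformat() / coin / "trades.zst"


def needs_rotation(open_day, now):
    """True once `now` falls on a different day than the open file's day.

    Assumption: the day boundary is evaluated in UTC.
    """
    return now.astimezone(timezone.utc).date() != open_day


if __name__ == "__main__":
    today = datetime.now(timezone.utc).date()
    print(trade_file_path("BTC", today))
```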
The binary format includes:
- File header with magic bytes "CDT2"
- Ticker symbol and exchange identifier
- Compressed trade data with delta encoding
System Requirements:
- macOS or Linux
- Python 3.7+
- C++17 compatible compiler
Libraries:
- ZSTD (compression)
- simdjson (JSON parsing)
- websockets (Python WebSocket client)
# Install dependencies (macOS with Homebrew)
brew install zstd simdjson
# Install Python dependencies
pip3 install websockets
# Build
make clean && make
- hyperliquid.log: Detailed application logs with performance metrics
- Real-time latency measurements
- Per-currency trade statistics
- Connection status and errors
- Latency tracking: End-to-end latency from trade timestamp to processing
- Throughput monitoring: Trades per second per currency
- Compression efficiency: Bytes per trade analysis
- Connection health: Reconnection events and network status
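The metrics above reduce to simple arithmetic. The sketch below shows one way to compute them, assuming each trade carries an exchange timestamp in epoch milliseconds; the function names are illustrative and not taken from the project. Note that a latency measured this way includes any clock skew between the exchange and the local machine.

```python
import time


def latency_ms(trade_time_ms, now_ms=None):
    """End-to-end latency: local processing time minus trade timestamp."""
    if now_ms is None:
        now_ms = time.time() * 1000.0
    return now_ms - trade_time_ms


def trades_per_second(trade_count, elapsed_seconds):
    """Throughput for one currency over a measurement window."""
    return trade_count / elapsed_seconds if elapsed_seconds > 0 else 0.0


def bytes_per_trade(compressed_bytes, trade_count):
    """Compression efficiency: stored bytes divided by trades stored."""
    return compressed_bytes / trade_count if trade_count > 0 else 0.0


if __name__ == "__main__":
    print(f"{latency_ms(1_000, now_ms=1_360.5):.1f} ms")
    print(f"{trades_per_second(10, 2.0):.2f} trades/sec")
    print(f"{bytes_per_trade(2590, 100):.1f} bytes/trade")
```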
Press Ctrl+C to stop gracefully. The system will:
- Close all WebSocket connections
- Flush all buffers for each currency
- Finalize compression streams
- Print final statistics
- Clean up resources
Based on testing, the system provides:
- Excellent performance: Sub-500ms latency (95th percentile)
- Good performance: 500ms-1000ms latency
- Acceptable performance: 1-2 second latency
- Compression efficiency: Typically 25-50 bytes per trade
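To make the latency tiers above concrete, here is a sketch that computes a 95th percentile and maps it to a rating. The nearest-rank percentile method and the "POOR" label for anything above 2 seconds are assumptions; the README only defines the three tiers listed.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def assess_latency(p95_ms):
    """Map a 95th-percentile latency in milliseconds to a rating tier."""
    if p95_ms < 500:
        return "EXCELLENT"
    if p95_ms <= 1000:
        return "GOOD"
    if p95_ms <= 2000:
        return "ACCEPTABLE"
    return "POOR"  # assumed label; the README defines no tier above 2 s


if __name__ == "__main__":
    samples = [360.5, 410.0, 380.2, 1233.1]
    p95 = percentile(samples, 95)
    print(f"{assess_latency(p95)} ({p95}ms 95th percentile)")
```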
The multi-currency architecture allows monitoring multiple assets simultaneously with minimal impact on individual currency performance.
The custom binary format uses:
- Delta encoding for timestamps and trade IDs
- Variable-length integers (varints) for space efficiency
- Raw IEEE 754 floats for prices and quantities
- ZSTD compression for the entire stream
This results in excellent compression ratios while maintaining fast decompression speeds.
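The encoding steps above can be sketched in Python as follows. This is an illustration of the general technique, not the project's actual "CDT2" layout: the field order, the use of 64-bit little-endian doubles, and the ZigZag mapping for negative deltas are all assumptions, and the final ZSTD pass is omitted to keep the example dependency-free.

```python
import struct


def zigzag(n):
    """Map signed ints to unsigned so small negatives stay small."""
    return (n << 1) if n >= 0 else (((-n) << 1) - 1)


def unzigzag(z):
    return (z >> 1) if z & 1 == 0 else -((z + 1) >> 1)


def encode_varint(value):
    """LEB128-style varint: 7 payload bits per byte, high bit = continue."""
    if value < 0:
        raise ValueError("varint requires a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_trades(trades):
    """trades: iterable of (timestamp_ms, trade_id, price, quantity)."""
    out = bytearray()
    prev_ts = prev_id = 0
    for ts, tid, price, qty in trades:
        # Store differences from the previous trade instead of full values.
        out += encode_varint(zigzag(ts - prev_ts))
        out += encode_varint(zigzag(tid - prev_id))
        # Prices and quantities are written as raw IEEE 754 doubles.
        out += struct.pack("<dd", price, qty)
        prev_ts, prev_id = ts, tid
    return bytes(out)


def decode_trades(buf):
    trades, pos = [], 0
    prev_ts = prev_id = 0
    while pos < len(buf):
        d_ts, pos = decode_varint(buf, pos)
        d_id, pos = decode_varint(buf, pos)
        price, qty = struct.unpack_from("<dd", buf, pos)
        pos += 16
        prev_ts += unzigzag(d_ts)
        prev_id += unzigzag(d_id)
        trades.append((prev_ts, prev_id, price, qty))
    return trades


if __name__ == "__main__":
    sample = [(1748478758299, 1000, 67000.5, 0.25),
              (1748478759101, 1001, 67001.0, 0.10)]
    blob = encode_trades(sample)
    assert decode_trades(blob) == sample
    print(f"{len(blob)} bytes for {len(sample)} trades before ZSTD")
```

Because consecutive trades have nearby timestamps and IDs, most deltas fit in one or two varint bytes, which is what leaves the compressor a compact, repetitive stream to work with.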