A high-performance, production-ready inference server written in Rust that supports model versioning, A/B testing, caching, and comprehensive monitoring.
- Model Versioning: Support multiple model versions with traffic allocation
- A/B Testing: Configure traffic distribution across different model versions
- Caching: Redis-based caching for inference results
- Rate Limiting: Configurable rate limiting per client
- Monitoring: Comprehensive metrics via Prometheus and Grafana dashboards
- Batch Processing: Efficient handling of batch inference requests
- Input Validation: Configurable input size limits and validation
- Health Checks: Built-in health monitoring endpoints
- Graceful Shutdown: Proper shutdown handling with cleanup
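
Traffic allocation across model versions can be pictured as weighted selection. The following is a minimal sketch under stated assumptions, not the server's actual implementation: the `Variant` type, the `pick_version` function, and the idea of passing in a pre-computed roll are hypothetical and chosen only for illustration.

```rust
/// One model version and its share of traffic.
/// Hypothetical type; weights are assumed non-negative and need not sum to 1.
struct Variant {
    version: &'static str,
    weight: f64,
}

/// Pick a version for a roll in [0, 1) by walking the cumulative weights.
/// Returns None when there are no variants or no positive weight.
fn pick_version(variants: &[Variant], roll: f64) -> Option<&'static str> {
    let total: f64 = variants.iter().map(|v| v.weight).sum();
    if total <= 0.0 {
        return None;
    }
    // Scale the roll into the range covered by the weights.
    let mut threshold = roll.clamp(0.0, 1.0) * total;
    for v in variants {
        if threshold < v.weight {
            return Some(v.version);
        }
        threshold -= v.weight;
    }
    // Rounding at the upper edge: fall back to the last variant with traffic.
    variants
        .iter()
        .rev()
        .find(|v| v.weight > 0.0)
        .map(|v| v.version)
}
```

With a 90/10 split between two versions, a roll of 0.95 falls into the second version's share. Deriving the roll from a hash of a client identifier rather than from a random number would keep each client on one version, which is usually what an A/B test wants.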
Prerequisites:
- Rust (latest stable version)
- Redis (optional, for caching)
- Docker and Docker Compose (optional, for containerization)
# Clone the repository
git clone https://github.com/Pewpenguin/inferstack-rs
cd inferstack-rs
# Build the project
cargo build --release

Set up the server using environment variables defined in a .env file.
Use the .env.example file as a reference for the required structure and variable names. Ensure all necessary variables are defined before starting the server.
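
One way to confirm that every variable is defined before starting the server is to compare the keys in .env.example against those in .env. The sketch below assumes simple `KEY=VALUE` lines with `#` comments; `env_keys` and `missing_keys` are hypothetical helpers, not part of this project, and the sketch does not know which variables the server actually requires.

```rust
use std::collections::BTreeSet;

/// Collect the variable names from dotenv-style text.
/// Blank lines, `#` comments, and lines without `=` are skipped.
fn env_keys(contents: &str) -> BTreeSet<String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, _value)| key.trim().to_string())
        .filter(|key| !key.is_empty())
        .collect()
}

/// Return the keys present in the example file but absent from the actual
/// file, in sorted order.
fn missing_keys(example: &str, actual: &str) -> Vec<String> {
    let defined = env_keys(actual);
    env_keys(example)
        .into_iter()
        .filter(|key| !defined.contains(key))
        .collect()
}
```

Reading both files with `std::fs::read_to_string` and passing their contents to `missing_keys` yields the names still to be filled in; an empty result means every key from the example is present.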
# Run directly
cargo run --release
# Or using Docker
docker-compose up -d

GET /health

POST /inference
Content-Type: application/json
{
"input": [[1.0, 2.0, 3.0]],
"model_version": "v1" // optional
}

POST /inference
Content-Type: application/json
{
"input": [
[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0]
],
"batch": true,
"model_version": "v1" // optional
}

Access metrics at the /metrics endpoint. Key metrics include:
- inferstack_inference_total: Total inference requests
- inferstack_model_version_usage_total: Usage by model version
- inferstack_inference_duration_seconds: Inference latency
- inferstack_cache_operations_total: Cache operation statistics
- inferstack_batch_throughput_items_per_second: Batch processing performance
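
For quick checks without Grafana, the text returned by /metrics can be read directly. The sketch below assumes the endpoint serves the standard Prometheus text exposition format; `sum_metric` is a hypothetical helper that adds up every sample of one metric across its label sets. It does not assume any particular label names.

```rust
/// Sum all samples of `metric` in Prometheus text exposition output.
/// Returns None if the metric never appears.
fn sum_metric(exposition: &str, metric: &str) -> Option<f64> {
    let mut total: Option<f64> = None;
    for line in exposition.lines().map(str::trim) {
        // Skip blank lines and `# HELP` / `# TYPE` comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // The metric name ends at the label set or at the first whitespace.
        let name_end = line
            .find(|c: char| c == '{' || c.is_whitespace())
            .unwrap_or(line.len());
        if &line[..name_end] != metric {
            continue;
        }
        // The value is the first token after the label set, if there is one.
        let rest = match line.rfind('}') {
            Some(i) => &line[i + 1..],
            None => &line[name_end..],
        };
        let value = rest
            .split_whitespace()
            .next()
            .and_then(|v| v.parse::<f64>().ok());
        if let Some(value) = value {
            *total.get_or_insert(0.0) += value;
        }
    }
    total
}
```

Passing the response body and a name such as `inferstack_model_version_usage_total` gives the total across versions; comparing it with `inferstack_inference_total` is a simple sanity check on the counters.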
A pre-configured Grafana dashboard is available in monitoring/grafana/dashboards/.
To contribute:
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.