A high-performance time-series database engine built from scratch in C++ and exposed via a Python/FastAPI service. This project is a deep dive into the fundamentals of data engineering, focusing on performance, storage efficiency, and professional software practices.
The system is designed to ingest numerical time-series data, store it efficiently using custom compression, and query it with low latency.
- Custom C++ Storage Engine: Core logic is written in modern C++ (C++17) for maximum performance and control over memory.
- High-Performance Querying: A time-sharded storage architecture minimizes I/O, enabling sub-millisecond latencies for hot-cache reads.
- Efficient Compression: Implemented bespoke, time-series-specific compression algorithms (Delta-of-Delta and XOR) to drastically reduce the storage footprint (sketched after this list).
- Modern API Layer: A clean, documented API is provided using Python 3 and FastAPI for easy integration.
- Professional Tooling: Fully containerized with Docker, built with CMake, and unit-tested via Pytest and a C++ suite.
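To make the compression idea concrete, here is a minimal Python sketch of the two techniques. It is illustrative only: the engine implements these as bit-level encoders in C++, and the function names below are not part of the project's API.

```python
import struct

def delta_of_delta(timestamps):
    """Encode timestamps as second-order differences.

    Regularly sampled series yield long runs of zeros, which a
    bit-level encoder can then store in one or two bits each.
    """
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    # The first timestamp and first delta are stored verbatim as a header.
    return timestamps[0], deltas[0], dods

def xor_values(values):
    """XOR each double's bit pattern with its predecessor's.

    Consecutive values that change slowly share sign, exponent, and
    leading mantissa bits, so the XOR results are mostly zero bits
    and compress well.
    """
    bits = [struct.unpack("<Q", struct.pack("<d", v))[0] for v in values]
    return bits[0], [b ^ a for a, b in zip(bits, bits[1:])]

# A series sampled every 10 s, with 1 s of jitter on the fourth point:
print(delta_of_delta([1000, 1010, 1020, 1031, 1041]))  # (1000, 10, [0, 1, -1])
```

In the real engine these residuals feed a variable-length bit writer; the Python lists here only show what gets encoded.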
Benchmarks were run in a local WSL2 environment by ingesting and querying a dataset of 1,000,000 pseudo-realistic data points. The results show a deliberate trade-off: ingestion throughput is sacrificed for extremely fast read latencies.
| Metric | Result | Analysis |
|---|---|---|
| Storage Efficiency | ~8.2 bytes/point | 50% reduction in storage compared to uncompressed 16-byte data points, via custom compression on high-entropy data. |
| Hot Query Latency (p99) | ~1.3 ms | Querying a 1-hour window of recent data (3,600 points); quick due to page caching and efficient decompression. |
| Cold Query Latency (p99) | ~16 ms | Querying a 24-hour window of older data (86,400 points); time-sharding avoids full scan of entire dataset. |
| Ingestion Throughput | ~5,500 points/sec | Baseline performance; bottleneck is per-point file I/O—batch ingestion API proposed for optimization. |
1. By tailoring the compression algorithms to the data's specific patterns, the engine achieves a 50% storage reduction on high-entropy data, a result general-purpose algorithms rarely match.
2. By aligning the engine's I/O pattern with the OS page cache, hot-cache reads see p99 latencies of just 1.3 ms, a level of performance typically associated with highly optimized, low-level systems.
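The sharding logic behind both results can be sketched in a few lines of Python. The shard granularity and file naming below are assumptions for illustration; the engine implements this in C++.

```python
from pathlib import Path

SHARD_SECONDS = 3600  # assumed granularity: one .bin shard per hour

def shards_for_range(data_dir, start_ts, end_ts):
    """Return only the shard files that can overlap [start_ts, end_ts].

    Query cost is bounded by the window size, not the dataset size:
    a 1-hour query touches at most two shards, and a recently written
    shard is usually still resident in the OS page cache.
    """
    first, last = start_ts // SHARD_SECONDS, end_ts // SHARD_SECONDS
    return [Path(data_dir) / f"{shard * SHARD_SECONDS}.bin"
            for shard in range(first, last + 1)]
```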
```mermaid
graph TD;
    subgraph "Docker Container"
        Client[External Client] -- "HTTP POST /api/ingest (JSON)" --> FastAPI;
        Client -- "HTTP GET /api/query (JSON)" --> FastAPI;
        FastAPI[Python FastAPI Server] -- "Calls Function" --> CtypesBridge[Python ctypes Bridge];
        CtypesBridge -- "Loads & Calls" --> CppEngine[libinsight.so];
        subgraph "C++ Storage Engine"
            CppEngine -- "Writes/Reads Compressed Data" --> Shards[Time-Sharded .bin Files];
        end
    end
```
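The bridge in the diagram is plain ctypes: Python loads the shared library and calls its exported C functions directly, with no serialization overhead. A minimal sketch follows; the exported symbol name and signature here are illustrative, not the engine's actual header.

```python
import ctypes

# Load the compiled engine (path assumes the local build layout below).
lib = ctypes.CDLL("./engine/build/libinsight.so")

# Declare argument/return types before calling across the boundary;
# ctypes otherwise defaults to int and would corrupt the double.
lib.ingest_point.argtypes = [ctypes.c_int64, ctypes.c_double]
lib.ingest_point.restype = ctypes.c_int

status = lib.ingest_point(1_700_000_000, 42.5)
```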
This project is fully containerized and can be built and run with two Docker commands.
- Docker
- Git
Clone the repository, then build and run the container from the project root.

```bash
git clone https://github.com/KaranSinghDev/cpp-time-series-database.git
cd cpp-time-series-database
docker build -t insight-service .
docker run -p 8000:8000 insight-service
```

The service is now running and accessible at http://127.0.0.1:8000. Interactive API documentation is available at http://127.0.0.1:8000/docs.
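As a quick smoke test, the two endpoints from the architecture diagram can be exercised from Python. The payload shapes below are guesses; the /docs page is the authoritative schema.

```python
import requests

BASE = "http://127.0.0.1:8000"

# Ingest one point, then query a window around it. Field names are
# illustrative; consult http://127.0.0.1:8000/docs for the real schema.
requests.post(f"{BASE}/api/ingest", json={"timestamp": 1_700_000_000, "value": 42.5})
resp = requests.get(f"{BASE}/api/query", json={"start": 1_699_999_000, "end": 1_700_001_000})
print(resp.json())
```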
For developers who wish to build and test the components manually.
- A C++17 compiler (g++) & CMake
- Python 3.10+ & pip
Build and test the C++ engine:

```bash
cd engine
cmake -B build
cmake --build build
./engine/build/engine_test
```

Set up and test the Python layer:

```bash
# From the project root
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
LD_LIBRARY_PATH=./engine/build pytest
```