Disclaimer: This project is not affiliated with or endorsed by the Apache Software Foundation. “Apache”, “Apache Iceberg”, and related marks are trademarks of the ASF.
iceberg-compaction is a high-performance Rust-based engine that compacts Apache Iceberg™ tables efficiently and safely at scale.
- Rust-Native Performance: Low-latency, high-throughput compaction with memory safety guarantees
 - DataFusion Engine: Leverages Apache DataFusion for query planning and vectorized execution
 - Iceberg Native Support: Full compliance with Iceberg table formats via iceberg-rs
 - Multi-Cloud Ready: Currently supports AWS S3, with support planned for Azure Blob Storage and Google Cloud Storage
 
- Full Compaction: Merges all data files in an Iceberg table and removes old files
- Deletion Support:
  - Positional deletions (POS_DELETE)
  - Equality deletions (EQ_DELETE)
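
To make the two delete kinds concrete, here is a minimal sketch of how a compaction pass might drop deleted rows while rewriting data. It is not the iceberg-compaction API: `Row` and `apply_deletes` are hypothetical names, and the model is deliberately simplified. In real Iceberg tables, positional deletes identify rows by data file path plus row position, and equality deletes apply only to data files with an older sequence number.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified row: its position in the data file,
/// a key column, and a payload.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    pos: u64,
    id: i64,
    value: String,
}

/// Keep only the rows that survive both kinds of delete.
/// - `pos_deletes`: row positions removed by positional delete files
/// - `eq_deletes`: key values removed by equality delete files
fn apply_deletes(
    rows: Vec<Row>,
    pos_deletes: &HashSet<u64>,
    eq_deletes: &HashSet<i64>,
) -> Vec<Row> {
    rows.into_iter()
        .filter(|r| !pos_deletes.contains(&r.pos) && !eq_deletes.contains(&r.id))
        .collect()
}
```

After such a pass the rewritten data files no longer need the delete files, so readers no longer have to merge deletes at query time.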
 
 
We provide a complete working example that compacts an Iceberg table through a REST catalog backend:

    # Navigate to the example directory
    cd examples/rest-catalog
    # Run the example
    cargo run

The example includes:
- Setting up a REST Iceberg catalog with S3 storage
 - Configuring authentication and connection settings
 - Performing table compaction using iceberg-compaction
 
For more details, see the rest-catalog example.
- Partial compaction: Support incremental compaction strategies
 - Compaction Policy: Multiple built-in policies (size-based, time-based, cost-optimized)
 - Built-in cache: Metadata and query result caching for improved performance
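
As an illustration of what a size-based policy could look like, the sketch below selects small files as compaction candidates once enough of them have accumulated. Everything here is an assumption made for illustration: `DataFile`, `SizePolicy`, and `select_candidates` are hypothetical names and do not describe the project's actual policy interface.

```rust
/// Hypothetical description of a data file; only the size matters here.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    path: String,
    size_bytes: u64,
}

/// Hypothetical size-based policy: files smaller than
/// `small_file_threshold` are candidates, and compaction is only
/// worthwhile once at least `min_files` candidates exist.
struct SizePolicy {
    small_file_threshold: u64,
    min_files: usize,
}

/// Return the files to compact, or an empty list if the policy
/// decides there is not enough work yet.
fn select_candidates<'a>(files: &'a [DataFile], policy: &SizePolicy) -> Vec<&'a DataFile> {
    let small: Vec<&DataFile> = files
        .iter()
        .filter(|f| f.size_bytes < policy.small_file_threshold)
        .collect();
    if small.len() >= policy.min_files {
        small
    } else {
        Vec::new()
    }
}
```

A time-based or cost-optimized policy could follow the same shape, with a different predicate and trigger condition.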
 
- Spill to disk: Handle large datasets that exceed memory limits
 - Network rebuild: Robust handling of network failures and retries
 - Task breakpoint resume: Resume operations from failure points
 - E2E test framework: Comprehensive testing infrastructure
 
- Job progress display: Track and report the progress of running compaction jobs
 - Comprehensive compaction metrics: Detailed performance and operation metrics
 
- Tune parquet: Configurable Parquet writer parameters
 - Fine-grained configurable compaction parameters: Extensive customization options
 
- Expire snapshot
 - Rewrite manifest
 
- Binpack/Sort/ZOrder Compaction
 - Clustering / Order by: Support for data reorganization and sorting
 - File clean: Delete orphan files
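
For readers unfamiliar with bin-pack compaction, the idea is to group existing files so that each rewritten output file lands close to a target size. The sketch below uses a first-fit-decreasing heuristic over file sizes. It is a generic illustration, not the project's planner: `bin_pack` is a hypothetical helper, and a real planner would also respect partition boundaries and delete files.

```rust
/// Group file sizes into bins whose total does not exceed `target`,
/// using first-fit decreasing: sort largest first, then place each
/// file into the first bin that still has room.
/// A file larger than `target` ends up alone in its own bin.
fn bin_pack(sizes: &[u64], target: u64) -> Vec<Vec<u64>> {
    let mut sorted = sizes.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));

    // Each bin tracks its used capacity and the sizes assigned to it.
    let mut bins: Vec<(u64, Vec<u64>)> = Vec::new();
    for size in sorted {
        let slot = bins.iter().position(|(used, _)| used + size <= target);
        match slot {
            Some(i) => {
                bins[i].0 += size;
                bins[i].1.push(size);
            }
            None => bins.push((size, vec![size])),
        }
    }
    bins.into_iter().map(|(_, items)| items).collect()
}
```

With sizes `[60, 50, 40, 30, 20]` and a target of `100`, this produces two groups, `[60, 40]` and `[50, 30, 20]`, each of which would be rewritten into a single output file.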