Run the Snowflake SQL dialect on your data lake in 30 seconds. Zero dependencies.
Start Embucket and run your first query in 30 seconds:
docker run --name embucket --rm -p 3000:3000 embucket/embucket
Run the Snowflake CLI against the local endpoint:
pip install snowflake-cli
snow sql -c local -a local -u embucket -p embucket -q "select 1;"
Done. You just ran a query in the Snowflake SQL dialect against the local Embucket instance with zero configuration.
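If the query is scripted right after `docker run`, the container may not be listening yet. A minimal sketch of a readiness check follows, assuming only that Embucket accepts TCP connections on the mapped port 3000. The helper name `wait_for_port` is hypothetical and not part of Embucket or the Snowflake CLI, and an open port only shows that something is listening, not that queries will succeed.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that the port accepts connections.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (port 3000 matches the -p 3000:3000 mapping above):
#   wait_for_port("127.0.0.1", 3000)
```

Calling `wait_for_port("127.0.0.1", 3000)` before invoking `snow sql` gives the container up to 30 seconds to start.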
Important: External volumes must be created via YAML configuration at startup.
Define volumes and databases by pointing embucketd at a YAML config file.
Using Docker:
docker run --name embucket --rm -p 3000:3000 \
  -v $PWD/config:/app/config \
  embucket/embucket \
  ./embucketd --metastore-config config/metastore.yaml
Using cargo:
cargo run -p embucketd -- \
  --no-bootstrap \
  --metastore-config config/metastore.yaml
Sample configuration (config/metastore.yaml):
volumes:
  # S3 Tables volume - connects to AWS S3 Table Bucket
  - ident: demo
    type: s3-tables
    database: demo
    credentials:
      credential_type: access_key
      aws-access-key-id: YOUR_ACCESS_KEY
      aws-secret-access-key: YOUR_SECRET_KEY
    arn: arn:aws:s3tables:us-east-2:123456789012:bucket/my-table-bucket
databases:
  - ident: my_db
    volume: demo
  # S3 volume - connects to standard S3 bucket
  # - ident: volume
  #   type: s3
  #   bucket: my-data-bucket
  #   credentials:
  #     credential_type: access_key
  #     aws-access-key-id: YOUR_ACCESS_KEY
  #     aws-secret-access-key: YOUR_SECRET_KEY
Update the credentials and ARN/bucket details with your own values for real deployments.
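Because volumes are only read at startup, it can help to sanity-check the configuration before launching. The sketch below works on a plain dictionary that mirrors the sample file; loading the YAML itself (for example with PyYAML's `yaml.safe_load`) is assumed and not shown. The function `check_config` and the rules it applies are illustrative assumptions based on the sample's key names, not checks that Embucket is documented to perform.

```python
from typing import Any, Dict, List


def check_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a metastore-style config dict.

    Hypothetical pre-flight check; key names are taken from the sample file.
    """
    problems: List[str] = []
    volume_idents = set()

    for volume in config.get("volumes") or []:
        ident = volume.get("ident")
        if not ident:
            problems.append("volume without ident")
            continue
        if ident in volume_idents:
            problems.append(f"duplicate volume ident: {ident}")
        volume_idents.add(ident)
        # Flag credentials that still carry the sample's YOUR_... placeholders.
        creds = volume.get("credentials") or {}
        for key in ("aws-access-key-id", "aws-secret-access-key"):
            if str(creds.get(key, "")).startswith("YOUR_"):
                problems.append(f"volume {ident}: placeholder value for {key}")

    # Every database should point at a volume that is actually defined.
    for database in config.get("databases") or []:
        if database.get("volume") not in volume_idents:
            problems.append(
                f"database {database.get('ident')}: unknown volume {database.get('volume')}"
            )
    return problems


# Dictionary form of the sample configuration above.
SAMPLE = {
    "volumes": [
        {
            "ident": "demo",
            "type": "s3-tables",
            "database": "demo",
            "credentials": {
                "credential_type": "access_key",
                "aws-access-key-id": "YOUR_ACCESS_KEY",
                "aws-secret-access-key": "YOUR_SECRET_KEY",
            },
            "arn": "arn:aws:s3tables:us-east-2:123456789012:bucket/my-table-bucket",
        }
    ],
    "databases": [{"ident": "my_db", "volume": "demo"}],
}
```

Running `check_config(SAMPLE)` reports the two placeholder credentials, which is a reminder to substitute real values before a real deployment.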
Embucket provides a single binary that gives you a wire-compatible Snowflake replacement:
- Snowflake SQL dialect and API: Use your existing queries, dbt projects, and BI tools
- Apache Iceberg storage: Your data stays in open formats on object storage
- Zero dependencies: No databases, no clusters, and no configuration files needed to get started
- Query-per-node: Each instance handles complete queries independently
Perfect for teams who want Snowflake's simplicity with bring-your-own-cloud control.
Built on proven open source:
- Apache DataFusion for SQL execution
- Apache Iceberg for ACID table metadata
Escape the dilemma of choosing between cloud-provider lakehouses (Redshift, BigQuery) and the operational complexity of a do-it-yourself lakehouse.
- Radical simplicity - Single binary deployment
- Snowflake SQL dialect compatibility - Works with your existing tools
- Open data - Apache Iceberg format, no lock-in
- Horizontal scaling - Add nodes for more throughput
- Zero operations - No external dependencies to manage
git clone https://github.com/Embucket/embucket.git
cd embucket && cargo build
./target/debug/embucketd
Contributions welcome. To get involved:
- Fork the repository on GitHub
- Create a new branch for your feature or bug fix
- Submit a pull request with a detailed description
For more details, see CONTRIBUTING.md.
This project uses the Apache 2.0 License. See LICENSE for details.