Enterprise-grade ETL pipeline orchestration for Kubernetes with seamless deduplication and joins
π Quick Start β’ π Documentation β’ ποΈ Architecture β’ π€ Contributing
The GlassFlow ETL Kubernetes Operator is a production-ready Kubernetes operator that enables scalable, cloud-native data pipeline deployments. Built as a companion to the GlassFlow ClickHouse ETL project, it provides enterprise-grade data processing capabilities with advanced features like deduplication, temporal joins, and seamless pause/resume functionality.
- π Pipeline Lifecycle Management - Create, pause, resume, and terminate data pipelines
- π― Advanced Deduplication - Built-in deduplication with configurable time windows
- π Stream Joins - Seamless joining of multiple data streams
- β‘ Kubernetes Native - Full CRD-based pipeline management
- π‘οΈ Production Ready - Enterprise-grade reliability and monitoring
- π Scalable Ingestor - Efficiently reads from multiple Kafka partitions with horizontal scaling
- π§ Helm Charts - Easy deployment and configuration management
graph LR
KAFKA[Kafka Cluster]
subgraph "Kubernetes Cluster"
subgraph "GlassFlow ETL"
subgraph "Operator"
OP[Operator Controller]
CRD[Pipeline CRD]
end
subgraph "Data Pipeline"
ING[Ingestor Pods]
JOIN[Join Pod]
SINK[Sink Pod]
end
subgraph NATS_JETSTREAM["NATS JetStream"]
NATS[NATS]
DLQ[DLQ]
end
end
end
CH[ClickHouse]
subgraph "External"
API[GlassFlow API]
UI[Web UI]
end
API --> CRD
CRD --> OP
OP --> ING
OP --> JOIN
OP --> SINK
ING <--> NATS_JETSTREAM
JOIN <--> NATS_JETSTREAM
SINK <--> NATS_JETSTREAM
KAFKA --> ING
SINK --> CH
UI --> API
- Kubernetes 1.19+ cluster
- Helm 3.2.0+
- kubectl configured for your cluster
- Kafka (optional - can use external setup for development)
- ClickHouse (optional - can use external setup for development)
Deploy using the complete GlassFlow ETL stack from the GlassFlow Charts repository:
# Add GlassFlow Helm repository
helm repo add glassflow https://glassflow.github.io/charts
helm repo update
# Install complete GlassFlow ETL stack
helm install glassflow-etl glassflow/glassflow-etl
Deploy just the operator as a dependency:
# Install operator chart
helm install glassflow-operator glassflow/glassflow-operator
The operator includes automatic cleanup functionality that ensures all pipelines are immediately terminated when uninstalling:
helm uninstall glassflow-operator
This will:
- β Terminate all existing pipelines ungracefully
- β Clean up all resources (namespaces, deployments, NATS streams)
- β Remove Pipeline CRD definitions
- β Remove the operator deployment
For more details, see HELM_UNINSTALL.md.
# Clone the repository
git clone https://github.com/glassflow/glassflow-etl-k8s-operator.git
cd glassflow-etl-k8s-operator
# Install CRDs
make install
# Deploy operator
make deploy IMG=ghcr.io/glassflow/glassflow-etl-k8s-operator:latest
Create pipelines using the GlassFlow ClickHouse ETL backend API. The operator will automatically create the corresponding Pipeline CRDs. Here's an example of what the generated CRD will look like:
apiVersion: etl.glassflow.io/v1alpha1
kind: Pipeline
metadata:
name: user-events-pipeline
spec:
pipeline_id: "user-events-v1"
config: "pipeline-config"
dlq: "dead-letter-queue"
sources:
type: kafka
topics:
- topic_name: "user-events"
stream: "users"
dedup_window: 60000000000 # 1 minute in nanoseconds
join:
type: "temporal"
stream: "joined-users"
enabled: true
sink: "clickhouse"
Feature | Status | Description |
---|---|---|
Pipeline Creation | β | Deploy new ETL pipelines via CRD |
Pipeline Termination | β | Graceful shutdown and cleanup |
Pipeline Pausing | β | Temporarily halt data processing |
Pipeline Resuming | β | Resume paused pipelines |
Deduplication | β | Configurable time-window deduplication |
Stream Joins | β | Multi-stream data joining |
Auto-scaling | β | Horizontal pod autoscaling / ingestor replicas support |
Monitoring | β | Prometheus metrics integration |
Helm Uninstall Cleanup | β | Automatic pipeline termination and CRD cleanup on uninstall |
- Go 1.23+
- Docker 17.03+
- kubectl v1.11.3+
- Kind (for local testing)
- NATS (for messaging)
-
Clone and setup:
git clone https://github.com/glassflow/glassflow-etl-k8s-operator.git cd glassflow-etl-k8s-operator make help # See all available targets
-
Install dependencies:
# Install development tools make controller-gen make kustomize make golangci-lint
-
Start local infrastructure:
# Start NATS with JetStream (must run inside the cluster) helm repo add nats https://nats-io.github.io/k8s/helm/charts/ helm install nats nats/nats --set nats.jetstream.enabled=true # Start Kafka (using Helm) helm repo add bitnami https://charts.bitnami.com/bitnami helm install kafka bitnami/kafka # Start ClickHouse (using Helm) helm install clickhouse bitnami/clickhouse # Or use external Kafka/ClickHouse for development
-
Run the operator:
# Run locally (requires NATS running inside the cluster) make run
This project was built using Kubebuilder v4 and follows Kubernetes operator best practices:
βββ api/v1alpha1/ # CRD definitions
βββ internal/controller/ # Operator controller logic
βββ internal/nats/ # NATS client integration
βββ charts/ # Helm charts
βββ config/ # Kustomize configurations
βββ test/ # Unit and e2e tests
- Kubebuilder - Operator framework and scaffolding
- Kustomize - Kubernetes configuration management
- Helmify - Automatic Helm chart generation
- GolangCI-Lint - Code quality and linting
# Run e2e tests (requires Kind cluster) - Primary testing method
make test-e2e
# Run unit tests (coverage being improved)
make test
# Run linter
make lint
Chart | Purpose | Components | Use Case |
---|---|---|---|
glassflow-etl | Complete ETL Platform | UI, API, Operator, NATS | Full-featured deployment |
glassflow-operator | Operator Only | Operator, CRDs | Dependency for custom setups |
The glassflow-etl chart includes the complete platform with web UI, backend API, NATS, and the operator as dependencies. The glassflow-operator chart is designed as a dependency for the main chart or custom deployments.
- GlassFlow ClickHouse ETL - Core ETL engine and API
- GlassFlow Charts - Helm charts repository
- GlassFlow Documentation - Complete documentation
- Demo Video: Coming soon
- Live Demo: demo.glassflow.dev
- Documentation: docs.glassflow.dev
We welcome contributions!
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests:
make test
- Run linter:
make lint
- Submit a pull request
This project is licensed under the Apache License 2.0 - see the clickhouse-etl LICENSE file for details.
- Slack Community: GlassFlow Hub
- Documentation: docs.glassflow.dev
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: [email protected]
Built by GlassFlow Team
Website β’ Documentation β’ GitHub