This ETL framework extracts data from 100+ sources, transforms it using Pandas/Arrow, and loads it into Apache Iceberg format on AWS S3.
- Source connectors: MySQL, Oracle, SQL Server, MongoDB, .dat files
- Modular, config-driven pipeline
- Pandas → Arrow → Iceberg on S3
- Prefect-based orchestration
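
As a rough illustration of the modular, config-driven design listed above, the sketch below maps a job name such as `mysql_customers` to a config entry and dispatches to a connector through a registry. It is a minimal sketch and not the framework's actual code: `CONFIG`, `EXTRACTORS`, `register`, `run_job`, and the `mongo_orders` job are hypothetical names, the config is inlined rather than read from YAML, and the extractors return placeholder rows instead of querying a database.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Job = Dict[str, str]

# Hypothetical job definitions. A real setup would presumably load these
# from a config file instead of defining them inline.
CONFIG: Dict[str, Job] = {
    "mysql_customers": {"source_type": "mysql", "table": "customers"},
    "mongo_orders": {"source_type": "mongodb", "table": "orders"},
}

# Registry mapping a source type to the function that extracts from it.
EXTRACTORS: Dict[str, Callable[[Job], List[Row]]] = {}


def register(source_type: str):
    """Decorator that adds an extractor to the registry under source_type."""
    def decorator(func: Callable[[Job], List[Row]]):
        EXTRACTORS[source_type] = func
        return func
    return decorator


@register("mysql")
def extract_mysql(job: Job) -> List[Row]:
    # Placeholder: a real connector would run a query against MySQL here.
    return [{"id": 1, "source": f"mysql:{job['table']}"}]


@register("mongodb")
def extract_mongodb(job: Job) -> List[Row]:
    # Placeholder: a real connector would read from a MongoDB collection.
    return [{"id": 1, "source": f"mongodb:{job['table']}"}]


def run_job(job_name: str) -> List[Row]:
    """Look up the job in the config, extract rows, and tag them with the job name."""
    if job_name not in CONFIG:
        raise KeyError(f"unknown job: {job_name}")
    job = CONFIG[job_name]
    rows = EXTRACTORS[job["source_type"]](job)
    # Stand-in for the transform step: annotate each row with its job name.
    return [dict(row, job=job_name) for row in rows]
```

With this layout, supporting another source type means registering one more extractor and adding a config entry; `run_job` itself does not change.
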
python main.py mysql_customers

Or trigger via Prefect UI/CLI:
prefect deployment build flows/etl_flow.py:etl_flow -n etl-deployment
prefect deployment apply etl_flow-deployment.yaml
prefect agent start

See config/etl_config.yaml for pipeline configuration.
Define AWS credentials in a .env file or via environment variables.
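
One way the two credential sources could be combined is sketched below, using only the standard library. This is an illustration under stated assumptions, not the framework's loader: `parse_env_text` and `load_aws_settings` are hypothetical helpers, the precedence rule (environment variables win over the .env file) is an assumption, and the variable names are the standard ones the AWS SDKs read.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

# Standard AWS environment variable names.
AWS_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_aws_settings(
    env_path: str = ".env",
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect AWS settings; environment variables take precedence (assumed)."""
    environ = os.environ if environ is None else environ
    path = Path(env_path)
    file_values = parse_env_text(path.read_text()) if path.is_file() else {}
    settings: Dict[str, str] = {}
    for key in AWS_KEYS:
        value = environ.get(key) or file_values.get(key)
        if value:
            settings[key] = value
    return settings
```

Letting the environment override the file keeps the .env file usable for local development while deployed runs inject credentials from outside; the file itself should stay out of version control.
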