Open
Labels
Type: Improvement (Make something better)
Description
What feature do you want to improve?
CHT Sync currently uses a custom Python script (dataemon) to run dbt tasks and a separate container for couch2pg. Both are managed with Docker Compose or Kubernetes.
Describe the improvement you'd like
- Replace dataemon with Airflow DAGs to run dbt tasks.
- Optionally, add couch2pg as a DAG to manage sync jobs via Airflow.
This would allow creating a separate class of users, 'data users', who would have some ability to control dbt runs without requiring full Kubernetes or server access. This could include:
- Starting, stopping, and scheduling dbt runs.
- Manually triggering deployments of new cht-pipeline releases. For small tables, automatically updating them works, but for larger tables and more complex deployments, it may be helpful to be able to trigger a deployment manually.
- Viewing basic logs and results via the Airflow UI (Watchdog and Loki are the preferred tools for observability, but not everyone may have access to them).
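As a rough illustration of the proposal, here is a minimal sketch of what a dbt DAG could look like. It assumes Airflow 2.4 or later in the 2.x line and a dbt project mounted at /dbt. The DAG id, the hourly schedule, the project path, and the helper name dbt_command are all hypothetical and are not taken from cht-sync.

```python
import shlex
from datetime import datetime


def dbt_command(subcommand, project_dir="/dbt", select=None):
    """Build a shell-quoted dbt CLI invocation (the /dbt path is a placeholder)."""
    parts = ["dbt", subcommand, "--project-dir", project_dir]
    if select:
        parts += ["--select", select]
    return " ".join(shlex.quote(part) for part in parts)


def build_dag():
    # Imported lazily so the command builder above works without Airflow installed.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="cht_sync_dbt",            # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@hourly",               # assumed interval, adjust as needed
        catchup=False,                    # do not backfill missed intervals
        max_active_runs=1,                # avoid overlapping dbt runs
    ) as dag:
        deps = BashOperator(task_id="dbt_deps", bash_command=dbt_command("deps"))
        run = BashOperator(task_id="dbt_run", bash_command=dbt_command("run"))
        test = BashOperator(task_id="dbt_test", bash_command=dbt_command("test"))
        deps >> run >> test               # install packages, build models, then test
    return dag


try:
    dag = build_dag()   # Airflow discovers DAG objects defined at module level
except ImportError:
    dag = None          # Airflow is not available in this environment
```

Setting schedule=None instead would leave the DAG unscheduled, so it would run only when a user triggers it from the Airflow UI or CLI. That may suit manually deployed releases or large tables.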
Tradeoffs
- The current setup is simple: two containers running two scripts. Introducing Airflow adds complexity and operational overhead.
Additional context
Another tool, the Data Observability Toolkit, also uses Airflow to run dbt tasks. It may be possible to merge the two projects, which would reduce overall complexity rather than increase it.