
Use Airflow to run dbt #204

@witash

Description

What feature do you want to improve?
CHT Sync currently uses a custom Python script (dataemon) to run dbt tasks and a separate container for couch2pg, both managed with Docker Compose or Kubernetes.

Describe the improvement you'd like

  • Replace dataemon with Airflow DAGs to run dbt tasks (a minimal DAG sketch follows this list).
  • Optionally, add couch2pg as a DAG so that sync jobs are also managed via Airflow.
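
As a rough illustration of the first bullet, a DAG along these lines could replace the dataemon loop. This is a minimal sketch, assuming Airflow 2.4+ and a dbt project checked out at /opt/cht-pipeline; the DAG id, schedule, and paths are placeholders, not anything that exists in CHT Sync today.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cht_pipeline_dbt",       # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="*/30 * * * *",         # assumed cadence: every 30 minutes
    catchup=False,
) as dag:
    # Install dbt package dependencies before running the models.
    dbt_deps = BashOperator(
        task_id="dbt_deps",
        bash_command="cd /opt/cht-pipeline && dbt deps",
    )

    # Build the cht-pipeline models against the target database.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/cht-pipeline && dbt run",
    )

    dbt_deps >> dbt_run
```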

This would allow creating a separate set of users, 'data users', with some ability to control dbt runs without requiring full Kubernetes or server access. This could include:

  • Starting, stopping, and scheduling dbt runs.
  • Manually triggering deployments of new cht-pipeline releases. Automatic updates work for small tables, but for larger tables and more complex deployments it may be helpful to trigger a deployment manually (see the REST API sketch after this list).
  • Viewing basic logs and results via the Airflow UI (Watchdog and Loki are the preferred observability tools, but not everyone may have access to them).
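
For the manual-trigger case, a 'data user' or a deployment script could start a run through Airflow's stable REST API instead of needing Kubernetes or shell access. A minimal sketch, assuming the hypothetical DAG id above, a reachable webserver, and basic auth enabled for the API; the URL and credentials are placeholders.

```python
import requests

AIRFLOW_URL = "http://localhost:8080"   # placeholder webserver URL
DAG_ID = "cht_pipeline_dbt"             # hypothetical DAG id from the sketch above

# Queue a new DAG run; 'conf' is free-form metadata attached to the run.
response = requests.post(
    f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    auth=("airflow", "airflow"),        # placeholder basic-auth credentials
    json={"conf": {"reason": "deploy new cht-pipeline release"}},
)
response.raise_for_status()
print(response.json()["dag_run_id"])
```

The Airflow CLI (`airflow dags trigger`) provides the equivalent from a shell for operators who do have server access.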

Tradeoffs

  • The current setup is simple: two containers running two scripts. Introducing Airflow adds complexity and operational overhead.

Additional context
There is another tool, the Data Observability Toolkit, that uses Airflow to run dbt tasks. It may be possible to merge these two projects, which would reduce complexity rather than increase it.
