Conversation

cilasbeltrame

What type of PR is this? (check all applicable)

  • [ x] πŸ• Feature
  • πŸ› Bug Fix
  • [ x] πŸ“ Documentation
  • πŸ§‘β€πŸ’» Refactor
  • βœ… Test
  • πŸ€– Build or CI
  • ❓ Other (please specify)

Related Issue

Creates a Kubernetes (k8s) Helm chart for drone-tm.

Describe this PR

This PR sets up the foundation for migrating Drone-TM from Docker Compose to Kubernetes using Helm charts and ArgoCD for GitOps deployment.

Staging/production values will be set in the k8s-infra repo.
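
For reference, a minimal sketch of the ArgoCD Application that would deploy this chart (the chart path, revision, and namespaces below are placeholders; the real values will live in k8s-infra):

```yaml
# Hypothetical ArgoCD Application for the DroneTM Helm chart.
# The chart path, revision, and namespaces are illustrative placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: drone-tm
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/hotosm/drone-tm.git
    targetRevision: develop
    path: deploy/helm/drone-tm      # placeholder chart path
    helm:
      valueFiles:
        - values.yaml               # stage/prod overrides set in k8s-infra
  destination:
    server: https://kubernetes.default.svc
    namespace: drone-tm
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```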

@github-actions bot added the enhancement (New feature or request) label on Sep 6, 2025
@spwoodcock
Member

Looking good so far!

One thing that isn't very clear about DroneTM is that it relies on OpenDroneMap (ODM) to do its imagery processing.
We have a basic config for testing locally here: https://github.com/hotosm/drone-tm/blob/develop/compose.odm.yaml

But in production we actually use an external instance that has been graciously hosted by a friend.

In the long term, we need to move OpenDroneMap into the Kubernetes cluster, with autoscaling of the processing jobs based on demand.

There is an existing project called ClusterODM that does this with AWS, but it doesn't have Kubernetes support:
https://github.com/opendronemap/ClusterODM

There has been a community attempt at creating k8s configs for ODM:
https://github.com/polvi/odm-kustomize
Related forum post: https://community.opendronemap.org/t/deployment-ideas-to-scale-processing-resources-to-0/23084

So in summary:

  1. As part of DroneTM, we bundle NodeODM instances.
  2. Officially these can be scaled up and down using ClusterODM (to be investigated further).
  3. But my hunch is that we probably want to pull the NodeODM instances into a separate chart, with KEDA-based autoscaling capability.
  4. This should in turn be included in the DroneTM chart as a dependency, with the NODE_ODM_URL variable configured to point at a load-balanced / autoscaled set of NodeODM instances that can scale down to a minimum of 1 (rough sketch below).
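
As a rough sketch of point 4 (chart names, versions, and the port are assumptions, not an existing published chart), the dependency and wiring could look like:

```yaml
# Sketch of Chart.yaml: DroneTM pulls in a separate NodeODM chart as a dependency.
# Names and versions are illustrative only.
apiVersion: v2
name: drone-tm
version: 0.1.0
dependencies:
  - name: nodeodm
    version: 0.1.0
    repository: "file://../nodeodm"   # or a hosted chart repo later
    condition: nodeodm.enabled
```

```yaml
# Matching values.yaml snippet: NODE_ODM_URL points at the load-balanced,
# autoscaled NodeODM service (port 3000 is NodeODM's default).
nodeodm:
  enabled: true
  minReplicas: 1
backend:
  env:
    NODE_ODM_URL: http://drone-tm-nodeodm:3000
```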

@cilasbeltrame
Author

@spwoodcock quick one on KEDA:

What would the scaling factor for KEDA be here? Does ClusterODM/NodeODM expose a queue or similar metric we could scale on?

@spwoodcock
Member

spwoodcock commented Sep 24, 2025

That's a very good point & I'm not 100% sure!

ClusterODM must have something in the code that it scales on, but I haven't looked at it for a while.

Perhaps some useful info here:
http://www.ewitton.com/ODM%E5%9C%A8%E7%BA%BF%E5%B8%AE%E5%8A%A9/large.html

Their current scaling approach via S3 no doubt uses proprietary autoscaling, perhaps based on machine load or something.

For KEDA to work well it would probably be nice to have a queue to base the scaling on. I think WebODM adds a layer on top of NodeODM processing, including a queue, so may be worth investigating.

Otherwise we could add the autoscaling at another layer, such as a bundled Redis queue, or even into DroneTM itself (not ideal, as hopefully we can contribute the solution upstream).
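
For illustration, if we went the Redis route, the KEDA side could be a simple list-length trigger (every name and value below is a placeholder, not something that exists today):

```yaml
# Hypothetical KEDA ScaledObject: scale a NodeODM Deployment on the length of a
# Redis list used as the job queue. All names here are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: nodeodm-scaler
spec:
  scaleTargetRef:
    name: nodeodm                # Deployment running NodeODM workers
  minReplicaCount: 1             # scale down to 1, as above
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis:6379      # bundled Redis service
        listName: odm-tasks      # queue of pending processing jobs
        listLength: "2"          # target pending tasks per replica
```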

The linked helm chart (in the issues) may also have some clues!

@spwoodcock
Member

spwoodcock commented Sep 24, 2025

Looks like the linked kustomize config (community, not tested by us) actually uses WebODM with redis as I was thinking!

https://github.com/polvi/odm-kustomize/tree/main/base
