Open
Sub-issues: 3 of 6 completed
Labels: Rodan-core (issues pertaining to the Rodan workflow management system itself), summer
Description
This is going to be very ambitious: I want to restructure Rodan's architecture and revamp how the Dockerfiles are currently configured.
After discussion with @homework36, we think that Rodan could benefit vastly from a restructure:
- Each job (GPU and non-GPU) can have its own container.
  - Each job has its own list of dependencies, e.g., `tensorflow 2.5.1` with `keras 2.5.0` only works with `python3.7`.
    - This could resolve A LOT of "cross dependencies" between jobs, and between the Dockerfile base image and the jobs themselves.
- Usage of Kubernetes (K8s)
  - Since each job has its own container, there will be a lot of containers. K8s can help with orchestrating the system.
  - Limited GPU resources on staging and production servers
    - As of writing this, I'm unsure if K8s (or `docker compose`) has the ability to orchestrate GPU resources. Quick research gave me Scheduling GPUs - K8s docs
- Update folder structure
  - This would make development less daunting and confusing. Some folders are nested and are called from other folders.
  - I will follow up on this thread on how to restructure (with graphics, maybe)
- I will also create sub-issues since this is a relatively big project to be done for a summer
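
To make the "one container per job" and GPU points a bit more concrete, here is a rough sketch of what generating one K8s Pod manifest per job could look like. It assumes the cluster has the NVIDIA device plugin installed, which is what exposes the `nvidia.com/gpu` resource described in the K8s GPU scheduling docs. The job names, image names, and the `JobSpec` / `pod_manifest` helpers are all hypothetical and are not taken from the Rodan code base.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """One Rodan job and the container it would run in (all values hypothetical)."""
    name: str
    image: str      # per-job image that carries only this job's dependencies
    gpus: int = 0   # number of GPUs the job needs; 0 means CPU-only


def pod_manifest(job: JobSpec) -> dict:
    """Build a Kubernetes Pod manifest (as a plain dict) for a single job."""
    container = {"name": job.name, "image": job.image}
    if job.gpus > 0:
        # GPUs are requested under `limits`, using the resource name that the
        # NVIDIA device plugin advertises (assumed to be installed).
        container["resources"] = {"limits": {"nvidia.com/gpu": job.gpus}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"rodan-{job.name}"},
        "spec": {"restartPolicy": "OnFailure", "containers": [container]},
    }


# Hypothetical jobs: one GPU job pinned to its own Python/TensorFlow stack,
# and one CPU-only job with unrelated dependencies.
JOBS = [
    JobSpec("paco-train", "example.org/rodan/paco-train:py3.7-tf2.5.1", gpus=1),
    JobSpec("resize", "example.org/rodan/resize:latest"),
]

if __name__ == "__main__":
    for job in JOBS:
        print(json.dumps(pod_manifest(job), indent=2))
```

Each printed manifest is plain JSON, which `kubectl apply -f` accepts alongside YAML. Only the GPU job carries a `nvidia.com/gpu` limit, so the scheduler would place it on a node with a free GPU, while CPU-only jobs stay schedulable anywhere.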
List of Rodan issues that could benefit from this change:
- optimize Dockerfile #1289
- docker hub build failure due to package version issue #1260, relating to PRs Rebuilding Dockerfile for GPU-Celery #1294 and Optimize `gpu-celery` container #751
- cannot run PACO train on prod (vGPU server) #1181
- Delete files that aren't used anywhere in the code base. #1018 (why hasn't anyone merged this PR 😭)
- rewrite Dockerfile for arm64 and have a separate GPU-celery container for local rodan on m-chip machines #1288
- and many more!!