Fully support both statements:
Pros and cons
Decouple scheduling and API:
Migrate to event-driven architecture:
We initially thought about going that route, but ended up with the current implementation for several reasons :) One of them is the time to implement. But eventually we need to migrate to that, for sure!
-
I agree, and the event-oriented approach is a must.
-
On structure: I agree with splitting this into separate apps.
On architecture: I'm a bit lukewarm here. Do we have the volume to necessitate a move to an event-driven architecture? Assuming for the sake of argument that we stick with a polling approach for now, how easy would it be to migrate to an event-driven approach in the future if we did need to make that move? All things being equal, I'd advocate for keeping the implementation as simple as possible while still supporting our workloads.
-
Oh, definitely not now :)
-
This is interesting. When this is released, we can create and watch a RayJob CRD instance so that we can get events from Kubernetes.
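To make the idea concrete, here is a rough sketch of what watching RayJob objects could look like with the official Kubernetes Python client. The group/version/plural values and the `status.jobStatus` field are assumptions about the RayJob CRD and should be checked against the released version; `summarize_event` and `watch_rayjobs` are hypothetical helper names, not part of any existing code.

```python
from typing import Optional, Tuple

# Assumed identifiers for the RayJob custom resource; verify them against the
# KubeRay release that is actually deployed.
GROUP, VERSION, PLURAL = "ray.io", "v1alpha1", "rayjobs"


def summarize_event(event: dict) -> Optional[Tuple[str, str, str]]:
    """Reduce a watch event to (event type, job name, job status).

    Returns None when the object has no name or no status yet.
    """
    obj = event.get("object", {})
    name = obj.get("metadata", {}).get("name")
    status = obj.get("status", {}).get("jobStatus")  # assumed field name
    if not name or not status:
        return None
    return event.get("type", "UNKNOWN"), name, status


def watch_rayjobs(namespace: str) -> None:
    """Stream RayJob events from the cluster and print each status update."""
    # Imported here so the pure helper above works without the kubernetes package.
    from kubernetes import client, config, watch

    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    api = client.CustomObjectsApi()
    stream = watch.Watch().stream(
        api.list_namespaced_custom_object,
        group=GROUP,
        version=VERSION,
        namespace=namespace,
        plural=PLURAL,
    )
    for event in stream:
        summary = summarize_event(event)
        if summary is not None:
            print(*summary)
```

With something like this, the scheduler would react to status changes pushed by Kubernetes instead of polling Ray for them.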
-
Another benefit to separating the scheduler and gateway would be easier pod name autocompletes 😆
-
For future readers, the context comes from this PR: #570
There are two areas where I would like to see improvements over the current implementation: structure and architecture.
Structure
At this moment, all the logic is fully integrated with the API application. I would like to propose a small refactor around the scheduler, moving it out of the API into a new application called scheduler. My main purpose with this change is to separate responsibilities and scopes. From my perspective, this will make the code easier to follow.
Architecture
The current implementation has a combination of django-commands plus a polling loop that can be difficult to follow and maintain. There are different approaches to solve this problem. One implementation I was thinking of is event-driven, using a combination of django-signals plus celery. This implementation would have some core keys:
Something I took into consideration when analyzing different solutions is that, due to Ray's decentralized scheduler, it can be tricky to analyze the status of Ray (I leave some references here):
That is one of the main reasons why I couldn't attach the task/event creation to the end of the Ray job. I am still investigating this approach, though:
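To illustrate the event-driven shape I have in mind, here is a minimal, dependency-free sketch. It does not use django-signals or celery: a small in-process registry stands in for signals and a plain `queue.Queue` stands in for the celery broker. All names (`on`, `emit`, `run_worker_once`, `job_created`) are hypothetical and only meant to show the flow.

```python
import queue
from collections import defaultdict
from typing import Callable

# Hypothetical stand-in for django-signals: maps an event name to its handlers.
_handlers: dict = defaultdict(list)

# Hypothetical stand-in for a celery broker: tasks wait here until a worker runs them.
task_queue: queue.Queue = queue.Queue()


def on(event_name: str) -> Callable:
    """Register a handler for an event (similar in spirit to a signal receiver)."""
    def decorator(func: Callable) -> Callable:
        _handlers[event_name].append(func)
        return func
    return decorator


def emit(event_name: str, payload: dict) -> None:
    """Enqueue every handler registered for the event instead of running it inline."""
    for handler in _handlers[event_name]:
        task_queue.put((handler, payload))


def run_worker_once() -> int:
    """Drain the queue, as a worker process would; returns the number of tasks run."""
    executed = 0
    while not task_queue.empty():
        handler, payload = task_queue.get()
        handler(payload)
        executed += 1
    return executed


scheduled = []  # records job ids handed to the (hypothetical) scheduler


@on("job_created")
def schedule_job(payload: dict) -> None:
    scheduled.append(payload["job_id"])


# The API only emits the event; the scheduling work happens later in the worker.
emit("job_created", {"job_id": 42})
run_worker_once()
```

The point is that the API never calls the scheduler directly: it emits an event when a job is created and a worker picks the work up, so no polling loop is needed on that path.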
I would like to hear opinions from you too: @pacomf , @IceKhan13 , @psschwei , @akihikokuroda 😄