[FEAT] make ray optional and enable local backend execution (without ray)#127
[FEAT] make ray optional and enable local backend execution (without ray)#127VincentG1234 wants to merge 2 commits intoopenshift-psap:mainfrom
Conversation
📝 WalkthroughWalkthroughThe PR makes Ray an optional dependency and introduces LocalExecutionBackend as an alternative execution backend. Changes include: refactoring Ray dependencies to be lazy-loaded with fallbacks, conditionally guarding Ray API calls throughout the codebase, restructuring CancellationFlag as a private class, adding local backend selection logic to the CLI, and migrating Ray from required to optional dependencies in the project manifest. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can approve the review once all CodeRabbit's comments are resolved.Enable the |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
auto_tune_vllm/execution/backends.py (1)
591-599:⚠️ Potential issue | 🟡 Minor
cleanup_all_trialsdoesn't cancel running futures.Unlike
RayExecutionBackend, which has sophisticated cancellation logic,LocalExecutionBackend.cleanup_all_trialsis a stub that does nothing. If a user needs to abort, long-running local trials won't be interrupted. Additionally,shutdown(wait=True)will block until all submitted work completes.🔧 Suggested improvement
def cleanup_all_trials(self): - """Cleanup all active trials (stub implementation for local backend).""" - logger.info("Local backend does not require explicit trial cleanup") - # Local backend doesn't need to do anything special here - # Individual trial controllers handle their own cleanup when they complete + """Cancel all active trials and clean up resources.""" + if not self.active_futures: + logger.debug("No active trials to cleanup") + return + + logger.info(f"Cleaning up {len(self.active_futures)} active local trial(s)") + for job_id, future in list(self.active_futures.items()): + if not future.done(): + future.cancel() + logger.debug(f"Cancelled future {job_id}") + self.active_futures.clear() + logger.info("✓ Completed cleanup of all active local trials") def shutdown(self): """Shutdown thread pool executor.""" - self.executor.shutdown(wait=True) + self.executor.shutdown(wait=False, cancel_futures=True) + logger.info("Shutdown local execution backend")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@auto_tune_vllm/execution/backends.py` around lines 591 - 599, cleanup_all_trials currently is a no-op so long-running local trials aren't interrupted; update LocalExecutionBackend.cleanup_all_trials to iterate over the backend's active futures (track them if not already, e.g., the collection used when submitting tasks) and call cancel() on each running Future and clear the tracking collection, and then have shutdown call self.executor.shutdown(wait=False) (or call shutdown after cancelling with an optional timeout and then join) so the process does not block waiting for cancelled/long-running tasks to finish; reference the methods cleanup_all_trials, shutdown, self.executor and the internal active-futures collection when making the change.
🧹 Nitpick comments (1)
auto_tune_vllm/execution/backends.py (1)
248-258: Consider removing redundant Ray re-import.Lines 250-252 re-import Ray and reassign
self._ray, but this is unnecessary since__init__already imports and stores it. If__init__succeeded,self._rayis guaranteed to be set.♻️ Suggested simplification
def submit_trial(self, trial_config: TrialConfig) -> JobHandle: """Submit trial to Ray cluster.""" - import ray - - self._ray = ray + ray = self._ray from .trial_controller import RayTrialActor🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@auto_tune_vllm/execution/backends.py` around lines 248 - 258, The submit_trial method redundantly re-imports Ray and reassigns self._ray; remove the local import and assignment (the lines importing ray and setting self._ray) and instead use the ray reference stored on the instance from __init__; update the creation of CancellationFlagActor to use self._ray.remote(_CancellationFlag) (or simply use self._ray when calling remote) so symbols to change are submit_trial, self._ray, CancellationFlagActor, and _CancellationFlag.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@auto_tune_vllm/execution/backends.py`:
- Around line 591-599: cleanup_all_trials currently is a no-op so long-running
local trials aren't interrupted; update LocalExecutionBackend.cleanup_all_trials
to iterate over the backend's active futures (track them if not already, e.g.,
the collection used when submitting tasks) and call cancel() on each running
Future and clear the tracking collection, and then have shutdown call
self.executor.shutdown(wait=False) (or call shutdown after cancelling with an
optional timeout and then join) so the process does not block waiting for
cancelled/long-running tasks to finish; reference the methods
cleanup_all_trials, shutdown, self.executor and the internal active-futures
collection when making the change.
---
Nitpick comments:
In `@auto_tune_vllm/execution/backends.py`:
- Around line 248-258: The submit_trial method redundantly re-imports Ray and
reassigns self._ray; remove the local import and assignment (the lines importing
ray and setting self._ray) and instead use the ray reference stored on the
instance from __init__; update the creation of CancellationFlagActor to use
self._ray.remote(_CancellationFlag) (or simply use self._ray when calling
remote) so symbols to change are submit_trial, self._ray, CancellationFlagActor,
and _CancellationFlag.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 38cc8570-5783-45a1-83d1-64c9116a5a5d
📒 Files selected for processing (4)
auto_tune_vllm/cli/main.pyauto_tune_vllm/execution/backends.pyauto_tune_vllm/execution/trial_controller.pypyproject.toml
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com>
7ca82f2 to
a879d94
Compare
Description
This PR makes Ray an optional dependency and adds support for a local execution backend, allowing users to run auto-tune-vllm on a single machine without a Ray cluster.
Changes
Move ray[default] from required to optional (pip install auto-tune-vllm[ray])
CLI: Add --backend local option alongside existing ray backend
Python environment options (--python-executable, --venv-path, --conda-env) now only required for Ray backend
max_concurrent_trials defaults to 1 for local backend (no longer required)
All Ray imports are now lazy/conditional - the tool works without Ray installed
Usage
Backward Compatibility
Default backend remains ray, existing commands continue to work and users with Ray installed are unaffected.
Clear error message if Ray backend is requested but not installed
Summary by CodeRabbit
New Features
Improvements
rayextra for distributed execution