Skip to content

Conversation

@cg505
Copy link
Collaborator

@cg505 cg505 commented Nov 24, 2025

Fixes #8083.

Tested (run the relevant ones):

  • Code formatting: install pre-commit (auto-check on commit) or bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: /smoke-test (CI) or pytest tests/test_smoke.py (local)
  • Relevant individual tests: /smoke-test -k test_name (CI) or pytest tests/test_smoke.py::test_name (local)
  • Backward compatibility: /quicktest-core (CI) or pytest tests/smoke_tests/test_backward_compat.py (local)

@cg505 cg505 requested a review from cblmemo November 24, 2025 23:26
Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @cg505 ! Thanks for fixing this. LGTM!

@cg505 cg505 added this to the v0.11.0 milestone Nov 24, 2025
@cg505
Copy link
Collaborator Author

cg505 commented Nov 24, 2025

/smoke-test --remote-server

@cg505 cg505 requested a review from cblmemo November 25, 2025 00:18
@cg505
Copy link
Collaborator Author

cg505 commented Nov 25, 2025

/smoke-test

Copy link
Collaborator

@cblmemo cblmemo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this @cg505 ! Mostly LGTM. left a discussion

Comment on lines +189 to +193
logger.warning(
f'{colorama.Fore.RED}Consolidation mode for jobs is enabled, '
f'but the controller cluster {controller_cn} is still running. '
'Please terminate the controller cluster first.'
f'{colorama.Style.RESET_ALL}')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feeling like we should print out how many running jobs are still on the job controller here - just to get a sense of how many resources are still on the old deployment. same applies for serve.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will make this call much slower, so I think it could be a problem.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Lets just keep this :)

@cg505
Copy link
Collaborator Author

cg505 commented Nov 25, 2025

smoke test failures are on nightly

@cg505 cg505 merged commit 717a13c into skypilot-org:master Nov 25, 2025
30 of 33 checks passed
cg505 added a commit that referenced this pull request Nov 25, 2025
* ignore restart file on the first run

* avoid crashing the server on inconsistent consolidation mode config

* Revert "avoid crashing the server on inconsistent consolidation mode config"

This reverts commit dfa985e.

* only use a warning for inconsistent consolidation mode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

restarting API server with postgres & consolidation mode can fail if jobs are running

2 participants