Suspend/Resume feature on SparkApplication #2387
base: master
Force-pushed from e2c2b7a to 4c5b807.
Force-pushed from feb525d to 7f87e65.
@everpeace Could you resolve this conflict so that this PR can move forward? We are also considering using Kueue with the Spark operator, which depends on this suspend state feature. It would be great if you could keep pushing this forward.
Force-pushed from 7f87e65 to 4d6dd1d.
@Kevinz857 Hi, thanks for your comment! I rebased my PR.
My original
However, I already opened a PR for this, which will be needed to support
/assign @jacobsalway @vara-bonthu @ImpSy
@Kevinz857 Would you mind helping review the PR?
@ChenYi015 Happy to help. I will review this PR in detail later.
@everpeace Suggestions for improvement:
// Suspend indicates whether the SparkApplication should be suspended.
Add unit tests for the suspension logic to ensure the controller skips job submission when
@ChenYi015 If you have time, we can take a look together to see whether these suggestions are suitable. @everpeace Or if there are better suggestions, we can discuss them together.
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
Force-pushed from cd6bae6 to 7926292.
@Kevinz857 Thanks for your review.
Hmm. Perhaps you saw spark-operator/internal/controller/scheduledsparkapplication/controller.go (lines 109 to 112 in 4d6dd1d)?
Could you elaborate on your suggestion in case I missed something?
Hmm. I think I already did: spark-operator/pkg/common/event.go (lines 33 to 38 in 4d6dd1d).
Could you elaborate on your suggestion in case I missed something?
Thanks. Fixed in b3f8e8c.
I think it's already covered: spark-operator/internal/controller/sparkapplication/controller_test.go (lines 579 to 584 in 4d6dd1d).
Could you elaborate on your suggestion in case I missed something?
IIUC, the current
7926292
to
b3f8e8c
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/remove-lifecycle stale
Awesome effort! Multiple people have asked for this integration, so I'm happy to see it moving. Overall LGTM. Let me comment here on the API itself; I may also leave some smaller comments on the code.
I don't remember the decision to introduce the field in core k8s as a pointer (and wasn't involved in it). I found this question from @soltysh in the k8s design, kubernetes/enhancements#2234 (comment), but there was no explicit answer, just a confirmation of the decision to use a pointer. Without knowing the exact details, I think it is general practice in k8s APIs to introduce new fields as pointers, and I think this is because of client-go compatibility: old API clients may fail to parse the API output after the field is introduced, even if it is not used in the cluster. Users may want to keep using old clients because, technically, the API remains at version v1beta2. Maybe @deads2k could validate that statement to keep me honest.
@everpeace Please rebase.
…cationStates(Suspending/Suspended/Resuming) Signed-off-by: Shingo Omura
Force-pushed from 40e300d to f6e439d.
…r and its unit tests Signed-off-by: Shingo Omura
…on loop. Signed-off-by: Shingo Omura
… make it more descriptive Signed-off-by: Shingo Omura
Force-pushed from f6e439d to 43bcea2.
r.recordSparkApplicationEvent(app)
_ = r.submitSparkApplication(ctx, app)
Don't you need to fail on error here and retry?
If handling the error is not necessary, it is probably worth adding a comment explaining why, because it is far from obvious.
Good catch. I noticed this pattern is used in several places in the controller; let me look into it.
The error returned by r.submitSparkApplication was not used anywhere in the controller. Instead, the submission result (success/failure) was recorded in app.Status, so I removed the returned error from the method. Fixed in f4719c1.
I see. By the way, my question was mostly exploratory, so if the repo maintainers prefer the old style, I'm fine with that too.
Force-pushed from 84bc712 to 602be42.
Makes sense. I changed the suspend field to a pointer in 43bcea2.
Force-pushed from e219987 to f4719c1.
@mimowo All your comments have been addressed. PTAL 🙇
LGTM, thank you 👍
Just to clarify, since my name was called out, the reason for going with
Hope that helps.
LGTM. @andreyvelich What are the next steps for merging/releasing this enhancement?
/assign @nabuskey @vara-bonthu @ChenYi015 @yuchaoran2011 Could you please help review this PR? We see a lot of interest in native integration between Kueue and the SparkApplication CRD.
Purpose of this PR
This PR implements suspend/resume operations on SparkApplication.
Proposed changes:
- Add a Suspend: true|false field in SparkApplicationSpec.
- Add the following state transitions:
  ** --(Suspend: true)--> SUSPENDING --> SUSPENDED   (** = any non-terminal application state)
  SUSPENDING | SUSPENDED --(Suspend: false)--> RESUMING --> SUBMITTED
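The transitions listed above can be sketched as a pure function from the current state and the Suspend flag to the next state. This is a simplified illustration, not the controller code from this PR: the ApplicationState type and constants are local stand-ins (RUNNING, COMPLETED, and FAILED are assumed values added for the sketch), COMPLETED and FAILED are assumed to be the only terminal states, and each call is assumed to be one reconcile step in which cleanup or resubmission succeeds.

```go
package main

import "fmt"

// ApplicationState is a local stand-in for the operator's state type.
type ApplicationState string

const (
	StateSubmitted  ApplicationState = "SUBMITTED"
	StateRunning    ApplicationState = "RUNNING"
	StateCompleted  ApplicationState = "COMPLETED"
	StateFailed     ApplicationState = "FAILED"
	StateSuspending ApplicationState = "SUSPENDING"
	StateSuspended  ApplicationState = "SUSPENDED"
	StateResuming   ApplicationState = "RESUMING"
)

// isTerminal assumes, for this sketch only, that COMPLETED and FAILED are terminal.
func isTerminal(s ApplicationState) bool {
	return s == StateCompleted || s == StateFailed
}

// nextState returns the state one reconcile step would move to.
func nextState(current ApplicationState, suspend bool) ApplicationState {
	if isTerminal(current) {
		return current // terminal states ignore Suspend
	}
	if suspend {
		switch current {
		case StateSuspending, StateSuspended:
			// Assumes driver/executor cleanup finished within this step.
			return StateSuspended
		default:
			// Any other non-terminal state starts suspending.
			return StateSuspending
		}
	}
	switch current {
	case StateSuspending, StateSuspended:
		return StateResuming
	case StateResuming:
		// Assumes resubmission succeeded within this step.
		return StateSubmitted
	default:
		// The normal (non-suspend) lifecycle is handled elsewhere.
		return current
	}
}

func main() {
	s := StateRunning
	for _, suspend := range []bool{true, true, false, false} {
		s = nextState(s, suspend)
		fmt.Println(s)
	}
}
```

Starting from RUNNING and feeding Suspend values true, true, false, false prints SUSPENDING, SUSPENDED, RESUMING, SUBMITTED, which matches the two transition chains in the list.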
Fixes #2290
Rationale
Supporting the suspend/resume feature would make Kueue integration easier.