-
Notifications
You must be signed in to change notification settings - Fork 56
Add retry execute mechanism #634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideAdds a retry-capable pod command execution method and improves handling of execution failures by ensuring execution completion is signaled and errors are logged across failure and exit paths. Sequence diagram for executeWithRetry pod command execution with failure handlingsequenceDiagram
participant Caller
participant PodShell
participant KubernetesApi
participant StateExecListener
Caller->>PodShell: executeWithRetry(commands)
loop up to 3 attempts
PodShell->>PodShell: execute(commands)
PodShell->>KubernetesApi: startExec(podName, commands, StateExecListener)
KubernetesApi-->>StateExecListener: onOpen()
alt command_execution_success
KubernetesApi-->>StateExecListener: onExit(code, status)
StateExecListener->>StateExecListener: executionDone.set(true)
StateExecListener-->>PodShell: hasExecutionFinished() == true
PodShell-->>Caller: PodShellOutput
PodShell->>PodShell: break loop
else command_execution_failure
KubernetesApi-->>StateExecListener: onFailure(throwable, response)
StateExecListener->>StateExecListener: log.error(...)
StateExecListener->>StateExecListener: executionDone.set(true)
StateExecListener-->>PodShell: hasExecutionFinished() == true
PodShell-->>PodShell: throw WaiterException
PodShell-->>Caller: propagate WaiterException
Caller-->>PodShell: WaiterException caught by executeWithRetry
PodShell->>PodShell: log.warn(attempt, podName, message)
alt attempt < 3
PodShell->>PodShell: Thread.sleep(5000)
else last_attempt
PodShell->>Caller: throw WaiterException("All attempts to execute command on pod ... have failed.")
PodShell->>PodShell: break loop
end
end
end
Updated class diagram for PodShell with executeWithRetry and StateExecListener changesclassDiagram
class PodShell {
- String podName
- Logger log
PodShellOutput execute(String[] commands)
PodShellOutput executeWithRetry(String[] commands)
}
class PodShellOutput {
- String stdout
- String stderr
}
class StateExecListener {
- AtomicBoolean executionDone
+ StateExecListener()
+ void onOpen()
+ void onFailure(Throwable throwable, Response response)
+ void onClose(int code, String reason)
+ void onExit(int code, Status status)
+ boolean hasExecutionFinished()
}
class ExecListener {
<<interface>>
+ void onOpen()
+ void onFailure(Throwable throwable, Response response)
+ void onClose(int code, String reason)
+ void onExit(int code, Status status)
}
class WaiterException {
+ WaiterException(String message)
}
class Logger {
+ void warn(String message, Object attempt, Object retryCount, Object podName, Object errorMessage)
+ void error(String message, Object podName, Object errorMessage)
}
class AtomicBoolean {
+ AtomicBoolean(boolean initialValue)
+ boolean get()
+ void set(boolean newValue)
}
class Response
class Status
PodShell "1" *-- "1" StateExecListener : creates
PodShell "1" o-- "1" PodShellOutput : returns
StateExecListener ..|> ExecListener : implements
PodShell ..> WaiterException : throws
PodShell ..> Logger : uses
StateExecListener ..> Logger : uses
StateExecListener ..> AtomicBoolean : uses
StateExecListener ..> Response : uses
StateExecListener ..> Status : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
615967e to
54f446b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 3 issues, and left some high level feedback:
- Consider extracting POD_SHELL_RETRY_COUNT and POD_SHELL_RETRY_DELAY_MS into class-level constants or configuration to avoid magic numbers and to make tuning behavior easier without code changes.
- When the retry loop exhausts all attempts, it would be helpful to include the lastException as the cause in the final WaiterException so downstream code and logs retain the original failure context.
- In onExit, you might want to log non-zero exit codes (or the Status) to distinguish expected terminations from failures, since you now rely on this hook to mark execution as finished.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider extracting POD_SHELL_RETRY_COUNT and POD_SHELL_RETRY_DELAY_MS into class-level constants or configuration to avoid magic numbers and to make tuning behavior easier without code changes.
- When the retry loop exhausts all attempts, it would be helpful to include the lastException as the cause in the final WaiterException so downstream code and logs retain the original failure context.
- In onExit, you might want to log non-zero exit codes (or the Status) to distinguish expected terminations from failures, since you now rely on this hook to mark execution as finished.
## Individual Comments
### Comment 1
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:78-87` </location>
<code_context>
+ WaiterException lastException = null;
</code_context>
<issue_to_address>
**issue:** The final WaiterException loses the original cause; consider chaining the last exception.
You store `lastException` but then create a new `WaiterException` without using it, which discards useful debugging context. Prefer chaining it as the cause, e.g. `throw new WaiterException("All attempts...", lastException);`, or at least incorporating its message so callers can inspect the underlying failure.
</issue_to_address>
### Comment 2
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:85-94` </location>
<code_context>
+ return this.execute(commands);
+ } catch (WaiterException e) {
+ lastException = e;
+ log.warn("Attempt {}/{} failed for command execution on pod {}: {}",
+ attempt, POD_SHELL_RETRY_COUNT, podName, e.getMessage());
+ if (attempt < POD_SHELL_RETRY_COUNT) {
</code_context>
<issue_to_address>
**suggestion:** Logging only the exception message drops stack traces; consider logging the Throwable itself.
Both the retry warning and the `onFailure` log pass only `getMessage()` to the logger. Instead, pass the exception as the final argument (e.g. `log.warn("Attempt {}/{} failed for command execution on pod {}", attempt, POD_SHELL_RETRY_COUNT, podName, e);` and `log.error("Execution failed in pod '{}'", podName, throwable);`) so the full stack trace and causes are logged while keeping the format consistent.
Suggested implementation:
```java
log.warn("Attempt {}/{} failed for command execution on pod {}",
attempt, POD_SHELL_RETRY_COUNT, podName, e);
```
Elsewhere in `PodShell.java` (or related classes) where failures are logged using only `throwable.getMessage()` (e.g. `log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());`), update them similarly to pass the throwable as the last parameter: `log.error("Execution failed in pod '{}'", podName, throwable);`.
</issue_to_address>
### Comment 3
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:108-114` </location>
<code_context>
@Override
public void onFailure(Throwable throwable, Response response) {
- // DO NOTHING
</code_context>
<issue_to_address>
**suggestion (bug_risk):** onExit ignores exit code and status; consider handling/logging non-zero exits explicitly.
Right now we only mark execution as finished and ignore failure details. If upstream logic uses `StateExecListener` to decide success or trigger retries, consider at least logging non-zero exit codes/status and/or propagating a failure state so callers can react correctly.
Suggested implementation:
```java
@Override
public void onExit(int code, Status status) {
if (code != 0) {
if (status != null) {
log.error(
"Command in pod '{}' exited with non-zero code {}. Status: reason='{}', message='{}', status='{}'",
podName,
code,
status.getReason(),
status.getMessage(),
status.getStatus()
);
} else {
log.error("Command in pod '{}' exited with non-zero code {} and no status information.", podName, code);
}
}
executionDone.set(true);
}
```
If callers should programmatically react to failures (e.g., retry on non-zero exit codes), you may also want to:
1. Add a failure flag to `StateExecListener` (e.g., `AtomicBoolean executionFailed`) and set it when `code != 0` or `onFailure` is called.
2. Expose that failure flag via an accessor or adapt existing logic that consumes `StateExecListener` to use it when deciding success vs retry.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| log.warn("Attempt {}/{} failed for command execution on pod {}: {}", | ||
| attempt, POD_SHELL_RETRY_COUNT, podName, e.getMessage()); | ||
| if (attempt < POD_SHELL_RETRY_COUNT) { | ||
| try { | ||
| Thread.sleep(POD_SHELL_RETRY_DELAY_MS); | ||
| } catch (InterruptedException ie) { | ||
| Thread.currentThread().interrupt(); | ||
| throw new RuntimeException("Thread has been interrupted!"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Logging only the exception message drops stack traces; consider logging the Throwable itself.
Both the retry warning and the onFailure log pass only getMessage() to the logger. Instead, pass the exception as the final argument (e.g. log.warn("Attempt {}/{} failed for command execution on pod {}", attempt, POD_SHELL_RETRY_COUNT, podName, e); and log.error("Execution failed in pod '{}'", podName, throwable);) so the full stack trace and causes are logged while keeping the format consistent.
Suggested implementation:
log.warn("Attempt {}/{} failed for command execution on pod {}",
attempt, POD_SHELL_RETRY_COUNT, podName, e);Elsewhere in PodShell.java (or related classes) where failures are logged using only throwable.getMessage() (e.g. log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());), update them similarly to pass the throwable as the last parameter: log.error("Execution failed in pod '{}'", podName, throwable);.
| @Override | ||
| public void onFailure(Throwable throwable, Response response) { | ||
| // DO NOTHING | ||
| log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage()); | ||
| executionDone.set(true); | ||
| } | ||
|
|
||
| @Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (bug_risk): onExit ignores exit code and status; consider handling/logging non-zero exits explicitly.
Right now we only mark execution as finished and ignore failure details. If upstream logic uses StateExecListener to decide success or trigger retries, consider at least logging non-zero exit codes/status and/or propagating a failure state so callers can react correctly.
Suggested implementation:
@Override
public void onExit(int code, Status status) {
if (code != 0) {
if (status != null) {
log.error(
"Command in pod '{}' exited with non-zero code {}. Status: reason='{}', message='{}', status='{}'",
podName,
code,
status.getReason(),
status.getMessage(),
status.getStatus()
);
} else {
log.error("Command in pod '{}' exited with non-zero code {} and no status information.", podName, code);
}
}
executionDone.set(true);
}If callers should programmatically react to failures (e.g., retry on non-zero exit codes), you may also want to:
- Add a failure flag to
StateExecListener(e.g.,AtomicBoolean executionFailed) and set it whencode != 0oronFailureis called. - Expose that failure flag via an accessor or adapt existing logic that consumes
StateExecListenerto use it when deciding success vs retry.
| public void onFailure(Throwable throwable, Response response) { | ||
| // DO NOTHING | ||
| log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage()); | ||
| executionDone.set(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mnovak thinking about this we have now way of knowing if command succeeded this way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last changes should clear that up
c55a177 to
db13605
Compare
db13605 to
800517b
Compare
Summary
onFailure()changes should result in fast fail.executeWithRetry()simply tries execute up to 3 times. This may take up to 3 minutes, but IMO it's preferable to making code more complex with parametrization.Summary by Sourcery
Introduce a retryable pod command execution path and ensure shell execution failures are surfaced and terminate waiting.
New Features:
Enhancements: