Add retry execute mechanism #634

laDok8 · 2026-01-12T09:20:19Z

Summary

I suspect that the execute command can fail on a loaded server, which could lead to a timeout in the execute method. The onFailure() changes should make it fail fast instead.
executeWithRetry() simply tries execute up to 3 times. This may take up to 3 minutes, but IMO that is preferable to making the code more complex with parametrization.

Summary by Sourcery

Introduce a retryable pod command execution path and ensure shell execution failures are surfaced and terminate waiting.

New Features:

Add an executeWithRetry method to run pod shell commands with up to three attempts and delays between failures.

Enhancements:

Update pod shell execution failure handling to log errors and signal completion, including on process exit, to avoid indefinite waits.

sourcery-ai · 2026-01-12T09:20:27Z

Reviewer's Guide

Adds a retry-capable pod command execution method and improves handling of execution failures by ensuring execution completion is signaled and errors are logged across failure and exit paths.

Sequence diagram for executeWithRetry pod command execution with failure handling

sequenceDiagram
    participant Caller
    participant PodShell
    participant KubernetesApi
    participant StateExecListener

    Caller->>PodShell: executeWithRetry(commands)
    loop up to 3 attempts
        PodShell->>PodShell: execute(commands)
        PodShell->>KubernetesApi: startExec(podName, commands, StateExecListener)
        KubernetesApi-->>StateExecListener: onOpen()
        alt command_execution_success
            KubernetesApi-->>StateExecListener: onExit(code, status)
            StateExecListener->>StateExecListener: executionDone.set(true)
            StateExecListener-->>PodShell: hasExecutionFinished() == true
            PodShell-->>Caller: PodShellOutput
            PodShell->>PodShell: break loop
        else command_execution_failure
            KubernetesApi-->>StateExecListener: onFailure(throwable, response)
            StateExecListener->>StateExecListener: log.error(...)
            StateExecListener->>StateExecListener: executionDone.set(true)
            StateExecListener-->>PodShell: hasExecutionFinished() == true
            PodShell-->>PodShell: throw WaiterException
            PodShell-->>Caller: propagate WaiterException
            Caller-->>PodShell: WaiterException caught by executeWithRetry
            PodShell->>PodShell: log.warn(attempt, podName, message)
            alt attempt < 3
                PodShell->>PodShell: Thread.sleep(5000)
            else last_attempt
                PodShell->>Caller: throw WaiterException("All attempts to execute command on pod ... have failed.")
                PodShell->>PodShell: break loop
            end
        end
    end

Updated class diagram for PodShell with executeWithRetry and StateExecListener changes

classDiagram
    class PodShell {
        - String podName
        - Logger log
        PodShellOutput execute(String[] commands)
        PodShellOutput executeWithRetry(String[] commands)
    }

    class PodShellOutput {
        - String stdout
        - String stderr
    }

    class StateExecListener {
        - AtomicBoolean executionDone
        + StateExecListener()
        + void onOpen()
        + void onFailure(Throwable throwable, Response response)
        + void onClose(int code, String reason)
        + void onExit(int code, Status status)
        + boolean hasExecutionFinished()
    }

    class ExecListener {
        <<interface>>
        + void onOpen()
        + void onFailure(Throwable throwable, Response response)
        + void onClose(int code, String reason)
        + void onExit(int code, Status status)
    }

    class WaiterException {
        + WaiterException(String message)
    }

    class Logger {
        + void warn(String message, Object attempt, Object retryCount, Object podName, Object errorMessage)
        + void error(String message, Object podName, Object errorMessage)
    }

    class AtomicBoolean {
        + AtomicBoolean(boolean initialValue)
        + boolean get()
        + void set(boolean newValue)
    }

    class Response
    class Status

    PodShell "1" *-- "1" StateExecListener : creates
    PodShell "1" o-- "1" PodShellOutput : returns
    StateExecListener ..|> ExecListener : implements
    PodShell ..> WaiterException : throws
    PodShell ..> Logger : uses
    StateExecListener ..> Logger : uses
    StateExecListener ..> AtomicBoolean : uses
    StateExecListener ..> Response : uses
    StateExecListener ..> Status : uses

File-Level Changes

Change	Details	Files
Introduce executeWithRetry to rerun pod shell commands with bounded retries and delays on failure.	Add executeWithRetry method that wraps execute in a retry loop with up to three attempts. Retry loop catches WaiterException, logs a warning with attempt count and pod name, and waits 5 seconds between attempts. On InterruptedException during sleep, restore interrupt flag and throw a RuntimeException. After exhausting all attempts, throw a WaiterException indicating all attempts failed for the pod.	`core/src/main/java/cz/xtf/core/openshift/PodShell.java`
Tighten pod shell execution lifecycle handling to mark failures and exits as finished and log errors.	Update ExecListener.onFailure to log an error with pod name and mark execution as done. Ensure ExecListener.onClose marks execution as finished (unchanged behavior). Implement ExecListener.onExit to mark execution as finished when the remote command exits.	`core/src/main/java/cz/xtf/core/openshift/PodShell.java`
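
The retry behaviour described in the table above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from the PR: `SketchWaiterException` is a stand-in for XTF's `WaiterException`, `RetrySketch` and its `Supplier`-based signature are hypothetical, and the delay is passed as a parameter only so the sketch can be exercised quickly.

```java
import java.util.function.Supplier;

/** Stand-in for XTF's WaiterException (assumption: the real class is not reproduced here). */
class SketchWaiterException extends RuntimeException {
    SketchWaiterException(String message) {
        super(message);
    }
}

/** Hypothetical helper mirroring the described loop: bounded attempts with a pause between failures. */
final class RetrySketch {
    // Values taken from the description: three attempts, 5 seconds between them.
    static final int RETRY_COUNT = 3;
    static final long RETRY_DELAY_MS = 5_000L;

    private RetrySketch() {
    }

    /** Runs the action up to RETRY_COUNT times, sleeping delayMs after each failed attempt except the last. */
    static <T> T executeWithRetry(Supplier<T> action, long delayMs) {
        for (int attempt = 1; attempt <= RETRY_COUNT; attempt++) {
            try {
                return action.get();
            } catch (SketchWaiterException e) {
                System.err.printf("Attempt %d/%d failed: %s%n", attempt, RETRY_COUNT, e.getMessage());
                if (attempt < RETRY_COUNT) {
                    try {
                        Thread.sleep(delayMs);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag so callers can still observe the interruption.
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("Thread has been interrupted!", ie);
                    }
                }
            }
        }
        throw new SketchWaiterException("All " + RETRY_COUNT + " attempts have failed.");
    }
}
```

Because the sleep is skipped after the last attempt, calling this with `RetrySketch.RETRY_DELAY_MS` adds at most two pauses (10 seconds in total) on top of whatever time the three attempts themselves take.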

sourcery-ai

Hey - I've found 3 issues, and left some high level feedback:

Consider extracting POD_SHELL_RETRY_COUNT and POD_SHELL_RETRY_DELAY_MS into class-level constants or configuration to avoid magic numbers and to make tuning behavior easier without code changes.
When the retry loop exhausts all attempts, it would be helpful to include the lastException as the cause in the final WaiterException so downstream code and logs retain the original failure context.
In onExit, you might want to log non-zero exit codes (or the Status) to distinguish expected terminations from failures, since you now rely on this hook to mark execution as finished.
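
The second point, keeping the last failure as the cause of the final exception, might look something like the sketch below. It assumes an exception type with a `(String, Throwable)` constructor; whether XTF's `WaiterException` offers one is not shown in this PR, so `ChainedWaiterException` and `CauseChainingSketch` are hypothetical stand-ins.

```java
/** Stand-in exception with a cause-accepting constructor (assumption, not XTF's actual class). */
class ChainedWaiterException extends RuntimeException {
    ChainedWaiterException(String message) {
        super(message);
    }

    ChainedWaiterException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical helper that builds the final exception once all attempts are exhausted. */
final class CauseChainingSketch {
    private CauseChainingSketch() {
    }

    static ChainedWaiterException allAttemptsFailed(String podName, Throwable lastException) {
        // Passing lastException as the cause keeps the original stack trace reachable via getCause().
        return new ChainedWaiterException(
                "All attempts to execute command on pod " + podName + " have failed.", lastException);
    }
}
```

With this shape, a caller that catches the final exception can still reach the underlying failure through `getCause()`, and most logging frameworks will print it as a "Caused by" section.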

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:78-87` </location>
<code_context>
+        WaiterException lastException = null;
</code_context>

<issue_to_address>
**issue:** The final WaiterException loses the original cause; consider chaining the last exception.

You store `lastException` but then create a new `WaiterException` without using it, which discards useful debugging context. Prefer chaining it as the cause, e.g. `throw new WaiterException("All attempts...", lastException);`, or at least incorporating its message so callers can inspect the underlying failure.
</issue_to_address>

### Comment 2
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:85-94` </location>
<code_context>
+                return this.execute(commands);
+            } catch (WaiterException e) {
+                lastException = e;
+                log.warn("Attempt {}/{} failed for command execution on pod {}: {}",
+                        attempt, POD_SHELL_RETRY_COUNT, podName, e.getMessage());
+                if (attempt < POD_SHELL_RETRY_COUNT) {
</code_context>

<issue_to_address>
**suggestion:** Logging only the exception message drops stack traces; consider logging the Throwable itself.

Both the retry warning and the `onFailure` log pass only `getMessage()` to the logger. Instead, pass the exception as the final argument (e.g. `log.warn("Attempt {}/{} failed for command execution on pod {}", attempt, POD_SHELL_RETRY_COUNT, podName, e);` and `log.error("Execution failed in pod '{}'", podName, throwable);`) so the full stack trace and causes are logged while keeping the format consistent.

Suggested implementation:

```java
                log.warn("Attempt {}/{} failed for command execution on pod {}",
                        attempt, POD_SHELL_RETRY_COUNT, podName, e);

```

Elsewhere in `PodShell.java` (or related classes) where failures are logged using only `throwable.getMessage()` (e.g. `log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());`), update them similarly to pass the throwable as the last parameter: `log.error("Execution failed in pod '{}'", podName, throwable);`.
</issue_to_address>

### Comment 3
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:108-114` </location>
<code_context>

         @Override
         public void onFailure(Throwable throwable, Response response) {
-            // DO NOTHING
</code_context>

<issue_to_address>
**suggestion (bug_risk):** onExit ignores exit code and status; consider handling/logging non-zero exits explicitly.

Right now we only mark execution as finished and ignore failure details. If upstream logic uses `StateExecListener` to decide success or trigger retries, consider at least logging non-zero exit codes/status and/or propagating a failure state so callers can react correctly.

Suggested implementation:

```java
        @Override
        public void onExit(int code, Status status) {
            if (code != 0) {
                if (status != null) {
                    log.error(
                        "Command in pod '{}' exited with non-zero code {}. Status: reason='{}', message='{}', status='{}'",
                        podName,
                        code,
                        status.getReason(),
                        status.getMessage(),
                        status.getStatus()
                    );
                } else {
                    log.error("Command in pod '{}' exited with non-zero code {} and no status information.", podName, code);
                }
            }
            executionDone.set(true);
        }

```

If callers should programmatically react to failures (e.g., retry on non-zero exit codes), you may also want to:
1. Add a failure flag to `StateExecListener` (e.g., `AtomicBoolean executionFailed`) and set it when `code != 0` or `onFailure` is called.
2. Expose that failure flag via an accessor or adapt existing logic that consumes `StateExecListener` to use it when deciding success vs retry.
</issue_to_address>

sourcery-ai · 2026-01-12T09:24:16Z

core/src/main/java/cz/xtf/core/openshift/PodShell.java

+                log.warn("Attempt {}/{} failed for command execution on pod {}: {}",
+                        attempt, POD_SHELL_RETRY_COUNT, podName, e.getMessage());
+                if (attempt < POD_SHELL_RETRY_COUNT) {
+                    try {
+                        Thread.sleep(POD_SHELL_RETRY_DELAY_MS);
+                    } catch (InterruptedException ie) {
+                        Thread.currentThread().interrupt();
+                        throw new RuntimeException("Thread has been interrupted!");
+                    }
+                }


suggestion: Logging only the exception message drops stack traces; consider logging the Throwable itself.

Both the retry warning and the onFailure log pass only getMessage() to the logger. Instead, pass the exception as the final argument (e.g. log.warn("Attempt {}/{} failed for command execution on pod {}", attempt, POD_SHELL_RETRY_COUNT, podName, e); and log.error("Execution failed in pod '{}'", podName, throwable);) so the full stack trace and causes are logged while keeping the format consistent.

Suggested implementation:

log.warn("Attempt {}/{} failed for command execution on pod {}", attempt, POD_SHELL_RETRY_COUNT, podName, e);

Elsewhere in PodShell.java (or related classes) where failures are logged using only throwable.getMessage() (e.g. log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());), update them similarly to pass the throwable as the last parameter: log.error("Execution failed in pod '{}'", podName, throwable);.

sourcery-ai · 2026-01-12T09:24:16Z

core/src/main/java/cz/xtf/core/openshift/PodShell.java

        @Override
        public void onFailure(Throwable throwable, Response response) {
-            // DO NOTHING
+            log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());
+            executionDone.set(true);
        }

        @Override


suggestion (bug_risk): onExit ignores exit code and status; consider handling/logging non-zero exits explicitly.

Right now we only mark execution as finished and ignore failure details. If upstream logic uses StateExecListener to decide success or trigger retries, consider at least logging non-zero exit codes/status and/or propagating a failure state so callers can react correctly.

Suggested implementation:

        @Override
        public void onExit(int code, Status status) {
            if (code != 0) {
                if (status != null) {
                    log.error(
                        "Command in pod '{}' exited with non-zero code {}. Status: reason='{}', message='{}', status='{}'",
                        podName,
                        code,
                        status.getReason(),
                        status.getMessage(),
                        status.getStatus()
                    );
                } else {
                    log.error("Command in pod '{}' exited with non-zero code {} and no status information.", podName, code);
                }
            }
            executionDone.set(true);
        }

If callers should programmatically react to failures (e.g., retry on non-zero exit codes), you may also want to:

Add a failure flag to StateExecListener (e.g., AtomicBoolean executionFailed) and set it when code != 0 or onFailure is called.

Expose that failure flag via an accessor or adapt existing logic that consumes StateExecListener to use it when deciding success vs retry.
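
A minimal sketch of the failure-flag idea from the two steps above, under simplifying assumptions: `FailureTrackingListener` is a hypothetical class that does not implement the real `ExecListener` interface, and its callbacks drop the `Response` and `Status` parameters so the example stays self-contained.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical listener that records both completion and failure of a remote command. */
class FailureTrackingListener {
    private final AtomicBoolean executionDone = new AtomicBoolean(false);
    private final AtomicBoolean executionFailed = new AtomicBoolean(false);

    /** Simplified onFailure: the real callback also receives a Response. */
    void onFailure(Throwable throwable) {
        // Set the failure flag first so a thread that sees "done" also sees "failed".
        executionFailed.set(true);
        executionDone.set(true);
    }

    /** Simplified onExit: the real callback also receives a Status. */
    void onExit(int code) {
        if (code != 0) {
            executionFailed.set(true);
        }
        executionDone.set(true);
    }

    boolean hasExecutionFinished() {
        return executionDone.get();
    }

    boolean hasExecutionFailed() {
        return executionFailed.get();
    }
}
```

A waiting caller could poll `hasExecutionFinished()` as before and then consult `hasExecutionFailed()` to decide between returning the output and triggering a retry.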

laDok8 · 2026-01-12T12:13:54Z

core/src/main/java/cz/xtf/core/openshift/PodShell.java

        public void onFailure(Throwable throwable, Response response) {
-            // DO NOTHING
+            log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());
+            executionDone.set(true);


@mnovak thinking about this, we have no way of knowing whether the command succeeded this way.

The latest changes should clear that up.

laDok8 assigned mnovak Jan 12, 2026

laDok8 force-pushed the fix_os_timeout branch from 615967e to 54f446b Compare January 12, 2026 09:23

sourcery-ai bot reviewed Jan 12, 2026

laDok8 commented Jan 12, 2026

laDok8 force-pushed the fix_os_timeout branch 2 times, most recently from c55a177 to db13605 Compare January 12, 2026 12:46

Add retry execute mechanism

800517b

laDok8 force-pushed the fix_os_timeout branch from db13605 to 800517b Compare January 13, 2026 09:26

mnovak approved these changes Jan 14, 2026

mnovak merged commit 7453e76 into xtf-cz:master Jan 14, 2026
5 checks passed
