Contributor

@laDok8 laDok8 commented Jan 12, 2026

Summary

  • I suspect that the execute command can fail on a loaded server, which could lead to a timeout in the execute method. The onFailure() changes should make it fail fast instead.
  • executeWithRetry() simply tries execute up to 3 times. This may take up to 3 minutes, but IMO that is preferable to making the code more complex with parametrization.

Summary by Sourcery

Introduce a retryable pod command execution path and ensure shell execution failures are surfaced and terminate waiting.

New Features:

  • Add an executeWithRetry method to run pod shell commands with up to three attempts and delays between failures.

Enhancements:

  • Update pod shell execution failure handling to log errors and signal completion, including on process exit, to avoid indefinite waits.
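
To make the "avoid indefinite waits" point concrete, here is a minimal sketch of a polling waiter, assuming a heavily simplified listener. `FastFailSketch`, `Listener`, and `waitFor` are hypothetical names used only for illustration; the real code relies on the Kubernetes client's ExecListener and the project's own waiter, which may behave differently.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class FastFailSketch {
    /** Simplified stand-in for the listener; NOT the Kubernetes client's ExecListener. */
    static class Listener {
        private final AtomicBoolean executionDone = new AtomicBoolean(false);

        // Signalling completion on failure is what lets the waiter return early.
        void onFailure(Throwable throwable) {
            executionDone.set(true);
        }

        void onExit(int code) {
            executionDone.set(true);
        }

        boolean hasExecutionFinished() {
            return executionDone.get();
        }
    }

    /** Polls the listener; returns true if it finished before the timeout elapsed. */
    static boolean waitFor(Listener listener, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!listener.hasExecutionFinished()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // nobody signalled completion: the full timeout was spent
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

If `onFailure` did nothing, a failed connection would leave the flag unset and the waiter would only return after the whole timeout.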

sourcery-ai bot commented Jan 12, 2026

Reviewer's Guide

Adds a retry-capable pod command execution method and improves handling of execution failures by ensuring execution completion is signaled and errors are logged across failure and exit paths.

Sequence diagram for executeWithRetry pod command execution with failure handling

sequenceDiagram
    participant Caller
    participant PodShell
    participant KubernetesApi
    participant StateExecListener

    Caller->>PodShell: executeWithRetry(commands)
    loop up to 3 attempts
        PodShell->>PodShell: execute(commands)
        PodShell->>KubernetesApi: startExec(podName, commands, StateExecListener)
        KubernetesApi-->>StateExecListener: onOpen()
        alt command_execution_success
            KubernetesApi-->>StateExecListener: onExit(code, status)
            StateExecListener->>StateExecListener: executionDone.set(true)
            StateExecListener-->>PodShell: hasExecutionFinished() == true
            PodShell-->>Caller: PodShellOutput
            PodShell->>PodShell: break loop
        else command_execution_failure
            KubernetesApi-->>StateExecListener: onFailure(throwable, response)
            StateExecListener->>StateExecListener: log.error(...)
            StateExecListener->>StateExecListener: executionDone.set(true)
            StateExecListener-->>PodShell: hasExecutionFinished() == true
            PodShell-->>PodShell: throw WaiterException
            PodShell-->>Caller: propagate WaiterException
            Caller-->>PodShell: WaiterException caught by executeWithRetry
            PodShell->>PodShell: log.warn(attempt, podName, message)
            alt attempt < 3
                PodShell->>PodShell: Thread.sleep(5000)
            else last_attempt
                PodShell->>Caller: throw WaiterException("All attempts to execute command on pod ... have failed.")
                PodShell->>PodShell: break loop
            end
        end
    end

Updated class diagram for PodShell with executeWithRetry and StateExecListener changes

classDiagram
    class PodShell {
        - String podName
        - Logger log
        PodShellOutput execute(String[] commands)
        PodShellOutput executeWithRetry(String[] commands)
    }

    class PodShellOutput {
        - String stdout
        - String stderr
    }

    class StateExecListener {
        - AtomicBoolean executionDone
        + StateExecListener()
        + void onOpen()
        + void onFailure(Throwable throwable, Response response)
        + void onClose(int code, String reason)
        + void onExit(int code, Status status)
        + boolean hasExecutionFinished()
    }

    class ExecListener {
        <<interface>>
        + void onOpen()
        + void onFailure(Throwable throwable, Response response)
        + void onClose(int code, String reason)
        + void onExit(int code, Status status)
    }

    class WaiterException {
        + WaiterException(String message)
    }

    class Logger {
        + void warn(String message, Object attempt, Object retryCount, Object podName, Object errorMessage)
        + void error(String message, Object podName, Object errorMessage)
    }

    class AtomicBoolean {
        + AtomicBoolean(boolean initialValue)
        + boolean get()
        + void set(boolean newValue)
    }

    class Response
    class Status

    PodShell "1" *-- "1" StateExecListener : creates
    PodShell "1" o-- "1" PodShellOutput : returns
    StateExecListener ..|> ExecListener : implements
    PodShell ..> WaiterException : throws
    PodShell ..> Logger : uses
    StateExecListener ..> Logger : uses
    StateExecListener ..> AtomicBoolean : uses
    StateExecListener ..> Response : uses
    StateExecListener ..> Status : uses

File-Level Changes

Change: Introduce executeWithRetry to rerun pod shell commands with bounded retries and delays on failure.
Details:
  • Add executeWithRetry method that wraps execute in a retry loop with up to three attempts.
  • Retry loop catches WaiterException, logs a warning with attempt count and pod name, and waits 5 seconds between attempts.
  • On InterruptedException during sleep, restore interrupt flag and throw a RuntimeException.
  • After exhausting all attempts, throw a WaiterException indicating all attempts failed for the pod.
Files: core/src/main/java/cz/xtf/core/openshift/PodShell.java

Change: Tighten pod shell execution lifecycle handling to mark failures and exits as finished and log errors.
Details:
  • Update ExecListener.onFailure to log an error with pod name and mark execution as done.
  • Ensure ExecListener.onClose marks execution as finished (unchanged behavior).
  • Implement ExecListener.onExit to mark execution as finished when the remote command exits.
Files: core/src/main/java/cz/xtf/core/openshift/PodShell.java
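
The retry behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: `RetrySketch`, `SketchWaiterException`, and the `Supplier`-based signature are hypothetical, and the delay is passed in as a parameter so the example runs quickly (the PR waits 5 seconds).

```java
import java.util.function.Supplier;

class RetrySketch {
    static final int RETRY_COUNT = 3; // mirrors "up to three attempts"

    /** Hypothetical stand-in for the project's WaiterException. */
    static class SketchWaiterException extends RuntimeException {
        SketchWaiterException(String message) {
            super(message);
        }
    }

    /** Runs the action up to RETRY_COUNT times, sleeping delayMs between failed attempts. */
    static <T> T executeWithRetry(Supplier<T> action, long delayMs) {
        for (int attempt = 1; attempt <= RETRY_COUNT; attempt++) {
            try {
                return action.get();
            } catch (SketchWaiterException e) {
                System.err.printf("Attempt %d/%d failed: %s%n", attempt, RETRY_COUNT, e.getMessage());
                if (attempt < RETRY_COUNT) {
                    try {
                        Thread.sleep(delayMs);
                    } catch (InterruptedException ie) {
                        // Restore the interrupt flag before giving up.
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("Thread has been interrupted!");
                    }
                }
            }
        }
        throw new SketchWaiterException("All attempts to execute the command have failed.");
    }
}
```

A caller would wrap the single-attempt call, for example `RetrySketch.executeWithRetry(() -> runOnce(), 5000L)`, where `runOnce` is again a placeholder.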

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 3 issues, and left some high level feedback:

  • Consider extracting POD_SHELL_RETRY_COUNT and POD_SHELL_RETRY_DELAY_MS into class-level constants or configuration to avoid magic numbers and to make tuning behavior easier without code changes.
  • When the retry loop exhausts all attempts, it would be helpful to include the lastException as the cause in the final WaiterException so downstream code and logs retain the original failure context.
  • In onExit, you might want to log non-zero exit codes (or the Status) to distinguish expected terminations from failures, since you now rely on this hook to mark execution as finished.
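
The first two points could look roughly like the sketch below. The constant names come from the review; `ChainedFailureSketch`, `SketchWaiterException`, and `exhausted` are hypothetical, and it is an assumption that the project's WaiterException offers (or could be given) a constructor that accepts a cause.

```java
class ChainedFailureSketch {
    // Class-level constants instead of magic numbers (values taken from the PR description).
    static final int POD_SHELL_RETRY_COUNT = 3;
    static final long POD_SHELL_RETRY_DELAY_MS = 5_000L;

    /** Hypothetical stand-in for WaiterException with a cause-accepting constructor. */
    static class SketchWaiterException extends RuntimeException {
        SketchWaiterException(String message) {
            super(message);
        }

        SketchWaiterException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Builds the final exception and keeps the last failure as its cause. */
    static SketchWaiterException exhausted(String podName, SketchWaiterException lastException) {
        return new SketchWaiterException(
                "All " + POD_SHELL_RETRY_COUNT + " attempts to execute command on pod "
                        + podName + " have failed.",
                lastException);
    }
}
```

With the cause attached, `getCause()` on the final exception returns the last per-attempt failure, so logs and callers keep the original context.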
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider extracting POD_SHELL_RETRY_COUNT and POD_SHELL_RETRY_DELAY_MS into class-level constants or configuration to avoid magic numbers and to make tuning behavior easier without code changes.
- When the retry loop exhausts all attempts, it would be helpful to include the lastException as the cause in the final WaiterException so downstream code and logs retain the original failure context.
- In onExit, you might want to log non-zero exit codes (or the Status) to distinguish expected terminations from failures, since you now rely on this hook to mark execution as finished.

## Individual Comments

### Comment 1
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:78-87` </location>
<code_context>
+        WaiterException lastException = null;
</code_context>

<issue_to_address>
**issue:** The final WaiterException loses the original cause; consider chaining the last exception.

You store `lastException` but then create a new `WaiterException` without using it, which discards useful debugging context. Prefer chaining it as the cause, e.g. `throw new WaiterException("All attempts...", lastException);`, or at least incorporating its message so callers can inspect the underlying failure.
</issue_to_address>

### Comment 2
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:85-94` </location>
<code_context>
+                return this.execute(commands);
+            } catch (WaiterException e) {
+                lastException = e;
+                log.warn("Attempt {}/{} failed for command execution on pod {}: {}",
+                        attempt, POD_SHELL_RETRY_COUNT, podName, e.getMessage());
+                if (attempt < POD_SHELL_RETRY_COUNT) {
</code_context>

<issue_to_address>
**suggestion:** Logging only the exception message drops stack traces; consider logging the Throwable itself.

Both the retry warning and the `onFailure` log pass only `getMessage()` to the logger. Instead, pass the exception as the final argument (e.g. `log.warn("Attempt {}/{} failed for command execution on pod {}", attempt, POD_SHELL_RETRY_COUNT, podName, e);` and `log.error("Execution failed in pod '{}'", podName, throwable);`) so the full stack trace and causes are logged while keeping the format consistent.

Suggested implementation:

```java
                log.warn("Attempt {}/{} failed for command execution on pod {}",
                        attempt, POD_SHELL_RETRY_COUNT, podName, e);

```

Elsewhere in `PodShell.java` (or related classes) where failures are logged using only `throwable.getMessage()` (e.g. `log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());`), update them similarly to pass the throwable as the last parameter: `log.error("Execution failed in pod '{}'", podName, throwable);`.
</issue_to_address>

### Comment 3
<location> `core/src/main/java/cz/xtf/core/openshift/PodShell.java:108-114` </location>
<code_context>

         @Override
         public void onFailure(Throwable throwable, Response response) {
-            // DO NOTHING
</code_context>

<issue_to_address>
**suggestion (bug_risk):** onExit ignores exit code and status; consider handling/logging non-zero exits explicitly.

Right now we only mark execution as finished and ignore failure details. If upstream logic uses `StateExecListener` to decide success or trigger retries, consider at least logging non-zero exit codes/status and/or propagating a failure state so callers can react correctly.

Suggested implementation:

```java
        @Override
        public void onExit(int code, Status status) {
            if (code != 0) {
                if (status != null) {
                    log.error(
                        "Command in pod '{}' exited with non-zero code {}. Status: reason='{}', message='{}', status='{}'",
                        podName,
                        code,
                        status.getReason(),
                        status.getMessage(),
                        status.getStatus()
                    );
                } else {
                    log.error("Command in pod '{}' exited with non-zero code {} and no status information.", podName, code);
                }
            }
            executionDone.set(true);
        }

```

If callers should programmatically react to failures (e.g., retry on non-zero exit codes), you may also want to:
1. Add a failure flag to `StateExecListener` (e.g., `AtomicBoolean executionFailed`) and set it when `code != 0` or `onFailure` is called.
2. Expose that failure flag via an accessor or adapt existing logic that consumes `StateExecListener` to use it when deciding success vs retry.
</issue_to_address>

Comment on lines 85 to 87
                log.warn("Attempt {}/{} failed for command execution on pod {}: {}",
                        attempt, POD_SHELL_RETRY_COUNT, podName, e.getMessage());
                if (attempt < POD_SHELL_RETRY_COUNT) {
                    try {
                        Thread.sleep(POD_SHELL_RETRY_DELAY_MS);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("Thread has been interrupted!");
                    }
                }

suggestion: Logging only the exception message drops stack traces; consider logging the Throwable itself.

Both the retry warning and the onFailure log pass only getMessage() to the logger. Instead, pass the exception as the final argument (e.g. log.warn("Attempt {}/{} failed for command execution on pod {}", attempt, POD_SHELL_RETRY_COUNT, podName, e); and log.error("Execution failed in pod '{}'", podName, throwable);) so the full stack trace and causes are logged while keeping the format consistent.

Suggested implementation:

                log.warn("Attempt {}/{} failed for command execution on pod {}",
                        attempt, POD_SHELL_RETRY_COUNT, podName, e);

Elsewhere in PodShell.java (or related classes) where failures are logged using only throwable.getMessage() (e.g. log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());), update them similarly to pass the throwable as the last parameter: log.error("Execution failed in pod '{}'", podName, throwable);.
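
To illustrate what is lost when only `getMessage()` is logged, here is a small sketch that uses only the standard library rather than the logging framework. `StackTraceSketch` and `fullTrace` are hypothetical helpers; a logger that receives the Throwable as its final argument would typically render similar information.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class StackTraceSketch {
    /** Renders the full stack trace, including any "Caused by" chain, as a string. */
    static String fullTrace(Throwable throwable) {
        StringWriter buffer = new StringWriter();
        throwable.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        Throwable failure = new RuntimeException("exec failed",
                new IllegalStateException("connection reset"));

        // Only the top-level message: the underlying cause is not visible.
        System.out.println(failure.getMessage());

        // Full trace: includes the cause and the frames where each exception was created.
        System.out.println(fullTrace(failure));
    }
}
```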

Comment on lines 108 to 117
         @Override
         public void onFailure(Throwable throwable, Response response) {
-            // DO NOTHING
+            log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());
+            executionDone.set(true);
         }

         @Override

suggestion (bug_risk): onExit ignores exit code and status; consider handling/logging non-zero exits explicitly.

Right now we only mark execution as finished and ignore failure details. If upstream logic uses StateExecListener to decide success or trigger retries, consider at least logging non-zero exit codes/status and/or propagating a failure state so callers can react correctly.

Suggested implementation:

        @Override
        public void onExit(int code, Status status) {
            if (code != 0) {
                if (status != null) {
                    log.error(
                        "Command in pod '{}' exited with non-zero code {}. Status: reason='{}', message='{}', status='{}'",
                        podName,
                        code,
                        status.getReason(),
                        status.getMessage(),
                        status.getStatus()
                    );
                } else {
                    log.error("Command in pod '{}' exited with non-zero code {} and no status information.", podName, code);
                }
            }
            executionDone.set(true);
        }

If callers should programmatically react to failures (e.g., retry on non-zero exit codes), you may also want to:

  1. Add a failure flag to StateExecListener (e.g., AtomicBoolean executionFailed) and set it when code != 0 or onFailure is called.
  2. Expose that failure flag via an accessor or adapt existing logic that consumes StateExecListener to use it when deciding success vs retry.
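
The two steps above might be sketched as follows, assuming a simplified listener. `FailureFlagSketch`, `TrackingListener`, and `hasExecutionFailed` are hypothetical names; the real StateExecListener implements the client's ExecListener interface and takes additional parameters such as Response and Status.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class FailureFlagSketch {
    /** Simplified listener that tracks both completion and failure. */
    static class TrackingListener {
        private final AtomicBoolean executionDone = new AtomicBoolean(false);
        private final AtomicBoolean executionFailed = new AtomicBoolean(false);

        void onFailure(Throwable throwable) {
            // Set the failure flag first, so anyone who observes "done" also observes "failed".
            executionFailed.set(true);
            executionDone.set(true);
        }

        void onExit(int code) {
            if (code != 0) {
                executionFailed.set(true);
            }
            executionDone.set(true);
        }

        boolean hasExecutionFinished() {
            return executionDone.get();
        }

        boolean hasExecutionFailed() {
            return executionFailed.get();
        }
    }
}
```

A caller could then wait on `hasExecutionFinished()` as before and check `hasExecutionFailed()` afterwards to decide between returning the output and retrying.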

         public void onFailure(Throwable throwable, Response response) {
-            // DO NOTHING
+            log.error("Execution failed in pod '{}': {}", podName, throwable.getMessage());
+            executionDone.set(true);
Contributor Author


@mnovak thinking about this, we have no way of knowing whether the command succeeded this way

Contributor Author


The last changes should clear that up.

@laDok8 laDok8 force-pushed the fix_os_timeout branch 2 times, most recently from c55a177 to db13605 Compare January 12, 2026 12:46
@mnovak mnovak merged commit 7453e76 into xtf-cz:master Jan 14, 2026
5 checks passed