

@HorjuRares (Contributor) commented Nov 3, 2025

Issues: #450

  • Introduced new command for retrieving events with specified output format and field mask.
  • Implemented event subscription and listening in the server connection module.
  • Updated Cargo.toml and Cargo.lock to include the chrono dependency.

Definition of Done

The PR shall be merged only if all items mentioned in CONTRIBUTING.md have been followed. In case an item is not applicable as described, please provide a short explanation in the description.

@HorjuRares marked this pull request as draft November 3, 2025 07:45
@HorjuRares self-assigned this Nov 18, 2025
@HorjuRares marked this pull request as ready for review November 18, 2025 13:57
@HorjuRares added the enhancement label Nov 18, 2025
@HorjuRares (Contributor Author)

@krucod3 I have a local integration branch where both ank get events and the event handler are merged into the feature branch, and the system tests pass without any problems. Please proceed with the review of this PR.

@inf17101 (Contributor)

@HorjuRares: You can still update the branch with the one from the event handler.

@inf17101 mentioned this pull request Dec 5, 2025
@inf17101 (Contributor) commented Dec 8, 2025

@GabyUnalaq: I have analyzed the PR and here are further tasks to get this PR ready:

  1. merge the current feature branch into it and resolve conflicts
  2. remove/replace the FilteredTypes for the event output because of the upcoming API unification
  3. test it again with the latest binaries of the event handler branch (the event server logic)

For point 2, I think we still need a custom object so that serde can serialize the output to YAML and JSON, because of the timestamp and the custom logic in the CLI that alters the field output. This needs to be fixed.

@GabyUnalaq self-assigned this Dec 8, 2025
@github-actions
Uncovered requirements found:

swdd~agent-supports-restart-policies~1: uncovered impl
swdd~common-workload-states-supported-states~1: uncovered impl
swdd~common-workload-state-transitions~1: uncovered impl
swdd~common-workload-state-additional-information~1: uncovered impl
swdd~common-workload-state-identification~1: uncovered impl,utest
swdd~workload-add-conditions-for-dependencies~1: uncovered impl
swdd~common-workload-needs-control-interface~1: uncovered impl
swdd~common-workload-execution-instance-naming~1: uncovered impl
swdd~common-access-rules-filter-mask-convention~1: uncovered stest


@GabyUnalaq added the ignore-req-tracing label Dec 15, 2025
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Default, PartialEq, Serialize, Deserialize)]
struct EventOutput {
@inf17101 (Contributor) commented Jan 5, 2026

You need the rename below; otherwise all system tests run into the timeout because they try to get completeState from the event dictionary, but the key is serialized in snake case as complete_state. I also verified this with the CLI using -o json.

Suggested change
- struct EventOutput {
+ #[serde(rename_all = "camelCase")]
+ struct EventOutput {
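To illustrate the mismatch from the stest side, here is a minimal Python sketch. It assumes (hypothetically) that ank get events -o json emits one JSON object per event line and that the stest helper looks up completeState at the top level; extract_complete_state is an illustrative name, not the actual stest helper.

```python
import json
from typing import Any, Optional


def extract_complete_state(event_line: str) -> Optional[dict[str, Any]]:
    """Return the completeState part of one JSON event, or None if absent.

    Hypothetical helper: assumes one JSON object per line with camelCase keys,
    which is what #[serde(rename_all = "camelCase")] on EventOutput produces.
    """
    event = json.loads(event_line)
    # Without the serde rename the key is "complete_state", so this lookup
    # silently yields None and the stest keeps waiting until its timeout.
    return event.get("completeState")
```

With the rename, extract_complete_state('{"completeState": {}}') returns the (empty) state; without it, the same event arrives as '{"complete_state": {}}' and the lookup returns None.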

I also see general problems with the current use of events in the stests:

  1. The code checks for Pending(Initial), but currently we do not send events after the server starts up with a manifest. So Pending(Initial) states are never received after a startup, only after an update in a running cluster! However, some stests use startup manifests, so they are affected. => Whether we should send those events at startup will be discussed in the team on Wednesday.
  2. The current sequential order of the keywords in the stests can cause timing issues in the event subscriptions of the CLI. Explanation: the current order is 1. server startup, 2. agent startups, 3. CLI event subscriptions to wait for the initial execution states. If the event subscription is delayed, the stest might miss important events and run into the timeout while checking the initial states. The correct order is: 1. server startup, 2. CLI event subscriptions, 3. agent startups.
  3. If the timeout is reached because the dictionaries are empty (complete_state, workload_states, agent_workloads in the Python code), a warning is logged, but the stests still pass.
  4. When many stests hit the 60 s timeout while waiting for initial execution state events, the whole CI/CD pipeline can take 50 min to run, as in this run: https://github.com/eclipse-ankaios/ankaios/actions/runs/20713173138/job/59458188540?pr=616
  5. If no startup manifest is provided to the server, the 60 s timeout is always hit, which delays the stests. => In this case we can remove the wait for the initial states via events, because there are no workloads and a manifest is applied later with ank apply. I will mark an example test in the PR where this is the case.
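Points 2, 3 and 5 could be handled by a wait helper that fails the stest instead of only warning. The following is a minimal Python sketch, not the existing stest code: wait_for_states is a hypothetical name, and it assumes a listener thread (started right after the server and before the agents, per point 2) puts already-parsed events into a queue.Queue, each event being a mapping of workload name to execution state string.

```python
import queue
import time


def wait_for_states(
    events: "queue.Queue[dict[str, str]]",
    expected: dict[str, str],
    timeout: float = 60.0,
) -> dict[str, str]:
    """Block until every workload in `expected` reports its expected state.

    Hypothetical helper: each queue item is an already-parsed event mapping
    workload name -> execution state string. On timeout it raises instead of
    only logging a warning, so the stest fails (point 3).
    """
    seen: dict[str, str] = {}
    deadline = time.monotonic() + timeout
    while True:
        # An empty `expected` (no startup manifest, point 5) returns at once.
        if all(seen.get(name) == state for name, state in expected.items()):
            return seen
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            missing = {n: s for n, s in expected.items() if seen.get(n) != s}
            raise TimeoutError(f"states not reached within {timeout} s: {missing}")
        try:
            seen.update(events.get(timeout=remaining))
        except queue.Empty:
            # Loop once more so the deadline check above raises.
            continue
```

Because the queue is filled by a subscription that was started before the agents, early events such as Pending(Initial) are buffered rather than missed, and with an empty expected mapping the helper returns immediately instead of spending the 60 s timeout.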

Even after fixing the camel case issue, 37 stests are failing locally on my side. Let's discuss the current approach.

Contributor

I believe this is fixed now.

Contributor

No, not all of those points.


Status: approved

The Ankaios CLI shall provide a function to get real-time state change events from the Ankaios server with options to specify output format (JSON or YAML) and field mask for filtering events to specific parts of the state.
Contributor

Suggested change
- The Ankaios CLI shall provide a function to get real-time state change events from the Ankaios server with options to specify output format (JSON or YAML) and field mask for filtering events to specific parts of the state.
+ The Ankaios CLI shall provide a function to get real-time state change events from the Ankaios server with the following options to specify the:
+ * output format (JSON or YAML)
+ * field mask for filtering events to specific parts of the state.

Contributor

Looking at how other requirements are written, we can leave it as it was before. In addition, I believe we can remove the output format examples, because another requirement already covers the output formats that the ank CLI should allow (swdd~cli-supports-multiple-output-types-for-events~1).

Contributor

This is fine: remove the output format, and then we do not need the enumeration.

while (get_time_secs() - start_time) < timeout:
    if process.poll() is not None:
        stderr = process.stderr.read()
        logger.warn(f"Event listener process terminated: {stderr}")
Contributor

In general I would agree with the break here, but in our stests the CLI sometimes cannot connect to the server because the server might not be ready and serving its API yet. As with the get state command, which the stests poll, the error Could not connect to Ankaios server on 'http://127.0.0.1:25551' might occur in a stest, but the stest still passes because subsequent get state calls succeed once the server is ready. The same can happen with ank get events, but with the break the return None below is executed and the stest fails with Event listener process terminated: error: Could not connect to Ankaios server on 'http://127.0.0.1:25551'. Either ank get events should be reconnected, or we should make sure the server is up and running before subscribing to events. Currently, stests fail sporadically because of this.
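One way to implement the reconnect option is to restart the subscription process while it keeps exiting early. This is a minimal Python sketch with a hypothetical helper name (start_event_listener); treating "still running after settle seconds" as a successful subscription is a heuristic assumption, not something ank get events guarantees.

```python
import logging
import subprocess
import time
from typing import Sequence

logger = logging.getLogger(__name__)


def start_event_listener(
    cmd: Sequence[str],
    timeout: float = 60.0,
    settle: float = 1.0,
    retry_delay: float = 1.0,
) -> subprocess.Popen:
    """Start the event subscription and restart it while it exits early.

    Hypothetical helper: a process that is still alive after `settle`
    seconds is assumed to have subscribed successfully (a heuristic).
    Raises TimeoutError instead of returning None, so the stest fails loudly.
    """
    deadline = time.monotonic() + timeout
    attempt = 0
    while time.monotonic() < deadline:
        attempt += 1
        process = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        try:
            process.wait(timeout=settle)
        except subprocess.TimeoutExpired:
            # Still running: treat the subscription as established.
            return process
        # Exited early, e.g. "Could not connect to Ankaios server": collect
        # stderr (communicate() also closes the pipes), then retry.
        _, stderr = process.communicate()
        logger.warning(
            "Event listener attempt %d terminated: %s", attempt, stderr.strip()
        )
        time.sleep(retry_delay)
    raise TimeoutError(f"event listener did not stay up within {timeout} s")
```

For example, start_event_listener(["ank", "get", "events", "-o", "json"], timeout=60.0) would keep retrying until the server accepts the subscription or 60 s elapse; the exact command line used by the stests may differ.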

@inf17101 (Contributor) commented Jan 9, 2026

@GabyUnalaq: I will take over this branch. It would be great if you could do a quick review once I am finished.


Labels

enhancement (New feature or request. Issue will appear in the change log "Features"), ignore-req-tracing (Ignore diffs if requirement tracing reports), ready for review


4 participants