-
Notifications
You must be signed in to change notification settings - Fork 25
ank get events support
#616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature/450_event-support
Are you sure you want to change the base?
Conversation
- Introduced new command for retrieving events with specified output format and field mask. - Implemented event subscription and listening in the server connection module. - Updated Cargo.toml and Cargo.lock to include the chrono dependency.
|
@krucod3 I have a local integration branch were both |
|
@HorjuRares: You can still update the branch with the one from the event handler. |
|
@GabyUnalaq: I have analyzed the PR and here are further tasks to get this PR ready:
For point 2, I think we still need a custom object to be able to serialize to yaml and json with serde because of the timestamp and custom altered field output logic in the cli. This needs to be fixed. |
|
Uncovered requirements found: |
|
Uncovered requirements found: |
| use serde::{Deserialize, Serialize}; | ||
|
|
||
| #[derive(Debug, Clone, Default, PartialEq, Serialize, Deserialize)] | ||
| struct EventOutput { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need this rename below, otherwise all system tests are going into the timeout because they are trying to get completeState from the event dictionary but the key exists as snake case complete_state. I verified this also with the CLI with -o json.
| struct EventOutput { | |
| #[serde(rename_all = "camelCase")] | |
| struct EventOutput { |
I see also general problems with the current implementation of the event usage in the stests:
- The code is checking for
Pending(Initial), but currently, we do not send events after starting up the server with a manifest. So pending initial states are never received after a startup only after an update in a running cluster! However some stests are using startup manifests, so they are affected. => if we should send those events at startup will be discussed on Wednesday in the team. - The current sequential order of the keywords in the stests can cause timing issues in the event subscriptions of the CLI. Explanation: The current order is 1. Server startup, 2. agent startups, 3. CLI event subscriptions to wait for initial execution states => if the event subscription is delayed the stest might miss important events and runs into the timeout for checking initial states. The correct order is: 1. Server startup, 2. CLI event subscriptions, 3. agent startups
- If the timeout is reached caused by empty dictionaries (
complete_state,workload_states,agent_workloadsin the python code), a warning is output, but the stests are still passing. - When for a lot of stests the timeout of 60 sec is hit to wait for initial execution state events, the whole CI/CD pipeline can take 50 min runtime like here in this run: https://github.com/eclipse-ankaios/ankaios/actions/runs/20713173138/job/59458188540?pr=616
- If there is no startup manifest provided to the server always the timeout of 60 sec is hit leading to delays in the stests => in this case we can remove the wait for initial state via events because there are no workloads and a manifest is applied later with
ank apply. I will mark an example test in the PR where this is the case.
Even after fixing the camel case issue, 37 stests are failing locally on my side. Let's discuss the current approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No not all of those points.
|
|
||
| Status: approved | ||
|
|
||
| The Ankaios CLI shall provide a function to get real-time state change events from the Ankaios server with options to specify output format (JSON or YAML) and field mask for filtering events to specific parts of the state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The Ankaios CLI shall provide a function to get real-time state change events from the Ankaios server with options to specify output format (JSON or YAML) and field mask for filtering events to specific parts of the state. | |
| The Ankaios CLI shall provide a function to get real-time state change events from the Ankaios server with the following options to specify the: | |
| * output format (JSON or YAML) | |
| * field mask for filtering events to specific parts of the state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at how other requirements are written; we can leave it as it was before. In addition, I believe we can remove the output format examples, because there is already another requirement covering the output formats that should be allowed by the ank cli (swdd~cli-supports-multiple-output-types-for-events~1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, so removing the output format and then we do not need the enumeration.
| while (get_time_secs() - start_time) < timeout: | ||
| if process.poll() is not None: | ||
| stderr = process.stderr.read() | ||
| logger.warn(f"Event listener process terminated: {stderr}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I would agree with the break here, but in our stests sometimes the CLI cannot connect to server because the server might not be ready and running its API, and same like with get state command with the polling in the stests the error Could not connect to Ankaios server on 'http://127.0.0.1:25551' might happen in a stest but still passes because the subsequent get state calls will succeed when the server will be ready. However, this can also happen when doing a ank get events instead, but with the break the return None below is executed and the stest will fail with Event listener process terminated: error: Could not connect to Ankaios server on 'http://127.0.0.1:25551'. What should be done is to reconnect the ank get events or making sure the server is up and running when doing the event subscription. Because, currently stest are sporadically failing because of this.
tests/stests/workloads/reject_state_with_cyclic_dependencies.robot
Outdated
Show resolved
Hide resolved
|
@GabyUnalaq: I will take over this branch. Would be great if you can do a quick review later if I am finished. |
Issues: #450
Definition of Done
The PR shall be merged only if all items mentioned in CONTRIBUTING.md have been followed. In case an item is not applicable as described, please provide a short explanation in the description.