Skip to content

Report service download progress to remote backend#71

Open
pipex wants to merge 11 commits intomainfrom
state-report-and-service-download
Open

Report service download progress to remote backend#71
pipex wants to merge 11 commits intomainfrom
state-report-and-service-download

Conversation

@pipex
Copy link
Contributor

@pipex pipex commented Jan 28, 2026

This add a create_service task that initializes the service with Installing state. This allows the reporting module to identify installing services and report as Downloading if download_progress < 100

This PR also includes some fixes to make sure helios unstable keeps working with the legacy supervisor

  • Use docker-cli v28 for balena-engine compatibility
  • Fail parsing image uri if the image or tag are <none> (a valid result from docker). Ignore parsing failures when listing images
  • Make sure to always pass the raw target to the legacy supervisor even on serialization errors

@pipex pipex marked this pull request as draft January 28, 2026 19:40
@pipex pipex force-pushed the state-report-and-service-download branch 7 times, most recently from 8aa4d39 to c13e431 Compare February 4, 2026 17:37
@pipex pipex force-pushed the state-report-and-service-download branch from d51a484 to c13e431 Compare February 5, 2026 13:31
@pipex pipex requested a review from cywang117 February 5, 2026 14:25
@pipex pipex marked this pull request as ready for review February 5, 2026 14:25
engine_id: None,
config: Default::default(),
download_progress: 0,
download_progress: 100,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why initialize with 100% progress then set to 0% later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because this is the outcome that the planner will see and simulates a successful outcome of this task.

That is a part of the interface of Mahler that is not super easy to convey unfortunately :(


#[derive(Clone, Serialize, Default, Debug)]
#[serde(rename_all = "lowercase")]
pub enum UpdateStatus {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between this and the UpdateStatus in helios-state/src/seek.rs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the way it's serialized mostly. The one in seek is also returned by the API /v3/status and returns {"status": "applying changes"}, while this just serializes to a string. It might be worth unifying them at some point

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, the helios-state crate should remain agnostic to reporting (and probably the local API) so types can diverge safely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the state in seek.rs also includes any errors from an apply that might be useful to get from. the API

// which indicates an error condition. State changes during the request should not
// interrupt it - we'll report the new state after this request completes.
last_report = loop {
tokio::select! {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude said there may be a logic issue in this select!:

Here's what happens step by step:

  1. First iteration: both branches are active. select! races them.
  2. If state_rx.changed() returns Ok(()) — it matched, but the pattern Err(_) doesn't match, so the branch is completed and disabled.
  3. If report_future hasn't finished yet, on the next loop iteration state_rx.changed() is already consumed. Now only report_future is active.
  4. But: if state_rx.changed() returns Ok(()) and report_future completes in the same poll — both branches are disabled on the next iteration, and else fires. Then the
    inner loop repeats with both branches completed → else fires again → infinite busy-loop.

Even in the non-degenerate case, once state_rx.changed() fires Ok(()), that branch is disabled for all subsequent iterations of the inner loop. This means subsequent
channel closures won't be detected until report_future completes.

I can't make heads or tails of the control flow at the moment, but commenting in case there's some validity to this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's wrong. A watch channel (like state_rx) blocks on calls to changed(). Once changed() returns Ok, the value is marked as seen, meaning the next call to changed will block again until there is an error or a new value (see Receiver::changed()).

But: if state_rx.changed() returns Ok(()) and report_future completes in the same poll

I don't think this can happen either, futures in Rust don't complete until polled, so there is no such thing as both things happening at the same time.

The Service status is used to identify what status is the service in its
lifecycle. This commit also creates a service initialization task that
runs before image downloads. The status and early initialization should
allow the reporter module to report on downloading services by looking
at the service status and the image download progress.

Change-type: patch
@pipex pipex force-pushed the state-report-and-service-download branch 2 times, most recently from 6d01c0f to a12bf3c Compare February 6, 2026 15:05
pipex added 9 commits February 6, 2026 12:33
Create the state patch to send to the backend from the user app list.
This also fixes a bug with the reporting loop that would never send a
patch

Change-type: patch
This allows to identify bugs with the patch body

Change-type: patch
Tokio stream has much better utilities for working with streams
This prevents multiple reports, like during a download interrupting each
other which ends up in multiple patch cancellation on the logs without
any reports actually being sent.

The reports are limited by the min poll interval (10s) so on downloads
faster than that, no download progress will show on the dashboard.

Change-type: patch
This limit can be added in the future, but since the patch happens at
most every 10 seconds, is better to make sure it has the most accurate
progress for each image on long downloads.

Change-type: patch
Also downgrade the dind version used in integration tests.

Change-type: patch
Reading the list of images would fail if parsing an image URI failed,
however, the list can contain `<none>:<none>` tags, in which case we
want to ignore those errors.

This also includes the repo_digests list, deduplicating by repo and tag.

Change-type: patch
If the remote target cannot be processed during poll, only the
raw_target will be available and will be passed to the legacy supervisor
when available.

Change-type: patch
Uses mahler exception feature. Also skip reporting with userapps
disabled to prevent conflicting with the legacy supervisor.
@pipex pipex force-pushed the state-report-and-service-download branch from a12bf3c to 894b73a Compare February 6, 2026 15:34
@flowzone-app flowzone-app bot enabled auto-merge February 6, 2026 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants