Skip to content

Implement depth for special watches, again#19

Merged
last-genius merged 5 commits intoxapi-project:mainfrom
last-genius:asv/watch-depth-fix
Oct 20, 2025
Merged

Implement depth for special watches, again#19
last-genius merged 5 commits intoxapi-project:mainfrom
last-genius:asv/watch-depth-fix

Conversation

@last-genius
Copy link
Contributor

First commit is just 5e559b4, the second one fixes an issue that made oxenstored not fire watch events on domains shutting down. The last commit avoids sending multiple @releaseDomain events when several domains die at the same time.

Best reviewed with "hide whitespace".

This passed BVT (Thanks @robhoes!)

Add the tests checking that the documentation-specified behaviour
regarding special watches ("@introduceDomain" and "@releaseDomain") is
followed.

The handling is a bit tricky (and helpfully clarified by Juergen
additionally):

* @introduceDomain and @releaseDomain can only have depth=1 if it's
  specified. This is verified on adding a new watch. If depth=1, then
  the path reported in the watch event reply will contain the domid
  (as in "@introduceDomain/2").

* Additionally, "@releaseDomain/domid" can now be watched to receive a
  notification for a particular domain only. This watch does not accept
  a depth parameter at all.

A key difference that this new feature necessitated is that previously,
oxenstored would only issue a single @releaseDomain notification when
cleaning up after several domains in xenstored.ml:handle_eventchn. Now
that a single domid needs to be reported per notification, several
independent @releaseDomain can occur. I think this is a safe change, as
old clients would need to rescan the tree anyhow to find out how many
domains disappeared, the notification would not provide any additional
help in that regard.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
@last-genius
Copy link
Contributor Author

I've figured out how to check this in unit tests so I will now force push:

  1. Original commit without the fix + unit tests, to show that they fail in the CI
  2. Fix, to show that it makes the CI pass

@last-genius
Copy link
Contributor Author

The new unit test fails here (before the fix) because no watch events are created:

(* Domains in [1000; 2000] are considered shutdown on the first query *)
let _ = create_domU_conn cons doms 1001 in
(* Check that a @releaseDomain watch event is sent when a domain shuts down *)
one_loop_iteration () ;
let actual = Xenbus.Xb.unsafe_pop_output dom0.xb in
---
[exception] Stdlib.Queue.Empty

(https://github.com/xapi-project/oxenstored/actions/runs/18590934590/job/53005545095?pr=19)

@last-genius
Copy link
Contributor Author

The unit test passes after the fix :)

( if is_special_watch then
(* No depth can be specified for @releaseDomain/domid watches.
Only depth=1 can be specified for other special watches *)
match depth with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be cleaner to use match ... | .. when is_special_watch -> .. without the outer if?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd need to be added to each arm then, I'm not sure it's going to be cleaner.


let fire_spec_watch fire_generic domid =
(* Trigger @releaseDomain/domid watches as well *)
let specpaths =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have multiple predicates to recognise special paths? Would it make sense to lift them to an outer level?

Avoid daemonizing and doing some other things that are impossible in the
testing environment outside of xen.

Change Xenstored.main to return its loop body as a function so that unit tests
can step over it one iteration at a time. When not under testing, the
main loop is called as before.

Provide a mock paths.ml and change the mock unix_activations_minimal to
always return the eventchn.domexc port as pending so that Xenstored
checks for dead domains.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Unit tests used to avoid calling Xenstored.main before, preferring to
call Process.process_packet to circumvent some hard-to-mock logic. As a
result, a few edge cases that can only be triggered from within the main
loop in xenstored.ml slipped through - hence the bug that required
45c16ec.

Now that xenstored.ml can handle being called in unit tests, we can
verify that the main loop handles shutdown and dead domains correctly,
sending the right number of correct special watch events when it needs
to.

Adds a hack to the mock domain_getinfo implementation, where all domids
> 2000 are always considered 'dying' and domains > 1000 are always
considered shutdown. Create a unit test that steps over the main loop in
xenstored.ml, verifying that the correct watch events are sent.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
…ng down

5e559b4 changed the code in xenstored.ml from this:

    if deaddom <> [] || notify then
      Connections.fire_spec_watches (Store.get_root store) cons
        Store.Path.release_domain

to this:

    if notify then
      List.iter
        (Connections.fire_spec_watches (Store.get_root store) cons
           Store.Path.release_domain
        )
        deaddom

which meant that it would not send any releaseDomain events in case of a
domain shutting down (which is tracked by a variable confusingly named "notify").
Turn this into a list of domains that have shutdown instead, sending
events for each.

This should hopefully fix the issues that required reverting this work (45c16ec)

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
…ngle update

Before "@eventDomain/domid" watch events were added with the ability to
specify depth for watches (and watch "@eventDomain/domid" paths
specifically), oxenstored sent a single special watch event, relying on
clients to scan the whole tree to determine which domains changed state.

Now that these events are fired per domid, perform an optimization and
avoid sending "generic" watch events (watch events with the path
@eventDomain, without the domid) multiple times in a row.

Change fire_spec_watches to accept a list of domids to sent events for,
and only send a generic watch event for the first domid, with the rest
only producing specialized watch events.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
@last-genius last-genius merged commit 94863c0 into xapi-project:main Oct 20, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants