
Conversation

davepacheco
Collaborator

This is the first part of #8859. This PR adds the logic to keep track of Nexus's quiesce state. Once we have db_metadata_nexus records (currently #8845), the last bit of #8859 will be to update those records whenever this value changes.

This is still a work in progress. I need to add some new tests and also put this into omdb.

@davepacheco
Collaborator Author

Some sample output from the new omdb, run against cargo xtask omicron-dev run-all:

Initial state:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.17s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
running normally (not quiesced, not quiescing)
saga quiesce:
    new sagas: Allowed
    drained as of blueprint: none
    blueprint for last recovery pass: none
    blueprint for last reassignment pass: none
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 0
database connections held: 0

Enabled blueprint execution:

$ ./target/debug/omdb --dns-server=[::1]:64971 nexus blueprints target enable current -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
set target blueprint c144655e-ab2e-4a4e-aa21-b7f1f70e4620 to enabled

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.14s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
running normally (not quiesced, not quiescing)
saga quiesce:
    new sagas: Allowed
    drained as of blueprint: none
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 0
database connections held: 0

Created a demo saga:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.13s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
running normally (not quiesced, not quiescing)
saga quiesce:
    new sagas: Allowed
    drained as of blueprint: none
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 1
        saga 2bb35305-bd26-4476-b259-718bdb20b53a pending since 2025-08-27T22:40:42.260Z (demo)
database connections held: 0

Started quiescing:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce start -w
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.14s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce start -w`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
quiescing since 2025-08-27T22:40:58.983Z (0s ago)
details: waiting for running sagas to finish
saga quiesce:
    new sagas: DisallowedQuiesce
    drained as of blueprint: none
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 1
        saga 2bb35305-bd26-4476-b259-718bdb20b53a pending since 2025-08-27T22:40:42.260Z (demo)
database connections held: 0

Completed the demo saga:

$ cargo run --bin=omdb -- --dns-server=[::1]:64971 nexus quiesce show
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.15s
     Running `target/debug/omdb '--dns-server=[::1]:64971' nexus quiesce show`
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
quiesced since 2025-08-27T22:41:18.609Z (5s 411ms ago)
    waiting for sagas took 19s 626ms
    waiting for db quiesce took 0s
    recording quiesce took 0s
    total quiesce time: 19s 626ms
saga quiesce:
    new sagas: DisallowedQuiesce
    drained as of blueprint: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last recovery pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    blueprint for last reassignment pass: c144655e-ab2e-4a4e-aa21-b7f1f70e4620
    reassignment generation: 1 (pass running: no)
    recovered generation: 1
    recovered at least once successfully: yes
    sagas running: 0
database connections held: 0

@davepacheco davepacheco marked this pull request as ready for review August 27, 2025 22:44
@davepacheco davepacheco requested a review from jgallagher August 27, 2025 22:45
/// whether a saga recovery operation is ongoing, and if one is:
/// - what `reassignment_generation` was when it started
/// - which blueprint id we'll be fully caught up to upon completion
#[serde(skip)] // XXX-dap
Contributor

XXX here because we don't want to skip this field?

Collaborator Author

Yes -- sorry I missed that! This is a problem because we don't support a tuple in this context in the OpenAPI spec. I will replace it with a struct.
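
For illustration, a minimal sketch of that kind of replacement, assuming serde plus schemars for spec generation; the field names and types here are guesses based on the doc comment above, not the actual definition from this PR:

use schemars::JsonSchema;
use serde::{Deserialize, Serialize};

/// Stand-in for the tuple that couldn't be represented in the OpenAPI spec:
/// a named struct carrying the same two pieces of information, so the field
/// no longer needs `#[serde(skip)]`.  (Field names/types are hypothetical.)
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct PendingRecovery {
    /// what `reassignment_generation` was when recovery started
    reassignment_generation: u64,
    /// which blueprint id we'll be fully caught up to upon completion
    blueprint_id: Option<String>,
}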

}
};

q.latch_drained_blueprint_id();
Contributor

Is it correct to latch this even if quiescing is false?

Collaborator Author

@davepacheco davepacheco Aug 28, 2025

Yes, the function checks that.

edit: to be precise, it is not correct to latch the value in this case. The function latch_drained_blueprint_id is intended to be called at any time and will only latch the state if appropriate, and it checks that. Is there a better name for that?

Contributor

latch_blueprint_id_if_drained() maybe?
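
For illustration, a minimal self-contained sketch of that "call at any time, latch only if appropriate" pattern, using the suggested name; the state fields are hypothetical stand-ins loosely based on the omdb output above, not the real SagaQuiesceHandle internals:

#[derive(Clone, Copy, PartialEq)]
enum SagasAllowed {
    Allowed,
    DisallowedQuiesce,
}

struct QuiesceInner {
    new_sagas_allowed: SagasAllowed,
    sagas_running: usize,
    first_recovery_complete: bool,
    // stand-ins for blueprint ids
    blueprint_last_reassignment: Option<String>,
    drained_blueprint: Option<String>,
}

impl QuiesceInner {
    /// Safe to call at any time: latches the drained blueprint only the
    /// first time we are both quiescing and fully drained (the additional
    /// recovery/reassignment bookkeeping checks are elided here).
    fn latch_blueprint_id_if_drained(&mut self) {
        if self.new_sagas_allowed == SagasAllowed::DisallowedQuiesce
            && self.drained_blueprint.is_none()
            && self.sagas_running == 0
            && self.first_recovery_complete
        {
            self.drained_blueprint = self.blueprint_last_reassignment.clone();
        }
    }
}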

@davepacheco davepacheco force-pushed the dap/handoff-quiesce-1 branch from dc85999 to 6c84ded on September 2, 2025 16:23
@davepacheco davepacheco force-pushed the dap/handoff-quiesce-2 branch from 73fff49 to 6fa9d9d on September 2, 2025 17:07
@davepacheco
Collaborator Author

I've had to force push this for the same reason as in #8875. The diff-of-diffs is virtually empty -- nothing notable changed in the sync-up. @jgallagher do you want to take another look? (I don't think it's necessary but it's up to you!)

assert_eq!(qq.fully_drained_blueprint(), Some(blueprint3_id));
assert!(qq.is_fully_drained());

// Fully drained case 3: quiescing itself causes us to immediately
Collaborator

Suggested change:
- // Fully drained case 3: quiescing itself causes us to immediately
+ // Fully drained case 4: quiescing itself causes us to immediately

Could some of these cases become different, smaller tests?

Collaborator Author

Nice catch on the comment.

> Could some of these cases become different, smaller tests?

I think this is easier said than done. It's definitely possible, but I'm worried it'd be quite a lot more code, with a lot of duplication, that'd be easier to get wrong. That's because, depending on how fine-grained you look at it, it's currently testing about 12 behaviors (basically, each commented hunk between blank lines is a behavior I'm talking about). Many of those depend on complex state set up by the previous tests. This would wind up duplicated.

Subjectively, although I know it can be a little annoying when you first see a new assertion failure in a test like this, I still prefer the single straight-line test that reflects a realistic sequence to a bunch of separate-but-pretty-overlapping cases. I just wind up with higher confidence that we've covered everything.


The specific one you commented on is probably the easiest to fork off into a new test because it starts from a fresh handle. But then it'd be the only one not in the same test 🤷

@@ -353,7 +475,10 @@ impl SagaQuiesceHandle {
"recovery_start() called twice without intervening \
recovery_done() (concurrent calls to recover()?)",
);
- q.recovery_pending = Some(q.reassignment_generation);
+ q.recovery_pending = Some(PendingRecovery {
Collaborator

(only semi-related to this PR)

I'm noticing that the pub async fn recover really needs that "only-one-caller-at-a-time" property to not break preconditions, trip assertions, etc. But the signature is &self, so Rust would be happy to allow concurrent calls.

WDYT about making it act on &mut self? Would this be too onerous? It would help ensure that future usage of the API also cannot be done concurrently.

Collaborator

Same question for pub async fn reassign_sagas

Contributor

@jgallagher jgallagher Sep 2, 2025

I like this idea, but worry it will be pretty onerous. If this ends up being owned by a &Nexus or whatever similar top-level object it might be pretty painful; those end up cloned and shared I think? Worth double-checking though.

Collaborator Author

We explored this in an earlier iteration and ran into trouble with the recovery code path holding multiple &mut references to itself (one for its own purposes, and one needed to call recover()). That no longer seems to be true. I was surprised that this does compile (leaving out the test diffs, which are mechanical):

diff --git a/nexus/src/app/background/tasks/saga_recovery.rs b/nexus/src/app/background/tasks/saga_recovery.rs
index d9f2ed00f..5e56b986b 100644
--- a/nexus/src/app/background/tasks/saga_recovery.rs
+++ b/nexus/src/app/background/tasks/saga_recovery.rs
@@ -199,7 +199,7 @@ impl<N: MakeSagaContext> BackgroundTask for SagaRecovery<N> {
         async {
             // We don't need the future that's returned by activate_internal().
             // That's only used by the test suite.
-            let _ = self.inner.activate_internal(opctx, &self.quiesce).await;
+            let _ = self.inner.activate_internal(opctx, &mut self.quiesce).await;
             serde_json::to_value(&self.inner.status).unwrap()
         }
         .boxed()
@@ -237,7 +237,7 @@ impl<N: MakeSagaContext> SagaRecoveryInner<N> {
     async fn activate_internal(
         &mut self,
         opctx: &OpContext,
-        quiesce: &SagaQuiesceHandle,
+        quiesce: &mut SagaQuiesceHandle,
     ) -> Option<(
         BoxFuture<'static, Result<(), Error>>,
         nexus_saga_recovery::LastPassSuccess,
@@ -743,7 +743,7 @@ mod test {
         );
 
         let Some((completion_future, last_pass_success)) =
-            task.inner.activate_internal(&opctx, &task.quiesce).await
+            task.inner.activate_internal(&opctx, &mut task.quiesce).await
         else {
             panic!("saga recovery failed");
         };
@@ -821,7 +821,7 @@ mod test {
         );
 
         let Some((_, last_pass_success)) =
-            task.inner.activate_internal(&opctx, &task.quiesce).await
+            task.inner.activate_internal(&opctx, &mut task.quiesce).await
         else {
             panic!("saga recovery failed");
         };
diff --git a/nexus/types/src/quiesce.rs b/nexus/types/src/quiesce.rs
index b336cd037..d7f59edff 100644
--- a/nexus/types/src/quiesce.rs
+++ b/nexus/types/src/quiesce.rs
@@ -454,7 +454,7 @@ impl SagaQuiesceHandle {
     // because it's harder to mis-use (e.g., by forgetting to call
     // `recovery_done()`).  But we keep the other two functions around because
     // it's easier to write tests against those.
-    pub async fn recover<F, T>(&self, f: F) -> T
+    pub async fn recover<F, T>(&mut self, f: F) -> T
     where
         F: AsyncFnOnce(&SagaRecoveryInProgress) -> (T, bool),
     {
@@ -468,7 +468,7 @@ impl SagaQuiesceHandle {
     ///
     /// Only one of these may be outstanding at a time.  The caller must call
     /// `saga_recovery_done()` before starting another one of these.
-    fn recovery_start(&self) -> SagaRecoveryInProgress {
+    fn recovery_start(&mut self) -> SagaRecoveryInProgress {
         self.inner.send_modify(|q| {
             assert!(
                 q.recovery_pending.is_none(),

But this doesn't guarantee what you'd think it would based on the signature. The SagaQuiesceHandle is just a handle. It's freely cloneable and all clones refer to the same underlying state. So you can absolutely call recover concurrently from different code paths.

Maybe a safer way to enforce this would be to create a few distinct wrappers: one for saga recovery (which only exposes the recovery stuff, and uses a &mut), one for re-assignment (which only exposes the re-assignment stuff, which could get the same treatment), and one for saga creation (which only exposes the methods for that). For this to work, none of these (including the underlying handle) could be cloneable. Instead, you construct them all at once and get exactly one of each, which we then pass to the right place. This might work but it's not a small amount of work. (I don't think this is a big practical risk because there is only one non-test code path that calls recover and it's in a singleton context by design.)
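
For illustration, a rough sketch of that wrapper idea under the stated constraints (one non-cloneable wrapper per role, all constructed together); the names are hypothetical, not a proposed omicron API:

use std::sync::Arc;

// Shared quiesce bookkeeping would live here (contents elided).
struct QuiesceState {}

// None of these derive Clone, and each exposes only one slice of the API.
pub struct RecoveryHandle(Arc<QuiesceState>);
pub struct ReassignmentHandle(Arc<QuiesceState>);
pub struct SagaCreateHandle(Arc<QuiesceState>);

impl RecoveryHandle {
    /// `&mut self` on a non-Clone handle means at most one recovery pass
    /// can be in flight through this handle at a time.
    pub async fn recover(&mut self) { /* one recovery pass */ }
}

impl ReassignmentHandle {
    pub async fn reassign_sagas(&mut self) { /* one re-assignment pass */ }
}

impl SagaCreateHandle {
    pub fn saga_create_allowed(&self) -> bool {
        true // placeholder
    }
}

/// Construct exactly one of each wrapper up front and hand each one to the
/// single place that's allowed to use it.
pub fn new_quiesce_handles()
-> (RecoveryHandle, ReassignmentHandle, SagaCreateHandle) {
    let state = Arc::new(QuiesceState {});
    (
        RecoveryHandle(Arc::clone(&state)),
        ReassignmentHandle(Arc::clone(&state)),
        SagaCreateHandle(state),
    )
}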

/// Returns whether sagas are fully drained
///
/// This condition is not permanent. New sagas can be re-assigned to this
/// Nexus.
- pub fn is_fully_drained(&self) -> bool {
+ fn is_fully_drained(&self) -> bool {
Collaborator

Is it possible this returns true if we have never tried to reassign sagas?

Contributor

I don't think so; it checks for first_recovery_complete which should be false if we've never tried, right?

Collaborator

but recovery is a distinct step from re-assignment, right? Someone could call the pub async fn recover without calling pub async fn reassign_sagas

Collaborator Author

@smklein That's correct (your statement is correct, and I believe it's also the correct behavior). Think of "is_fully_drained()" as "there are no sagas running and none will ever run again unless there's a subsequent re-assignment". (It is not: "there are no sagas running and none will ever run again". This is what we spent so much time in RFD 588 trying to achieve but found it at odds with the idea that "but you might need to expunge a Nexus at any point". If you want this condition, you really want fully_drained_blueprint().)
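
For illustration, a hypothetical approximation of that distinction (not the actual implementation; field names are loosely based on the omdb output above):

// Hypothetical approximation; not the real SagaQuiesceHandle internals.
struct QuiesceSnapshot {
    sagas_running: usize,
    first_recovery_complete: bool,
    reassignment_pass_running: bool,
    reassignment_generation: u64,
    recovered_generation: u64,
    // stand-in for the blueprint id of the last re-assignment pass
    blueprint_last_reassignment: Option<String>,
}

impl QuiesceSnapshot {
    /// "No sagas are running and none will run again *unless* sagas are
    /// later re-assigned to this Nexus."
    fn is_fully_drained(&self) -> bool {
        self.sagas_running == 0
            && self.first_recovery_complete
            && !self.reassignment_pass_running
            && self.recovered_generation >= self.reassignment_generation
    }

    /// The stronger statement a caller wants before expunging this Nexus:
    /// fully drained *as of* a particular blueprint.
    fn fully_drained_blueprint(&self) -> Option<String> {
        if self.is_fully_drained() {
            self.blueprint_last_reassignment.clone()
        } else {
            None
        }
    }
}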

Base automatically changed from dap/handoff-quiesce-1 to main September 3, 2025 04:20
@davepacheco davepacheco merged commit 32f0348 into main Sep 3, 2025
17 checks passed
@davepacheco davepacheco deleted the dap/handoff-quiesce-2 branch September 3, 2025 22:44