[34/n] sled-agent logic to clear mupdate overrides #8572

sunshowers · 2025-07-10T22:49:26Z

This PR implements logic within sled-agent to clear mupdate overrides. It also includes tests, database storage, and displayers.

This logic by itself does not introduce behavior changes, since the code to actually set this field is in #8456.

Depends on:

Created using spr 1.3.6-beta.1


sunshowers · 2025-07-12T02:42:14Z

sled-agent/config-reconciler/src/reconciler_task.rs

+        // Reconcile the mupdate override field. This can be done independently
+        // of the other parts of reconciliation (and this doesn't have to block
+        // other parts of reconciliation), but the argument for this is somewhat
+        // non-trivial. Here's an outline:


Worth having a look at this.

I've moved this to https://rfd.shared.oxide.computer/rfd/0556#sa_reconciler_error_handling.

From #8596: While working on #8572, we realized that we need a condition: if any zones can't be switched over to Artifact, we must not clear the mupdate override field. This means eligibility information has to be available in two places: while updating the mupdate override, and while doing the noop conversions. To make that easier, #8596 builds up a decision tree by sled.
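
A rough sketch of the idea in the commit message above: compute each sled's zone eligibility once, then consult the same result both for the noop conversions and for deciding whether the mupdate override may be cleared. `ZoneDecision`, `SledDecision`, their methods, and the zone names are hypothetical and are not the actual reconfigurator API.

```rust
// Hypothetical sketch: one per-sled decision, consulted in two places.

/// Whether a single zone's image source can be switched over to Artifact.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ZoneDecision {
    SwitchToArtifact,
    /// The zone must stay as-is; the reason is kept for reporting.
    Ineligible(String),
}

/// Decisions for every zone on one sled (hypothetical type).
#[derive(Debug)]
struct SledDecision {
    zones: Vec<(String, ZoneDecision)>,
}

impl SledDecision {
    /// Zones that the noop conversion path may switch over to Artifact.
    fn convertible_zones(&self) -> impl Iterator<Item = &str> + '_ {
        self.zones.iter().filter_map(|(name, decision)| match decision {
            ZoneDecision::SwitchToArtifact => Some(name.as_str()),
            ZoneDecision::Ineligible(_) => None,
        })
    }

    /// The mupdate override must not be cleared if any zone on the sled
    /// can't be switched over to Artifact.
    fn may_clear_mupdate_override(&self) -> bool {
        self.zones
            .iter()
            .all(|(_, decision)| matches!(decision, ZoneDecision::SwitchToArtifact))
    }
}

fn main() {
    let sled = SledDecision {
        zones: vec![
            ("ntp".to_string(), ZoneDecision::SwitchToArtifact),
            (
                "crucible".to_string(),
                ZoneDecision::Ineligible("image not found in repo".to_string()),
            ),
        ],
    };
    let convertible: Vec<&str> = sled.convertible_zones().collect();
    println!("convertible zones: {convertible:?}");
    println!("may clear mupdate override: {}", sled.may_clear_mupdate_override());
}
```

Keeping both checks on the same per-sled structure means the noop conversions and the mupdate override check can't disagree about which zones are eligible.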


jgallagher

LGTM, just one nontrivial comment about the current internal disks.

jgallagher · 2025-07-21T16:09:39Z

nexus/types/src/inventory/display.rs

+                            writeln!(
+                                indent2,
+                                "error reading mupdate override, so sled agent was \
+                                 not instructed to clear it"


"not instructed to" or "didn't attempt to"?

"not instructed to" because the instruction comes from the planner.

But an error here comes from sled-agent, right? This is probably fine; it just feels slightly weird that sled-agent is making a claim about the behavior of something else, which it kinda has to infer. Could we have a case like:

1. The current ledgered config has no mupdate override and no clear mupdate override.

2. The sled is mupdated.

3. The sled comes up and fails to read the mupdate override.

4. Inventory is collected.

5. In this display, we claim the planner didn't instruct us to do something, but the planner hasn't even run since we've been mupdated; we're still running out of the old ledgered config (in which the planner hadn't instructed us to clear a mupdate override, because there wasn't one).

Ah yeah, good point -- will change to "didn't attempt to". It's definitely a bit weird.
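
A minimal sketch of the display case with the revised wording, in the same `writeln!` style as the snippet above. `ClearOverrideOutcome`, its variants, the `u64` override ID, and `display_outcome` are hypothetical stand-ins, not the real inventory types; only the message text comes from this thread.

```rust
use std::fmt::{self, Write};

/// Hypothetical summary of what sled-agent did with a mupdate override.
enum ClearOverrideOutcome {
    /// The override with this (hypothetical, simplified) ID was cleared.
    Cleared(u64),
    /// Reading the override failed, so no clear was attempted.
    ReadError,
}

/// Writes one line describing the outcome to any `fmt::Write` sink.
fn display_outcome(out: &mut impl Write, outcome: &ClearOverrideOutcome) -> fmt::Result {
    match outcome {
        ClearOverrideOutcome::Cleared(id) => writeln!(out, "cleared mupdate override {id}"),
        // Describe only what sled-agent itself did: after a mupdate the
        // planner may not have run at all, so "not instructed to" would be
        // a claim sled-agent can't actually verify.
        ClearOverrideOutcome::ReadError => writeln!(
            out,
            "error reading mupdate override, so sled agent didn't \
             attempt to clear it"
        ),
    }
}

fn main() {
    let mut s = String::new();
    display_outcome(&mut s, &ClearOverrideOutcome::Cleared(7)).unwrap();
    display_outcome(&mut s, &ClearOverrideOutcome::ReadError).unwrap();
    print!("{s}");
}
```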

jgallagher · 2025-07-21T16:13:29Z

sled-agent/config-reconciler/src/reconciler_task.rs

+        let clear_mupdate_override =
+            if let Some(override_id) = sled_config.remove_mupdate_override {
+                let internal_disks =
+                    self.internal_disks_rx.wait_for_boot_disk().await;


This feels sketchy; if the boot disk has gone away for some reason, this will block until it comes back (maybe forever). Can we instead grab current() here and have clear_mupdate_override() return an error if the boot disk isn't in the current set?

Maybe we should also rename this method to wait_forever_for_boot_disk() to make it scarier sounding?

Yeah, wait_forever_for_boot_disk sounds reasonable to me.

Updated this.
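
A minimal sketch of the suggested shape: take the current snapshot instead of waiting, and turn a missing boot disk into an error. `current()` and `wait_forever_for_boot_disk()` are the names from this thread, but `InternalDisksSnapshot`, `ClearError`, and this `clear_mupdate_override` signature are hypothetical simplifications.

```rust
use std::fmt;

/// Hypothetical stand-in for the value returned by `current()`: a
/// point-in-time view of the internal disks, possibly without a boot disk.
#[derive(Debug, Clone)]
struct InternalDisksSnapshot {
    boot_disk: Option<String>,
}

#[derive(Debug, PartialEq)]
enum ClearError {
    BootDiskMissing,
}

impl fmt::Display for ClearError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ClearError::BootDiskMissing => {
                write!(f, "boot disk not present in current internal disks")
            }
        }
    }
}

/// Returns immediately: if the boot disk isn't in the snapshot, report an
/// error instead of blocking until it (maybe never) comes back.
fn clear_mupdate_override(
    disks: &InternalDisksSnapshot,
    override_id: u64,
) -> Result<String, ClearError> {
    let boot_disk = disks.boot_disk.as_ref().ok_or(ClearError::BootDiskMissing)?;
    // The real code would remove the override from the boot disk here; this
    // sketch just describes what it would have done.
    Ok(format!("cleared mupdate override {override_id} on {boot_disk}"))
}

fn main() {
    let gone = InternalDisksSnapshot { boot_disk: None };
    match clear_mupdate_override(&gone, 1) {
        Ok(msg) => println!("{msg}"),
        Err(e) => println!("error: {e}"),
    }
}
```

Because the call no longer blocks, a missing boot disk presumably surfaces as a reported error rather than as a reconciler that silently stops making progress.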

jgallagher · 2025-07-21T17:04:35Z

sled-agent/zone-images/src/mupdate_override.rs

@@ -181,12 +350,136 @@ fn make_non_boot_info(
    }
 }

+fn clear_non_boot_disk(


This looks fine, but oof, it seems brutal. Do we already have an issue for "just use M2Slot::A for all ledgers and ignore boot/non-boot"?


sunshowers added 5 commits July 10, 2025 22:49

[spr] initial version (edba3e1)

[spr] changes to main this commit is based on (b989ccd)

updates (27bb944)

[spr] changes introduced through rebase (5e78b5f)

write a very long comment (dc1a782)

sunshowers commented Jul 12, 2025


sunshowers changed the title ~~[wip] [30/n] sled-agent logic to clear and honor mupdate overrides~~ [wip] [31/n] sled-agent logic to clear and honor mupdate overrides Jul 15, 2025

sunshowers mentioned this pull request Jul 15, 2025

[29/n] [reconfigurator] separate out noop image source decision making #8596

Merged

sunshowers added 2 commits July 16, 2025 05:43

[spr] changes introduced through rebase (d2b71f7)

add db logic and migrations (d2d0ada)

sunshowers changed the title ~~[wip] [31/n] sled-agent logic to clear and honor mupdate overrides~~ [wip] [30/n] sled-agent logic to clear mupdate overrides Jul 16, 2025

sunshowers changed the title ~~[wip] [30/n] sled-agent logic to clear mupdate overrides~~ [wip] [34/n] sled-agent logic to clear mupdate overrides Jul 17, 2025

sunshowers added 2 commits July 17, 2025 02:00

[spr] changes introduced through rebase (710b13d)

add tests -- should be ready for review (f87c5de)

sunshowers changed the title ~~[wip] [34/n] sled-agent logic to clear mupdate overrides~~ [34/n] sled-agent logic to clear mupdate overrides Jul 17, 2025

sunshowers marked this pull request as ready for review July 17, 2025 02:02

sunshowers requested a review from jgallagher July 17, 2025 02:02

sunshowers added 2 commits July 17, 2025 21:02

[spr] changes introduced through rebase (f25c04c)

openapi (d9c873b)

sunshowers changed the base branch from sunshowers/spr/main.wip-30n-sled-agent-logic-to-clear-and-honor-mupdate-overrides to main July 20, 2025 03:22

rebase on main (66a1213)

jgallagher reviewed Jul 21, 2025


sunshowers added 3 commits July 21, 2025 17:11

rebase on main (c7d42ac)

rebase + review feedback (f01d533)

rustfmt (4eee3d7)
