Skip to content

feat: add Kubernetes operator and instance settings YAML editor#7836

Merged
rubenfiszel merged 22 commits intomainfrom
rf/iac
Feb 14, 2026
Merged

feat: add Kubernetes operator and instance settings YAML editor#7836
rubenfiszel merged 22 commits intomainfrom
rf/iac

Conversation

@rubenfiszel
Copy link
Contributor

@rubenfiszel rubenfiszel commented Feb 6, 2026

Summary

Adds a Kubernetes operator for declarative instance configuration (IaC) and a YAML editor to the instance settings UI for bulk viewing/editing all settings.

Changes

Backend — Kubernetes Operator (windmill-operator crate)

  • New windmill-operator crate with CRD (WindmillInstance), reconciler, and db_sync modules
  • CRD uses fully typed structs (not opaque serde_json::Value) for schema validation
  • Supports secretKeyRef for sensitive fields (pulls values from Kubernetes Secrets)
  • CLI: windmill operator (run controller) / windmill operator crd (print CRD YAML)
  • Feature flag: operator (part of ee_core meta-feature)

Backend — Unified Instance Config

  • New windmill-common/src/instance_config.rs with shared types, diff, and apply logic
  • Bulk GET/PUT API endpoints: GET/PUT /api/settings/instance_config
  • skip_worker_configs query parameter on PUT to skip worker config processing from the UI
  • jwt_secret and min_keep_alive_version hidden from API responses and config exports
  • automate_username_creation default changed to true to match UI default
  • Pre-write hooks (run_setting_pre_write_hook) shared by per-key and bulk endpoints
  • set_global_setting_internal refactored to call run_setting_pre_write_hook (deduplicated side-effect logic)

Backend — CLI Fixes

  • Tracing subscriber initialized before early-return CLI paths (sync-config, operator) so tracing::info!/tracing::error! calls are not silently dropped
  • CRD printing moved to windmill_operator::print_crd_yaml() helper
  • Fixed initial_connection() import path in operator branch

CLI — wmill instance get-config

  • New wmill instance get-config command to dump the current instance config as YAML
  • Supports -o, --output-file <file> to write to a file instead of stdout
  • Calls GET /api/settings/instance_config via the generated API client

Frontend — Instance Settings YAML Editor

  • YAML toggle in the superadmin drawer header to switch between form and YAML views
  • YAML toggle also available in the Advanced Setup page (/user/instance_settings?mode=full)
  • Sensitive value masking (__SENSITIVE_AND_UNCHANGED__) for passwords, tokens, secrets
    • Only masks values unchanged from initial load; modified values show actual content
    • "Show sensitive values" toggle to reveal masked fields
  • Diff mode: shows inline YAML diff (initial vs current) before saving
  • Save flow: YAML toggle → Show diff → Confirm & Save (bulk PUT)
  • Discard button resets all changes to initial state
  • Per-category dirty tracking and save/discard (form mode, per-setting API)
  • Unsaved changes guard on drawer close with confirmation modal
  • Fixed oauths field ordering in YAML diff (sorted alphabetically instead of appended at end)

Frontend — Instance Settings Page (/user/instance_settings)

  • Continue button saves settings in all modes (full/YAML/wizard) before proceeding
  • In YAML mode, syncs editor content to form state before bulk-saving
  • In full (advanced) mode, bulk-saves all settings before switching back to wizard
  • yamlMode reset when switching from full mode back to wizard (fixes Back button)

Frontend — Drawer Layout

  • Sidebar navigation with grouped categories (Core, Authentication, Infrastructure, Monitoring, Advanced)
  • Version display in drawer title area
  • "Admins workspace" link in sidebar footer
  • Sidebar hidden in YAML/diff modes

Documentation & Dev Environment

  • Updated CLAUDE.md with REMOTE env var for local dev and Playwright MCP testing instructions
  • Example deployment configs in examples/deploy/instance-config-iac/

Test plan

  • Frontend npm run check passes (0 errors)
  • Backend cargo check --features operator passes
  • Playwright MCP: login, drawer open, sidebar navigation, YAML toggle, form restore, discard, Secret Storage tab — all verified
  • No console errors from instance settings code (only pre-existing CE/dev artifacts)
  • Manual: save settings in form mode, verify persistence
  • Manual: save settings in YAML mode, verify bulk PUT
  • Manual: test sensitive field masking round-trip
  • Manual: wmill instance get-config outputs valid YAML
  • Manual: YAML toggle and Continue save in Advanced Setup page
  • Manual: Back button works after Continue from YAML mode

Generated with Claude Code

Introduces a new `windmill-operator` crate that enables declarative
management of Windmill instance configuration via a Kubernetes
`WindmillInstance` CRD. The operator watches CRD resources and performs
full declarative sync of global_settings and worker configs to the
database, supporting GitOps workflows for instance-level configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@claude
Copy link
Contributor

claude bot commented Feb 6, 2026

Claude finished @rubenfiszel's task in 2m 56s —— View job


Review: Add Kubernetes operator for declarative instance config

Overall: This is a well-structured first implementation of a Kubernetes operator for Windmill. The crate separation is clean, the CRD design is straightforward, and the reconciler follows standard kube-rs patterns. I have a few concerns around atomicity and operational edge cases.


Issues Found

1. Critical: Missing tracing initialization for operator subcommand (backend/src/main.rs:629-644)

The operator CLI path returns early before the tracing subscriber is initialized at line 758. All the tracing::info!/tracing::error! calls in the reconciler and db_sync modules will be silently dropped. The operator will only show the three println! messages. See inline comment

2. Important: No transaction wrapping for sync operations (backend/windmill-operator/src/db_sync.rs:22-55)

Both sync_global_settings and sync_worker_configs execute individual queries without a transaction. If the operator crashes mid-sync, the database can be left in a partially-applied state. Wrapping each sync function in a db.begin()/tx.commit() would ensure atomicity. See inline comment

3. Design: Multiple CRD instances can conflict (backend/windmill-operator/src/crd.rs:12-23)

The CRD is namespaced, but the operator syncs to a single global database. Multiple WindmillInstance resources (across namespaces or within one) would race to overwrite each other's settings. Consider making it cluster-scoped or documenting/enforcing a single-instance constraint. See inline comment

4. Design: Protected settings list (backend/windmill-operator/src/db_sync.rs:7-11)

The hardcoded PROTECTED_SETTINGS list may need license_key and could become stale as new internal settings are added. See inline comment


Positive Observations

  • Clean crate structure -- the operator is well-isolated with clear module boundaries (crd.rs, db_sync.rs, reconciler.rs)
  • Feature gating -- properly gated behind the operator feature flag, included in ee_core
  • Status subresource -- good use of observedGeneration and lastSyncedAt for observability
  • Error handling -- the reconciler properly updates CRD status on failures and uses appropriate requeue intervals (30s for errors, 60s for unhandled, 300s for drift detection)
  • Drift detection -- the 5-minute periodic re-sync is a good default for catching out-of-band changes
  • Static CRD YAML in manifests for users who prefer kubectl apply over generating from binary

Minor Notes

  • The LIKE 'worker__%' pattern in db_sync.rs:69 is consistent with existing usage in windmill-api/src/configs.rs:59 -- the SQL _ wildcard is intentional and matches the established convention
  • No new dependencies beyond what's already in the workspace (kube, k8s-openapi, schemars are all workspace deps)
  • The hand-written CRD YAML in manifests/crd.yaml should stay in sync with the derived CRD from windmill operator crd -- consider adding a CI check or removing the static file in favor of the generated one

Testing Instructions

This feature adds a Kubernetes operator mode to the Windmill binary. To test:

  1. Build with cargo build --features operator
  2. Run windmill operator crd to print the CRD YAML and verify it's valid
  3. Apply the CRD to a Kubernetes cluster with kubectl apply
  4. Create a WindmillInstance custom resource in a namespace with global_settings and worker_configs fields
  5. Run windmill operator with DATABASE_URL pointing to a Windmill Postgres database
  6. Verify that the global_settings and config tables in the database reflect the CRD spec
  7. Modify the CRD and verify the operator reconciles within seconds (for spec changes) or within 5 minutes (for drift)

Note: This feature is not accessible through the Windmill web UI -- it is a separate CLI mode (windmill operator) designed for Kubernetes-native infrastructure-as-code deployments.

- wmi
scope: Namespaced
versions:
- name: v1alpha1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure the CRD YAML manifest remains in sync with the CustomResource definition in code.

tracing::info!("Synced global setting: {key}");
}

// Delete settings not in desired (except protected keys)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider batching deletion queries for global settings and worker configs to improve performance with many keys.

let generation = instance.metadata.generation.unwrap_or(0);

// Sync global settings
if let Err(e) = db_sync::sync_global_settings(&ctx.db, &instance.spec.global_settings).await {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed requeue delays on errors; consider implementing exponential backoff to avoid constant rapid retries on persistent failures.

"futures",
"k8s-openapi",
"kube",
"schemars 0.8.22",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo check: The dependency string "schemars 0.8.22" looks unusual as it combines the package name with a version number. Please verify if it is correctly formatted or if it should be just "schemars".

if sub_arg == "crd" {
use kube::CustomResourceExt;
let crd = windmill_operator::crd::WindmillInstance::crd();
println!("{}", serde_yml::to_string(&crd).unwrap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential typo: You are using serde_yml::to_string here, which might be a typo if the intended crate is serde_yaml. Please verify if this is correct.

Comment on lines 22 to 55
// Fetch current settings from DB
let current_rows: Vec<(String,)> = sqlx::query_as("SELECT name FROM global_settings")
.fetch_all(db)
.await?;
let current_keys: HashSet<String> = current_rows.into_iter().map(|(name,)| name).collect();

// Upsert desired settings
for (key, value) in desired {
sqlx::query(
"INSERT INTO global_settings (name, value) VALUES ($1, $2) \
ON CONFLICT (name) DO UPDATE SET value = EXCLUDED.value, updated_at = now()",
)
.bind(key)
.bind(value)
.execute(db)
.await?;
tracing::info!("Synced global setting: {key}");
}

// Delete settings not in desired (except protected keys)
let desired_keys: HashSet<&str> = desired.keys().map(|s| s.as_str()).collect();
for current_key in &current_keys {
if !desired_keys.contains(current_key.as_str())
&& !PROTECTED_SETTINGS.contains(&current_key.as_str())
{
sqlx::query("DELETE FROM global_settings WHERE name = $1")
.bind(current_key)
.execute(db)
.await?;
tracing::info!("Deleted global setting not in CRD: {current_key}");
}
}

Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug risk: No transactional guarantee for the sync operation.

The upsert and delete operations are executed as individual queries without a transaction. If the operator crashes or the DB connection drops mid-sync, the state can be left partially applied -- e.g., some settings upserted but stale ones not yet deleted, or vice versa.

Consider wrapping the entire sync in a transaction:

Suggested change
// Fetch current settings from DB
let current_rows: Vec<(String,)> = sqlx::query_as("SELECT name FROM global_settings")
.fetch_all(db)
.await?;
let current_keys: HashSet<String> = current_rows.into_iter().map(|(name,)| name).collect();
// Upsert desired settings
for (key, value) in desired {
sqlx::query(
"INSERT INTO global_settings (name, value) VALUES ($1, $2) \
ON CONFLICT (name) DO UPDATE SET value = EXCLUDED.value, updated_at = now()",
)
.bind(key)
.bind(value)
.execute(db)
.await?;
tracing::info!("Synced global setting: {key}");
}
// Delete settings not in desired (except protected keys)
let desired_keys: HashSet<&str> = desired.keys().map(|s| s.as_str()).collect();
for current_key in &current_keys {
if !desired_keys.contains(current_key.as_str())
&& !PROTECTED_SETTINGS.contains(&current_key.as_str())
{
sqlx::query("DELETE FROM global_settings WHERE name = $1")
.bind(current_key)
.execute(db)
.await?;
tracing::info!("Deleted global setting not in CRD: {current_key}");
}
}
Ok(())
let mut tx = db.begin().await?;
// Fetch current settings from DB
let current_rows: Vec<(String,)> = sqlx::query_as("SELECT name FROM global_settings")
.fetch_all(&mut *tx)
.await?;
let current_keys: HashSet<String> = current_rows.into_iter().map(|(name,)| name).collect();
// Upsert desired settings
for (key, value) in desired {
sqlx::query(
"INSERT INTO global_settings (name, value) VALUES ($1, $2) \
ON CONFLICT (name) DO UPDATE SET value = EXCLUDED.value, updated_at = now()",
)
.bind(key)
.bind(value)
.execute(&mut *tx)
.await?;
tracing::info!("Synced global setting: {key}");
}
// Delete settings not in desired (except protected keys)
let desired_keys: HashSet<&str> = desired.keys().map(|s| s.as_str()).collect();
for current_key in &current_keys {
if !desired_keys.contains(current_key.as_str())
&& !PROTECTED_SETTINGS.contains(&current_key.as_str())
{
sqlx::query("DELETE FROM global_settings WHERE name = $1")
.bind(current_key)
.execute(&mut *tx)
.await?;
tracing::info!("Deleted global setting not in CRD: {current_key}");
}
}
tx.commit().await?;
Ok(())

Same concern applies to sync_worker_configs below.

Comment on lines 7 to 11
const PROTECTED_SETTINGS: &[&str] = &[
"ducklake_user_pg_pwd",
"ducklake_settings",
"custom_instance_pg_databases",
];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider expanding the protected list or making it configurable.

A few questions:

  1. Should license_key be protected? Accidentally deleting it via CRD could break the EE instance.
  2. The list is hardcoded -- if new internal settings are added later, this list could become stale silently. Consider adding a comment documenting when/how this list should be updated, or alternatively using a naming convention (e.g., prefix-based) to auto-detect internal settings.

Comment on lines +49 to +52
async fn reconcile(
instance: Arc<WindmillInstance>,
ctx: Arc<Context>,
) -> Result<Action, kube::Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type Result<Action, kube::Error> means DB errors from db_sync are caught but converted to status updates. This is a reasonable pattern, but note that kube::Error is returned for status update failures -- meaning if the DB sync fails AND the status update also fails, the original DB error is swallowed and only the kube error propagates. Consider logging the original error more prominently or using a custom error type that wraps both.

Comment on lines 59 to 72
// Sync global settings
if let Err(e) = db_sync::sync_global_settings(&ctx.db, &instance.spec.global_settings).await {
tracing::error!("Failed to sync global settings for {name}: {e:#}");
update_status(
&ctx.client,
&instance,
false,
format!("Error syncing global settings: {e}"),
generation,
)
.await?;
// Requeue after 30s on error
return Ok(Action::requeue(Duration::from_secs(30)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial sync issue: If sync_global_settings succeeds but sync_worker_configs fails, the global settings are persisted but the status is marked as failed. On the next reconciliation, sync_global_settings will re-run (no-op in practice since it's idempotent), which is fine. But the status message only mentions "worker configs" -- the user might not know global settings were actually applied. Consider noting which phase succeeded in the status message.

Also, since global_settings writes are not transactional (see db_sync comment), a partial global_settings write followed by a worker_configs failure could leave the system in an inconsistent state.

Comment on lines +24 to +28
// Verify the CRD is installed by attempting to list
let api: Api<WindmillInstance> = Api::all(client.clone());
api.list(&Default::default()).await.map_err(|e| {
anyhow::anyhow!("Failed to list WindmillInstance CRDs. Is the CRD installed? Error: {e}")
})?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: CRD verification lists all resources across all namespaces. For large clusters with many WindmillInstance resources (unlikely but possible), this initial list call fetches all of them. This is standard for kube-rs controllers, but worth noting. An alternative would be a lighter check like a dry-run or API discovery call, though this is fine for v1alpha1.

.await?;

// Periodic re-sync every 5 minutes for drift detection
Ok(Action::requeue(Duration::from_secs(300)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5-minute re-sync interval: This is reasonable for drift detection, but worth documenting that this means the operator may take up to 5 minutes to detect out-of-band changes to global_settings or config tables. If someone modifies settings via the UI, the operator will overwrite them within 5 minutes. This is the expected behavior for a declarative operator, but it should be clearly documented for users.

Comment on lines +12 to +23
#[derive(CustomResource, Deserialize, Serialize, Clone, Debug, JsonSchema)]
#[kube(
group = "windmill.dev",
version = "v1alpha1",
kind = "WindmillInstance",
namespaced,
shortname = "wmi",
status = "WindmillInstanceStatus",
printcolumn = r#"{"name":"Synced","type":"string","jsonPath":".status.synced"}"#,
printcolumn = r#"{"name":"Last Synced","type":"date","jsonPath":".status.lastSyncedAt"}"#,
printcolumn = r#"{"name":"Age","type":"date","jsonPath":".metadata.creationTimestamp"}"#
)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Consider adding doc attribute to the CRD derive.

The CRD is namespaced which means each namespace can have its own WindmillInstance. Since this operator connects to a single database, having multiple WindmillInstance resources across namespaces (or even within the same namespace) could cause conflicts -- they'd all try to sync to the same global_settings table and overwrite each other.

Consider either:

  1. Making the CRD cluster-scoped (remove namespaced) since instance config is inherently global, or
  2. Adding validation/documentation that only one WindmillInstance should exist per database, or
  3. Adding a label selector or name filter in the controller to only watch a specific instance

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Feb 6, 2026

Deploying windmill with  Cloudflare Pages  Cloudflare Pages

Latest commit: ce64a4a
Status: ✅  Deploy successful!
Preview URL: https://727c7d37.windmill.pages.dev
Branch Preview URL: https://rf-iac.windmill.pages.dev

View logs

Comment on lines 629 to 645
#[cfg(feature = "operator")]
"operator" => {
let sub_arg = std::env::args().nth(2).unwrap_or_default();
if sub_arg == "crd" {
use kube::CustomResourceExt;
let crd = windmill_operator::crd::WindmillInstance::crd();
println!("{}", serde_yml::to_string(&crd).unwrap());
return Ok(());
}

println!("Starting Windmill Kubernetes operator...");
println!("Connecting to database...");
let db = windmill_common::initial_connection().await?;
println!("Database connected. Starting controller...");
windmill_operator::run(db).await?;
return Ok(());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracing setup is skipped for the operator subcommand. The operator returns early at line 644, before the tracing subscriber is initialized (line 758). This means the operator will only get println! output but none of the structured tracing::info!/tracing::error! calls in reconciler.rs and db_sync.rs will be visible.

Consider initializing tracing before the operator starts, similar to how it's done for the main flow. At minimum, a basic subscriber should be set up.

- 9 unit tests for CRD serialization, deserialization, metadata, and
  status field behavior
- 15 integration tests for db_sync using #[sqlx::test] with full
  declarative sync coverage: upsert, delete, protected keys,
  idempotency, worker config prefix handling, and end-to-end sync

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
.expect("sync should succeed");

for key in &protected_keys {
assert!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert message using "Protected key {key} should not be deleted" won’t interpolate the variable. Consider using a proper format string, e.g. "Protected key {} should not be deleted", key.

#[test]
fn crd_generation_produces_valid_yaml() {
let crd = WindmillInstance::crd();
let yaml = serde_yml::to_string(&crd).expect("CRD should serialize to YAML");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using 'serde_yml' here is unusual; typically the crate is 'serde_yaml'. Verify if this naming is intentional.

"schemars 0.8.22",
"serde",
"serde_json",
"serde_yml",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo alert: The crate name is specified as "serde_yml"; most likely you intended to use "serde_yaml". Please confirm and correct if necessary.

schemars = "0.8"

[dev-dependencies]
serde_yml.workspace = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: It appears that the dependency name 'serde_yml' may be misspelled. Did you mean 'serde_yaml.workspace = true'? Please confirm the correct package name.

…idation

GlobalSettings, SmtpSettings, IndexerSettings, and WorkerGroupConfig now
have explicit typed fields with serde(flatten) catch-all for forward
compatibility. The generated CRD YAML includes a full OpenAPI v3 schema
that Kubernetes validates on kubectl apply.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
name: v1alpha1
schema:
openAPIV3Schema:
description: Auto-generated derived type for WindmillInstanceSpec via `CustomResource`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure the CRD description and properties remain in sync with the Rust docs for WindmillInstanceSpec.

pub extra: BTreeMap<String, serde_json::Value>,
}

impl GlobalSettings {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review to_settings_map: flattened extra fields may override explicit fields if keys conflict.

default: {}
description: Global settings to sync to the `global_settings` table.
properties:
app_workspaced_route:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: The property name 'app_workspaced_route' may be misspelled. Did you mean 'app_workspace_route'?

base_url:
nullable: true
type: string
bunfig_install_scopes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: The property name 'bunfig_install_scopes' looks like it might be a misspelling. Consider reviewing its intended spelling.

#[serde(skip_serializing_if = "Option::is_none")]
pub monitor_logs_on_s3: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub app_workspaced_route: Option<bool>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in field name: "app_workspaced_route". Consider renaming it to "app_workspace_route" for clarity.

#[serde(skip_serializing_if = "Option::is_none")]
pub npm_config_registry: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
pub bunfig_install_scopes: Option<String>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible typo in field name: "bunfig_install_scopes". Verify if this should be "bunfig_install_scopes" or something like "bun_config_install_scopes".

Replace most remaining serde_json::Value fields in WindmillInstance CRD
with properly typed structs derived from the codebase:
- oauths: BTreeMap<String, OAuthClient>
- otel: OtelSettings
- otel_tracing_proxy: OtelTracingProxySettings with ScriptLang enum
- critical_error_channels: Vec<CriticalErrorChannel> (untagged enum)
- critical_alerts_on_db_oversize: DbOversizeAlert
- ducklake_settings: DucklakeSettings with nested catalog/storage types
- custom_instance_pg_databases: CustomInstancePgDatabases
- autoscaling (worker config): AutoscalingConfig with integration struct
- custom_tags, default_tags_workspaces: Vec<String>
- default_tags_per_workspace: bool

Still opaque (serde_json::Value): object_store_cache_config (kube-core
can't generate schemas for internally-tagged enums), secret_backend
(EE-private), slack, teams (no clear struct definitions).

Regenerated CRD YAML with full OpenAPI schema (352→703 lines).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
minimum: 0.0
nullable: true
type: integer
integration:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider enforcing allowed values for the autoscaling integration's type field (e.g. 'script', 'dryrun', 'kubernetes') by using an enum or validation in the schema.

/// The `type` field selects the backend: `"script"`, `"dryrun"`, or `"kubernetes"`.
/// For `"script"`, `path` is required and `tag` is optional.
#[derive(Deserialize, Serialize, Clone, Debug, Default, JsonSchema)]
pub struct AutoscalingIntegration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For AutoscalingIntegration, consider replacing the raw string field for integration_type with an enum to enforce valid backend types and reduce typos.

#[serde(skip_serializing_if = "Option::is_none")]
pub teams: Option<serde_json::Value>,

/// Catch-all for settings not yet covered by typed fields.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of a flattened extra field in GlobalSettings is useful for forward compatibility; however, consider namespacing these keys to avoid collisions with future typed fields.

}

#[test]
fn autoscaling_script_integration_roundtrips() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent round-trip tests for autoscaling and ducklake settings. Consider adding tests for error cases (e.g. missing required fields in AutoscalingIntegration) to further ensure robustness.

Deno,
Go,
Bash,
Powershell,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: In the 'ScriptLang' enum, consider using 'PowerShell' (with an uppercase S) instead of 'Powershell' for consistency with common naming conventions.

Mysql,
Bigquery,
Snowflake,
Graphql,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: In the 'ScriptLang' enum, 'Graphql' may be intended as 'GraphQL' (with 'QL' capitalized). Please verify.

Bigquery,
Snowflake,
Graphql,
Nativets,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: The enum variant 'Nativets' looks unusual. If this is meant to represent native TypeScript (or similar), please double-check the spelling.

// DuckLake
// ---------------------------------------------------------------------------

/// DuckLake catalog database settings.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistency: The comment refers to 'DuckLake' (with an uppercase L) while the struct names use 'Ducklake'. Consider standardizing the capitalization for clarity.

Move all typed settings (GlobalSettings, WorkerGroupConfig, etc.) from
windmill-operator/crd.rs into windmill-common/instance_config.rs so both
the API server and operator share a single source of truth. Add diff/apply
logic (Merge mode for UI, Replace mode for operator) and InstanceConfig::from_db().

Add GET/PUT /settings/instance_config endpoints so the frontend loads all
settings in 1 call instead of 42, and saves with a single bulk PUT. The
backend handles the diff internally, running pre-write hooks for changed keys.

Refactor windmill-operator/db_sync.rs to use the shared diff+apply functions
and slim crd.rs down to the CRD wrapper with re-exports.

Includes 32 unit tests and 30 integration tests covering serialization,
diff logic, DB roundtrips, protected settings, and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
}

let nvalues = JSON.parse(JSON.stringify(initialValues))
if (nvalues['base_url'] == undefined) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate base_url fallback: There are two consecutive "if (nvalues['base_url'] == undefined)" blocks (lines 73 and 79). Consider consolidating into one, or clarifying the intended fallback order.

APP_WORKSPACED_ROUTE_SETTING => {
let serde_json::Value::Bool(workspaced_route) = value else {
return Err(error::Error::BadRequest(format!(
"{} setting Expected to be boolean",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: In the error message, the word 'Expected' appears capitalized. Consider rephrasing to "{} setting expected to be boolean" for consistency.


if !duplicate_app.is_empty() {
tracing::error!(
"Cannot disable {} setting as duplicate app with custom path were found: {:?}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: The error message reads "duplicate app with custom path were found". Consider rephrasing to "duplicate apps with custom paths were found" to fix the subject-verb agreement.

pub critical_alert_mute_ui: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
pub monitor_logs_on_s3: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a possible typo with the field name app_workspaced_route. It may have been intended to be app_workspace_route.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
if (oauths && Object.keys(oauths).length > 0) {
obj['oauths'] = oauths
}
if (requirePreexistingUserForOauth) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider always including the 'require_preexisting_user_for_oauth' key in the YAML output even when its value is false. Omitting it may be confusing when a user expects to see an explicit false flag.

}
$values = parsed
yamlError = ''
return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the catch block of syncYamlToForm, consider not exposing raw exception details directly to the user. This may leak internal error information. A more generic error message or sanitizing the exception details might be preferable.


{#if viewMode === 'yaml' && !hideTabs}
<div class="border rounded w-full" style="min-height: 600px;">
{#await import('$lib/components/SimpleEditor.svelte')}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dynamic import of SimpleEditor lacks a catch block. Consider adding a {:catch} handler to gracefully handle module load failures for better user feedback.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
</script>

<div class="pb-12">
<!-- svelte-ignore a11y_label_has_associated_control -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removal of the {#if !hideTabs} check now always renders the toggle button group. Confirm that this change is intentional, as the hideTabs prop is no longer respected here.

</ToggleButtonGroup>
</div>

{#if viewMode === 'yaml'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conditional for rendering the YAML editor is changed from {#if viewMode === 'yaml' && !hideTabs} to {#if viewMode === 'yaml'}. Verify that this behavior is desired, especially when hideTabs is true.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -15,7 +15,10 @@
import AuthSettings from './AuthSettings.svelte'
import InstanceSetting from './InstanceSetting.svelte'
import { writable, type Writable } from 'svelte/store'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace ToggleButtonGroup/ToggleButton imports with a single Toggle import. Ensure the new Toggle supports all needed features.

@@ -111,6 +108,12 @@
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update condition in saveSettings to use 'yamlMode' boolean instead of string 'viewMode'.

@@ -249,11 +249,98 @@
function openSmtpSettings() {
tab = 'SMTP'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch state variable from viewMode ('form'|'yaml') to a boolean yamlMode; verify all references update accordingly.

sendUserToast('Invalid YAML: ' + e, true)
return false
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored handleYamlToggle to use a boolean 'checked' parameter; ensure sync logic (form ↔ YAML) works as expected.

</script>

<div class="pb-12">
<!-- svelte-ignore a11y_label_has_associated_control -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated UI to use the Toggle component with an on:change event; confirm that the event emits a boolean (via e.detail) as expected.

/>
</div>

{#if yamlMode}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditional rendering now checks yamlMode instead of viewMode; verify that the YAML editor displays only in YAML mode.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
let yamlCode = $state('')
let yamlEditor: SimpleEditor | undefined = $state(undefined)
let yamlError = $state('')
let showSensitive = $state(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a short comment explaining that 'showSensitive' toggles the display of actual sensitive values in the YAML editor.

let yamlError = $state('')
let showSensitive = $state(false)

const SENSITIVE_UNCHANGED = '__SENSITIVE_AND_UNCHANGED__'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document the purpose of the SENSITIVE_UNCHANGED constant and the sensitiveKeys set to clarify why these values are masked.

.map((s) => s.key)
)

function maskSensitive(obj: Record<string, any>): Record<string, any> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In maskSensitive, note that only one level (with special handling for 'oauths') is masked. If nested sensitive fields are introduced later, consider making the masking recursive.

}

function buildYamlObject(): Record<string, any> {
const obj: Record<string, any> = { ...$values }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buildYamlObject function now returns either the raw object or a masked version based on 'showSensitive'. Ensure that all sensitive fields are included in sensitiveKeys.

return false
}

// Restore sensitive values that were not changed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restoring the 'oauths' provider secrets when the secret field equals the placeholder is good—just verify that this correctly handles cases where a user might intentionally edit these fields.

}

// Restore unchanged sensitive settings
for (const key of sensitiveKeys) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop restoring unchanged sensitive settings (using sensitiveKeys) always resets the field to the current $values. Confirm that this merge behavior is intended even if a user might try to clear a field.

}
}

function handleShowSensitiveToggle(checked: boolean) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handleShowSensitiveToggle function syncs YAML to form before and after toggling. Consider handling potential YAML parsing errors here to avoid unexpected state if the YAML is invalid.

</div>

{#if yamlMode}
<div class="flex items-center justify-end mb-2">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new UI toggle for 'Show sensitive values' in YAML mode looks well integrated. Ensure the label and accessibility attributes are sufficient for all users.

rubenfiszel and others added 2 commits February 7, 2026 12:11
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…exports

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
];

/// Internal settings that are never exposed via the API or included in config exports.
pub const HIDDEN_SETTINGS: &[&str] = &["uid", "rsa_keys", "jwt_secret", "min_keep_alive_version"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 'jwt_secret' and 'min_keep_alive_version' to HIDDEN_SETTINGS to ensure these sensitive keys are not exposed via API/config exports. Confirm related functions (e.g. from_db) correctly filter them.


const SENSITIVE_UNCHANGED = '__SENSITIVE_AND_UNCHANGED__'

const sensitiveKeys: Set<string> = new Set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed explicit inclusion of 'jwt_secret' from sensitiveKeys so that the UI masking stays in sync with backend hidden settings. The sensitive keys are now solely derived from settings definitions.

rubenfiszel and others added 4 commits February 14, 2026 15:30
Allow sensitive fields (license_key, hub_api_secret, scim_token,
smtp_password, OAuthClient.secret, custom PG user_pwd) to reference
Kubernetes Secrets via the standard secretKeyRef pattern instead of
inlining values as plaintext YAML. The reconciler resolves all refs
by reading K8s Secrets before syncing to the database.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rubenfiszel rubenfiszel changed the title Add Kubernetes operator for declarative instance config feat: add Kubernetes operator and instance settings YAML editor Feb 14, 2026
rubenfiszel and others added 4 commits February 14, 2026 21:59
# Conflicts:
#	CLAUDE.md
#	backend/Cargo.toml
#	backend/src/main.rs
#	backend/src/monitor.rs
#	backend/windmill-api-users/src/users.rs
#	backend/windmill-api-workspaces/src/workspaces.rs
#	backend/windmill-common/Cargo.toml
#	frontend/src/lib/components/InstanceSetting.svelte
#	frontend/src/lib/components/InstanceSettings.svelte
#	frontend/src/lib/components/SuperadminSettings.svelte
#	frontend/src/lib/components/SuperadminSettingsInner.svelte
Resolve merge conflicts from origin/main, fix duplicate
UV_INDEX_STRATEGY_SETTING import, and add Playwright MCP
testing instructions to CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rubenfiszel
Copy link
Contributor Author

/ai-fast re-review entirely this PR

@claude
Copy link
Contributor

claude bot commented Feb 14, 2026

Claude finished @rubenfiszel's task in 2m 29s —— View job


Re-review of PR #7836: Kubernetes operator and instance settings YAML editor

I've read all 37 changed files in this PR. Here's a comprehensive review.


Architecture Assessment

The PR is well-structured with a clean separation of concerns:

  1. windmill-common/src/instance_config.rs (~1900 lines) — Single source of truth for all typed instance config structs, diff/apply logic, and DB I/O. Both the API server and operator share this module.
  2. windmill-operator/ — Clean crate with 4 modules: crd.rs (CRD derive), db_sync.rs (thin wrapper calling shared diff logic), reconciler.rs (kube-rs controller), resolve.rs (K8s secret resolution).
  3. windmill-api-settings/src/lib.rs — Bulk GET/PUT /settings/instance_config endpoints using the shared diff/apply logic.
  4. Frontend — YAML editor with sensitive value masking, diff view, discard, and unsaved changes guard.

The design decision to share diff/apply logic between the operator (ApplyMode::Replace) and the UI (ApplyMode::Merge) via a common module is sound and avoids logic duplication.


Issues

1. Tracing not initialized for operator subcommandbackend/src/main.rs:633-648

The operator branch returns early at line 648, before the tracing subscriber is set up later in main(). All tracing::info!/tracing::error! calls in reconciler.rs and db_sync.rs will be silently dropped. The operator will only show the 3 println! messages.

The sync-config subcommand at line 613 has the same issue — it also returns before tracing init.

Consider initializing a basic tracing subscriber at the top of main() or before the early-return CLI paths. Fix this →

2. No transaction wrapping for sync operationsinstance_config.rs:773-823

apply_settings_diff and apply_configs_diff execute individual queries on the pool without a transaction. If the operator or API server crashes mid-sync, the database can be left in a partially-applied state (e.g., some settings upserted but stale ones not yet deleted). Consider wrapping each in db.begin()/tx.commit().

This is particularly relevant for the operator's ApplyMode::Replace path where deletions follow upserts — a crash between these phases could leave orphaned settings.

3. Duplicated side-effect logic in set_global_setting_internal and run_setting_pre_write_hookwindmill-api-settings/src/lib.rs:261-382 and 386-477

The run_setting_pre_write_hook function (lines 386-477) is a near-exact copy of the logic inside set_global_setting_internal (lines 276-360). The three match arms for AUTOMATE_USERNAME_CREATION_SETTING, CRITICAL_ALERT_MUTE_UI_SETTING, and APP_WORKSPACED_ROUTE_SETTING are duplicated verbatim, including the inner struct definitions.

Consider refactoring set_global_setting_internal to call run_setting_pre_write_hook instead of inlining the same logic. Fix this →

4. extra field collision risk in GlobalSettings and WorkerGroupConfiginstance_config.rs:288 and 678

Both structs use #[serde(flatten)] pub extra: BTreeMap<String, serde_json::Value> as a catch-all. If a user puts a key in extra that matches a future typed field name, deserialization could produce unexpected results (the typed field wins, the extra entry is silently ignored on the next roundtrip). This is a known serde(flatten) pitfall. It's fine for v1alpha1, but worth documenting.

5. to_settings_map serializes StringOrSecretRef as objects for unresolved refsinstance_config.rs:293-302

When the operator resolves secret refs before calling to_settings_map(), this is fine — all refs become literals. But if to_settings_map() is called on unresolved settings (e.g., from the from_db path), any StringOrSecretRef::SecretRef values would serialize as {"secretKeyRef": {...}} objects into the DB, which the rest of Windmill wouldn't understand.

Currently this can't happen because from_db reads plain strings from the DB (which deserialize as Literal), and the operator always resolves before syncing. But there's no compile-time guarantee — adding a debug assertion or type-level guard would prevent accidental misuse.


Frontend Review

6. YAML editor yamlCode binding and editor stateInstanceSettings.svelte:643-649

The syncYamlToForm() function manually reads from the editor via yamlEditor?.getCode() to bypass the 200ms debounce in SimpleEditor. This is correct but fragile — if SimpleEditor's internal API changes, this could silently use stale code. A comment documenting this coupling would help.

7. handleShowSensitiveToggle double-syncInstanceSettings.svelte:730-735

This calls syncYamlToForm() then syncFormToYaml(). If the YAML is invalid (e.g., user is mid-edit), the first call shows an error toast and returns false, but the function continues to set showSensitive and re-sync anyway. Consider returning early if the first sync fails:

function handleShowSensitiveToggle(checked: boolean) {
    if (!syncYamlToForm()) return
    showSensitive = checked
    syncFormToYaml()
}

8. Unsaved changes detection in YAML modeInstanceSettings.svelte:759-767

In YAML mode, dirty state is tracked by comparing yamlCode !== yamlCodeInitial (string comparison of YAML). In form mode, it uses structured per-category comparison. This means trivially different YAML (e.g., trailing newline, key reordering) could trigger a false "unsaved changes" warning in YAML mode. This is probably acceptable UX, but worth noting.

9. Drawer close guard works wellSuperadminSettings.svelte:51-66

The bypassCloseCheck pattern for intercepting both intentional closes and click-away/escape closes is clean and handles the edge cases correctly.

10. YAML-excluded keysInstanceSettings.svelte:553-557

custom_instance_pg_databases, ducklake_settings, ducklake_user_pg_pwd are excluded from YAML export. These are also in PROTECTED_SETTINGS on the backend. Good consistency. The excludedKeys are preserved during syncYamlToForm() (line 694-698) so they survive YAML round-trips.


Operator Review

11. Reconciler partial syncreconciler.rs:98-125

If sync_global_settings succeeds but sync_worker_configs fails, global settings are persisted but the status is marked as failed. On next reconciliation, sync_global_settings will re-run (idempotent since values match). The status message correctly identifies which phase failed ("Error syncing worker configs"). This is reasonable behavior.

12. CRD is namespaced but syncs to a global DBcrd.rs:27

Multiple WindmillInstance resources across namespaces would race to overwrite each other. The Api::all() in the reconciler (line 27 of reconciler.rs) watches all namespaces, so all instances trigger reconciliation. For v1alpha1 this is fine if documented.

13. Secret caching in resolve.rsresolve.rs:29

The cache is per-reconciliation (scoped to the resolve_secret_refs call). This means if the same secret is referenced by multiple fields, it's only fetched once per reconciliation cycle. Good.

14. 5-minute drift detection intervalreconciler.rs:138

Action::requeue(Duration::from_secs(300)) means out-of-band changes (e.g., via UI) will be overwritten within 5 minutes. This is the expected declarative operator behavior and is documented in the PR description.


Test Coverage

The test coverage is thorough:

  • instance_config.rs unit tests (~950 lines): Cover serialization round-trips, diff logic edge cases, StringOrSecretRef variants (literal, secret ref, env ref), env ref resolution, protected settings, empty/identical diffs
  • tests/instance_config.rs (integration): DB-level tests for from_db, apply_settings_diff, apply_configs_diff, full round-trips
  • tests/operator_db_sync.rs (integration): Tests for sync_global_settings and sync_worker_configs including protected key preservation
  • crd.rs unit tests: CRD YAML generation, schema validation, secret ref deserialization

Minor Notes

  • serde_yml (not serde_yaml) is used — this is intentional. serde_yml is a maintained fork, used as a workspace dependency in Cargo.lock. Previous review comments flagging this as a typo are incorrect.
  • The LIKE 'worker__%' SQL pattern uses SQL wildcards intentionally (each _ matches one character), consistent with existing usage in windmill-api/src/configs.rs.
  • Feature gating: operator feature properly gated in Cargo.toml, included in ee_core, and #[cfg(feature = "operator")] used in main.rs.
  • The sync-config CLI subcommand (line 613-631 of main.rs) provides a non-K8s path for IaC — useful for Docker Compose deployments with the included example config.

Summary

This is a solid implementation. The shared instance_config module is well-designed and the diff/apply abstraction (ApplyMode::Merge vs ApplyMode::Replace) cleanly separates UI and operator concerns. The main items to address are:

Priority Issue Location
High Tracing not initialized for operator/sync-config CLI paths main.rs:613-648
Medium No transaction wrapping in apply_settings_diff/apply_configs_diff instance_config.rs:773-823
Medium Duplicated side-effect logic windmill-api-settings/src/lib.rs:261-477
Low handleShowSensitiveToggle should return early on parse failure InstanceSettings.svelte:730

rubenfiszel and others added 3 commits February 14, 2026 22:27
…fects

Initialize tracing subscriber before early-return CLI paths (sync-config,
operator) so tracing calls are not silently dropped. Refactor
set_global_setting_internal to call run_setting_pre_write_hook instead of
duplicating the side-effect logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dumps the current instance config (global settings + worker configs) as
YAML. Supports --output-file to write to a file instead of stdout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rubenfiszel
Copy link
Contributor Author

/ai-fast re-review entirely this PR

@claude
Copy link
Contributor

claude bot commented Feb 14, 2026

Claude finished @rubenfiszel's task in 3m 43s —— View job


Comprehensive Re-review of PR #7836

  • Read backend operator crate (crd.rs, db_sync.rs, reconciler.rs, resolve.rs)
  • Read shared instance config module (instance_config.rs)
  • Read API settings changes (windmill-api-settings/src/lib.rs)
  • Read main.rs CLI integration
  • Read frontend changes (InstanceSettings.svelte, SuperadminSettings.svelte, etc.)
  • Read tests (instance_config.rs, operator_db_sync.rs)
  • Read CLI changes (instance.ts)
  • Check if previous review issues were addressed
  • Write comprehensive review

Previous Review Issues — Status

Issue Status Notes
Tracing not initialized for operator/sync-config Fixed Both paths now call tracing_subscriber::fmt::init() before their logic (main.rs:614, main.rs:642)
Duplicated side-effect logic Fixed set_global_setting_internal now calls run_setting_pre_write_hook — single source of truth
No transaction wrapping for sync operations Not addressed apply_settings_diff and apply_configs_diff still execute individual queries without a transaction
Multiple CRD instances can conflict Not addressed CRD is still namespaced; multiple instances would race. Acceptable for v1alpha1 if documented
handleShowSensitiveToggle should early-return on parse failure Not addressed Still calls syncFormToYaml() even if syncYamlToForm() returns false

Architecture

The overall architecture is well-designed:

  1. windmill-common/src/instance_config.rs (~1900 lines) — Single source of truth for typed config structs, diff/apply logic, and DB I/O. Shared by API server and operator.
  2. windmill-operator/ — Clean crate with 4 modules: crd.rs (CRD derive), db_sync.rs (thin wrapper calling shared diff logic), reconciler.rs (kube-rs controller), resolve.rs (K8s secret resolution).
  3. windmill-api-settings/src/lib.rs — Bulk GET/PUT /settings/instance_config endpoints using the shared diff/apply logic.
  4. Frontend — YAML editor with sensitive value masking, diff view, discard, and unsaved changes guard.

The ApplyMode::Merge (UI) vs ApplyMode::Replace (operator) abstraction cleanly separates UI and operator concerns without duplicating logic.


Issues Found

1. ResolveError::InvalidUtf8 is defined but never raisedresolve.rs:12-13 and resolve.rs:111-113

The ResolveError::InvalidUtf8 variant exists (line 13) but is never used. In fetch_secret_value, non-UTF-8 values are silently filtered out with .filter_map(|(k, v)| String::from_utf8(v.0).ok()...) (line 113). If a Kubernetes Secret key exists but its value is invalid UTF-8, the user gets the misleading KeyNotFound error rather than the intended InvalidUtf8 error.

Consider checking the specific requested key before returning KeyNotFound, or collecting with proper error handling:

let decoded: BTreeMap<String, String> = data
    .into_iter()
    .map(|(k, v)| {
        String::from_utf8(v.0)
            .map(|s| (k.clone(), s))
            .map_err(|_| ResolveError::InvalidUtf8 {
                secret: secret_ref.name.clone(),
                key: k,
            })
    })
    .collect::<Result<_, _>>()?;

2. No transaction wrapping in apply_settings_diff / apply_configs_diffinstance_config.rs:773-823

These functions execute individual queries on the pool without a transaction. If the operator or API server crashes mid-sync, the database can be left in a partially-applied state. This is particularly relevant for the operator's ApplyMode::Replace path where deletions follow upserts — a crash between these phases could leave orphaned settings.

3. handleShowSensitiveToggle doesn't early-return on parse failureInstanceSettings.svelte:730-735

function handleShowSensitiveToggle(checked: boolean) {
    syncYamlToForm()   // returns false on parse error
    showSensitive = checked
    syncFormToYaml()
}

If the YAML is invalid (user mid-edit), syncYamlToForm() returns false and shows an error toast, but the function continues to toggle showSensitive and re-sync. This could overwrite the user's partially-edited YAML with a re-serialization of stale form state. Should return early:

function handleShowSensitiveToggle(checked: boolean) {
    if (!syncYamlToForm()) return
    showSensitive = checked
    syncFormToYaml()
}

4. automate_username_creation normalizationInstanceSettings.svelte:331-333

normalizeValue treats automate_username_creation: true as equivalent to "not set" (returns undefined). This means the setting won't appear in YAML export when true. Works because the default changed to true, but creates a fragile coupling between frontend normalization and backend default.


Positive Observations

  • Clean db_sync refactor: The operator's db_sync.rs is a thin ~50-line wrapper delegating entirely to the shared diff/apply functions. No duplication between operator and API paths.
  • Comprehensive test suite: ~950 lines of unit tests + integration tests in tests/instance_config.rs and tests/operator_db_sync.rs covering round-trips, diff edge cases, and protected settings.
  • Well-structured frontend: Sensitive masking with SENSITIVE_UNCHANGED placeholder, nested sensitive fields, excluded keys preserved during YAML round-trips, per-category dirty tracking in form mode.
  • Drawer UX: bypassCloseCheck pattern in SuperadminSettings.svelte correctly handles all close scenarios with unsaved changes guard and confirmation modal.
  • Feature gating: operator properly gated behind ee_core with #[cfg(feature = "operator")] in main.rs.
  • CLI additions: Both sync-config (non-K8s IaC) and wmill instance get-config are clean and well-implemented.
  • Secret caching in resolve.rs: Per-reconciliation cache — same Secret fetched once even if referenced by multiple fields.

Summary

Priority Issue Location
Medium InvalidUtf8 error variant unused; misleading KeyNotFound on bad UTF-8 resolve.rs:111-113
Medium No transaction wrapping in apply_settings_diff/apply_configs_diff instance_config.rs:773-823
Low handleShowSensitiveToggle should return early on parse failure InstanceSettings.svelte:730
Low automate_username_creation: true normalization coupling InstanceSettings.svelte:331

Note: PR was merged during the review. These items can be addressed in follow-up PRs.


| Branch: rf/iac

@rubenfiszel rubenfiszel merged commit 82e5f6d into main Feb 14, 2026
10 of 11 checks passed
@rubenfiszel rubenfiszel deleted the rf/iac branch February 14, 2026 22:42
@github-actions github-actions bot locked and limited conversation to collaborators Feb 14, 2026
@claude claude bot unlocked this conversation Feb 14, 2026
@rubenfiszel rubenfiszel restored the rf/iac branch February 15, 2026 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant