Persist service registry by ismellike · Pull Request #1116 · Lay3rLabs/WAVS

ismellike · 2026-02-12T17:01:30Z

Introduces a ServiceRegistry that persists registered ServiceManager entries and their HD key derivation indices to a JSON file (service_registry.json) in the data directory. On startup, services are restored from the registry by re-fetching their definitions from the chain, re-storing components, and re-registering with all subsystems using their original HD indices.

ueco-jb

Few comments. Good work 👍

ueco-jb · 2026-02-13T12:53:19Z

packages/wavs/src/dispatcher.rs

+        // Allocate HD index from the registry (single source of truth for key derivation indices)
+        let hd_index = self.service_registry.append(service_manager)?;
+
+        self.add_service_direct(service.clone(), Some(hd_index))
+            .await?;


If add_service_direct fails for any reason, the registry entry has already been written. On the next restart, the node would try to restore a service that does not exist.

Great catch! Added a rollback here
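
For readers following the thread, here is a minimal sketch of the append-then-rollback pattern being discussed. It is not the code from the PR: `Registry` is a hypothetical in-memory stand-in for the persisted ServiceRegistry, string keys stand in for ServiceManager, and the `activate` closure stands in for `add_service_direct`. The sketch leaves the index counter advanced after a rollback; whether the PR reclaims the index is not shown in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the persisted registry.
struct Registry {
    entries: HashMap<String, u32>,
    next_hd_index: u32,
}

impl Registry {
    fn new() -> Self {
        // Starts at 1, matching the `next_hd_index() == 1` assertion quoted later in the review.
        Registry { entries: HashMap::new(), next_hd_index: 1 }
    }

    /// Records the service manager and hands out the next HD index.
    fn append(&mut self, service_manager: &str) -> Result<u32, String> {
        if self.entries.contains_key(service_manager) {
            return Err(format!("{service_manager} is already registered"));
        }
        let hd_index = self.next_hd_index;
        self.entries.insert(service_manager.to_string(), hd_index);
        self.next_hd_index += 1;
        Ok(hd_index)
    }

    fn remove(&mut self, service_manager: &str) {
        self.entries.remove(service_manager);
    }
}

/// Registers a service; `activate` is a stand-in for `add_service_direct`.
fn add_service(
    registry: &mut Registry,
    service_manager: &str,
    activate: impl FnOnce(u32) -> Result<(), String>,
) -> Result<u32, String> {
    let hd_index = registry.append(service_manager)?;
    if let Err(err) = activate(hd_index) {
        // Roll back so a restart does not try to restore a service that never came up.
        registry.remove(service_manager);
        return Err(err);
    }
    Ok(hd_index)
}
```

With this shape, a failed activation leaves `entries` empty, so nothing stale is restored on the next start.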

ueco-jb · 2026-02-13T13:17:52Z

packages/wavs/src/dispatcher.rs

+                        // Store components
+                        if let Err(err) = self
+                            .engine_manager
+                            .store_components_for_service(&service)
+                            .await
+                        {
+                            tracing::warn!(
+                                "Failed to store components for restored service {}: {:?}",
+                                service.name,
+                                err
+                            );
+                        }


Should we just warn in this case?
I'm thinking from the node operator's point of view:

The operator enables persistent storage in order to recover quickly from a restart.

For whatever reason there is a problem with the DB: some services are stored, some aren't. Their functionality is not affected, and warning traces are easily overlooked.

The operator restarts the node and, surprise, some services are restored and some aren't.

My point is: I think this feature should be opt-in, but if it is enabled, adding the new service should fail entirely when an error occurs.
What do you think?

Yeah, looking back at this, I like the tracing error better too. If any part fails, we just log and skip.

I'm not sure I see the use-case for optional persistent storage atm (maybe a future PR?). If anything, we should probably add some retry mechanisms to improve reliability if we notice this is the case.

Copilot

Pull request overview

Introduces a persisted ServiceRegistry to retain registered ServiceManager entries and their signer HD derivation indices across node restarts, then restores services on startup by re-fetching definitions from chain and re-registering subsystems with the original HD indices.

Changes:

Add JSON-backed ServiceRegistry (service_registry.json) with append/remove + unit tests.
Update Dispatcher startup to restore services from the registry and allocate HD indices from it on registration/removal.
Adjust submission HD-index counter handling for explicit indices; update call sites for new add_service_direct(..., hd_index) signature.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.


File	Description
packages/wavs/src/service_registry.rs	New persisted registry implementation + tests.
packages/wavs/src/dispatcher.rs	Load registry, restore services at startup, allocate/remove HD indices via registry, update `add_service_direct` signature.
packages/wavs/src/subsystems/submission.rs	Ensure signer HD-index counter advances past explicitly assigned indices.
packages/wavs/src/http/handlers/service/add.rs	Update dev direct-add handler for new `add_service_direct` signature.
packages/wavs/src/lib.rs	Export `service_registry` module.
packages/wavs/tests/wavs_systems/mock_app.rs	Update test helper to pass new `add_service_direct` arg.
packages/wavs/tests/mock_e2e.rs	Update e2e test to pass new `add_service_direct` arg.
packages/wavs/tests/dispatcher_tests.rs	Update dispatcher test to pass new `add_service_direct` arg.
packages/wavs/benches/dev_triggers/setup.rs	Update bench setup to pass new `add_service_direct` arg.


Copilot · 2026-02-15T06:55:20Z

packages/wavs/src/subsystems/submission.rs

+        // This is a no-op for auto-incremented indices but critical for
+        // explicit indices during restoration from the service registry.
+        self.signing_mnemonic_hd_index_count
+            .fetch_max(hd_index + 1, std::sync::atomic::Ordering::SeqCst);


hd_index + 1 overflows when hd_index is u32::MAX: it panics in debug builds and wraps in release builds. Even if unlikely, it’s easy to make this robust by using saturating_add(1)/checked_add(1) and handling the max-index case explicitly.

Suggested change

-            .fetch_max(hd_index + 1, std::sync::atomic::Ordering::SeqCst);
+            .fetch_max(hd_index.saturating_add(1), std::sync::atomic::Ordering::SeqCst);
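
To illustrate how the saturating variant behaves at the boundary, here is a small standalone sketch. The helper name `bump_past` is made up for this example and does not appear in the PR; it only uses `AtomicU32::fetch_max` and `u32::saturating_add` from the standard library.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Raises `counter` to at least `hd_index + 1`, saturating at `u32::MAX`
/// instead of panicking or wrapping. Returns the value stored afterwards.
fn bump_past(counter: &AtomicU32, hd_index: u32) -> u32 {
    let candidate = hd_index.saturating_add(1);
    // fetch_max returns the previous value; the stored value is the max of the two.
    let previous = counter.fetch_max(candidate, Ordering::SeqCst);
    previous.max(candidate)
}
```

Note that at `u32::MAX` the counter stops advancing rather than reporting an error, so a caller that needs to detect exhaustion would still have to check for that value explicitly.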

Copilot · 2026-02-15T06:55:20Z

packages/wavs/src/service_registry.rs

+        let hd_index = *next;
+        entries.push(RegistryEntry {
+            service_manager: sm,
+            hd_index,
+        });
+        *next = hd_index + 1;
+
+        self.write_locked(&entries, *next)?;


*next = hd_index + 1 can overflow once next_hd_index reaches u32::MAX, which would wrap/panic depending on build settings. Consider using checked_add/saturating_add and returning a dedicated error when the HD index space is exhausted.
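
A possible shape for the checked variant, as a sketch only: the error type `AllocError` and its `HdIndexExhausted` variant are hypothetical names, and the PR's RegistryError may model this differently.

```rust
#[derive(Debug, PartialEq)]
enum AllocError {
    /// Hypothetical variant for "no HD indices left".
    HdIndexExhausted,
}

/// Returns the index to hand out and advances `next`,
/// failing instead of wrapping when the counter cannot advance.
fn allocate(next: &mut u32) -> Result<u32, AllocError> {
    let hd_index = *next;
    // On overflow the `?` returns early, so `next` is left unchanged.
    *next = hd_index.checked_add(1).ok_or(AllocError::HdIndexExhausted)?;
    Ok(hd_index)
}
```

In this sketch the index `u32::MAX` itself is never handed out, because the counter must be able to advance past every allocated index.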

Copilot · 2026-02-15T06:55:20Z

packages/wavs/src/dispatcher.rs

+        // Allocate HD index from the registry (single source of truth for key derivation indices)
+        let hd_index = self.service_registry.append(service_manager.clone())?;
+


add_service calls service_registry.append(...), which performs synchronous filesystem I/O. Since this runs on the async request path, it can block Tokio worker threads. Consider moving registry writes to spawn_blocking, using async fs, or batching writes so service registration doesn’t block the runtime.

Copilot · 2026-02-15T06:55:20Z

packages/wavs/src/dispatcher.rs

+        self.remove_service_inner(id.clone())?;
+
+        // Remove from persistent registry
+        if let Some(sm) = service_manager {
+            self.service_registry.remove(&sm)?;
+        }
+


remove_service removes the service from in-memory/DB state before mutating the persistent registry. If service_registry.remove fails (I/O), this returns an error after the service is already removed, leaving the registry potentially stale and the call non-atomic. Consider removing from the registry first (or making registry removal best-effort with a warning), or add a compensating rollback.

Suggested change

-        self.remove_service_inner(id.clone())?;
-
-        // Remove from persistent registry
-        if let Some(sm) = service_manager {
-            self.service_registry.remove(&sm)?;
-        }
+        // First remove from persistent registry to avoid leaving it stale if this fails
+        if let Some(sm) = service_manager {
+            self.service_registry.remove(&sm)?;
+        }
+
+        // Then remove from in-memory/DB state
+        self.remove_service_inner(id.clone())?;

Copilot · 2026-02-15T06:55:20Z

packages/wavs/src/http/handlers/service/add.rs

 async fn add_service_direct_inner(state: HttpState, service_hash: String) -> HttpResult<()> {
    let service = get_service_inner_hash(&state, service_hash).await?;
-    state.dispatcher.add_service_direct(service).await?;
+    state.dispatcher.add_service_direct(service, None).await?;



handle_add_service_direct registers a service via add_service_direct(service, None) but does not append to the persisted ServiceRegistry, so services added through this endpoint won’t be restored after restart and won’t get stable HD indices. If this endpoint is meant to survive restarts, allocate an HD index via the registry and persist it (or clearly document that /dev/services/* is non-persistent).

Copilot · 2026-02-15T06:55:21Z

packages/wavs/src/service_registry.rs

+        let mut entries = self.entries.write().unwrap();
+        let mut next = self.next_hd_index.write().unwrap();
+
+        // Check for duplicates
+        if entries.iter().any(|e| e.service_manager == sm) {
+            return Err(RegistryError::AlreadyRegistered);
+        }
+
+        let hd_index = *next;
+        entries.push(RegistryEntry {
+            service_manager: sm,
+            hd_index,
+        });
+        *next = hd_index + 1;
+
+        self.write_locked(&entries, *next)?;
+        Ok(hd_index)
+    }
+
+    pub fn remove(&self, sm: &ServiceManager) -> Result<(), RegistryError> {
+        let mut entries = self.entries.write().unwrap();
+        let next = *self.next_hd_index.read().unwrap();
+
+        if let Some(pos) = entries.iter().position(|e| &e.service_manager == sm) {
+            entries.remove(pos);
+            self.write_locked(&entries, next)?;


append/remove hold the RwLock write guard while performing write_locked (filesystem I/O). This blocks other readers/writers for the duration of the disk write and can become problematic under contention. Consider cloning the updated state, dropping the lock before writing, and only committing the in-memory update after the file write succeeds (or otherwise handling write failures).

Suggested change

-        let mut entries = self.entries.write().unwrap();
-        let mut next = self.next_hd_index.write().unwrap();
-
-        // Check for duplicates
-        if entries.iter().any(|e| e.service_manager == sm) {
-            return Err(RegistryError::AlreadyRegistered);
-        }
-
-        let hd_index = *next;
-        entries.push(RegistryEntry {
-            service_manager: sm,
-            hd_index,
-        });
-        *next = hd_index + 1;
-
-        self.write_locked(&entries, *next)?;
-        Ok(hd_index)
-    }
-
-    pub fn remove(&self, sm: &ServiceManager) -> Result<(), RegistryError> {
-        let mut entries = self.entries.write().unwrap();
-        let next = *self.next_hd_index.read().unwrap();
-
-        if let Some(pos) = entries.iter().position(|e| &e.service_manager == sm) {
-            entries.remove(pos);
-            self.write_locked(&entries, next)?;
+        // First, take a snapshot of the new state without holding write locks during I/O.
+        let (new_entries, new_next, hd_index) = {
+            let entries = self.entries.read().unwrap();
+            // Check for duplicates using a read lock.
+            if entries.iter().any(|e| e.service_manager == sm) {
+                return Err(RegistryError::AlreadyRegistered);
+            }
+            let next = *self.next_hd_index.read().unwrap();
+            let mut new_entries = entries.clone();
+            let hd_index = next;
+            new_entries.push(RegistryEntry {
+                service_manager: sm,
+                hd_index,
+            });
+            let new_next = hd_index + 1;
+            (new_entries, new_next, hd_index)
+        };
+        // Persist the new state without holding any locks.
+        self.write_locked(&new_entries, new_next)?;
+        // After successful persistence, commit the new state in memory.
+        {
+            let mut entries = self.entries.write().unwrap();
+            let mut next = self.next_hd_index.write().unwrap();
+            *entries = new_entries;
+            *next = new_next;
+        }
+        Ok(hd_index)
+    }
+
+    pub fn remove(&self, sm: &ServiceManager) -> Result<(), RegistryError> {
+        // Take a snapshot of the updated entries without holding write locks during I/O.
+        let (new_entries, next_opt) = {
+            let entries = self.entries.read().unwrap();
+            let next = *self.next_hd_index.read().unwrap();
+            if let Some(pos) = entries.iter().position(|e| &e.service_manager == sm) {
+                let mut new_entries = entries.clone();
+                new_entries.remove(pos);
+                (Some(new_entries), Some(next))
+            } else {
+                (None, None)
+            }
+        };
+        // If there was nothing to remove, we're done.
+        if let (Some(new_entries), Some(next)) = (new_entries, next_opt) {
+            // Persist the new state without holding any locks.
+            self.write_locked(&new_entries, next)?;
+            // After successful persistence, commit the new entries in memory.
+            let mut entries = self.entries.write().unwrap();
+            *entries = new_entries;

Copilot · 2026-02-15T06:55:21Z

packages/wavs/src/service_registry.rs

+        tmp.flush()?;
+        tmp.persist(&self.path)?;
+


tmp.flush() doesn’t guarantee durability on crash/power loss; the registry file may be lost even though the write returned OK. If the goal is crash-safe persistence, consider calling sync_all() on the temp file before persist, and (on Unix) syncing the parent directory after the rename.

Suggested change

-        tmp.flush()?;
-        tmp.persist(&self.path)?;
+        // Ensure data is durably written to the temp file before renaming.
+        tmp.as_file_mut().sync_all()?;
+        tmp.persist(&self.path)?;
+        // On Unix, also sync the parent directory so the rename is durable.
+        #[cfg(unix)]
+        if let Some(parent) = self.path.parent() {
+            let dir_file = std::fs::File::open(parent)?;
+            dir_file.sync_all()?;
+        }
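
The same write, sync, rename, sync-directory sequence can be sketched with the standard library alone. This is an illustration under stated assumptions, not the PR's implementation: the function name `write_durably` and the ".tmp" suffix are made up here, and the PR itself uses a temp-file handle with `persist` rather than `std::fs::rename`.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `bytes` to `path` via a temp file in the same directory, then renames it into place.
fn write_durably(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // The ".tmp" extension is an arbitrary choice for this sketch.
    let tmp_path = path.with_extension("tmp");
    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(bytes)?;
    // flush() only drains userspace buffers; sync_all() asks the OS to push data and metadata to disk.
    tmp.sync_all()?;
    drop(tmp);
    // Renaming within one directory replaces the target in a single step.
    fs::rename(&tmp_path, path)?;
    // On Unix, sync the parent directory so the rename itself survives a crash.
    #[cfg(unix)]
    {
        if let Some(parent) = path.parent() {
            // A bare relative file name has an empty parent, which cannot be opened.
            if !parent.as_os_str().is_empty() {
                File::open(parent)?.sync_all()?;
            }
        }
    }
    Ok(())
}
```

Keeping the temp file in the same directory as the target matters: a rename across filesystems is not a single-step replacement.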

Copilot · 2026-02-15T06:55:21Z

packages/wavs/src/service_registry.rs

+    #[test]
+    fn empty_file_gives_empty_registry() {
+        let dir = TempDir::new().unwrap();
+        let reg = ServiceRegistry::load(dir.path()).unwrap();
+        assert!(reg.entries().is_empty());
+        assert_eq!(reg.next_hd_index(), 1);
+    }


The test name empty_file_gives_empty_registry is misleading: it tests the case where the registry file does not exist, not an empty file. Renaming it (or creating an actual empty file and asserting on the error/behavior) would make the intent clearer.
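
To make the distinction concrete, here is a sketch of how a loader can tell the two cases apart at the filesystem level. `load_raw` is a hypothetical helper, not a function from the PR; how `ServiceRegistry::load` actually treats an existing but empty file is not shown in this review.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical loader: a missing file means "no registry yet" (None);
/// an existing file, even an empty one, is returned for the caller to parse.
fn load_raw(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(Some(contents)),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```

A test named for the missing-file case would assert on `None`, while a separate empty-file test would create the file first and assert on whatever the parser does with an empty string.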

persist service registry

5926cae

ismellike requested review from dakom and ueco-jb as code owners February 12, 2026 17:01

ueco-jb reviewed Feb 13, 2026


ismellike added 2 commits February 13, 2026 09:58

roll back registry entry on failure

5250b3b

continue on store failure + tracing error

b64fba5

dakom requested a review from Copilot February 15, 2026 06:50

Copilot started reviewing on behalf of dakom February 15, 2026 06:50

Copilot AI reviewed Feb 15, 2026

