From @Eengineer1
Current hypothesis:
The failure is a cheqd localnet cold-start race, not a cache logic defect. CI starts the services and runs e2e immediately, while the cheqd container is still finishing its startup/funding work. The first network read in CheqdDidRegistrar.createResource() can then fail with "fetch failed" because the RPC endpoint is not yet stable.
Evidence:
CI brings services up without an explicit readiness wait in continuous-integration.yml:124-152
cheqd localnet bootstrap still does async startup/funding after cheqd-noded start in docker-compose.yml:71-87
module init does fire-and-forget connect() in CheqdModule.ts:28-32
createResource() immediately does a DID resolve in CheqdDidRegistrar.ts:574-577
that resolve is a direct sdk.queryDidDoc() call with no retry in CheqdLedgerService.ts:174-176
Concrete patch plan:
Add a real cheqd readiness gate:
Update docker-compose.yml so cheqd reports healthy only after RPC is up and bootstrap/funding is complete.
Update CI in continuous-integration.yml to wait for service health before running e2e.
Add bounded retry/backoff around cheqd read operations:
In CheqdLedgerService.ts, wrap resolve(), resolveCollectionResources(), resolveResource(), and resolveResourceMetadata() with short retry logic that applies only to transient transport errors ("fetch failed", connection reset/refused, timeout).
Keep retries small and targeted so real failures still surface quickly.
Optional hardening in resource creation path:
In CheqdDidRegistrar.ts, evaluate whether the pre-flight DID read can fall back to local didRecord when the DID was just created locally.
Treat this as secondary to readiness + retry.
Re-enable the cached cheqd anoncreds test after the above:
Restore cheqd-sdk-anoncreds-registry-cached.e2e.test.ts once CI is stable.
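
For the retry/backoff step, a minimal sketch of what the wrapper could look like. The names withRetry, isTransientTransportError, and RetryOptions, the defaults, and the error patterns are all hypothetical and would need checking against what the cheqd SDK actually throws. It retries only errors that look like transport failures and rethrows everything else immediately.

```typescript
// Hypothetical helper: names, defaults, and patterns are illustrative assumptions,
// not taken from the existing codebase.
interface RetryOptions {
  attempts: number // total tries, including the first one
  baseDelayMs: number // wait before the second try; doubles after each failure
  maxDelayMs: number // upper bound for a single wait
}

const DEFAULTS: RetryOptions = { attempts: 3, baseDelayMs: 200, maxDelayMs: 2000 }

// Assumed signatures of transient transport failures.
const TRANSIENT_PATTERNS = [/fetch failed/i, /ECONNRESET/, /ECONNREFUSED/, /ETIMEDOUT/, /timeout/i]

// Flatten message, code, and nested causes into one string for matching.
function describeError(error: unknown, depth = 0): string {
  if (typeof error !== 'object' || error === null) return String(error)
  const { message, code, cause } = error as { message?: unknown; code?: unknown; cause?: unknown }
  const own = `${String(message ?? '')} ${String(code ?? '')}`
  // Node's fetch reports "fetch failed" and keeps the socket error in `cause`.
  return cause !== undefined && depth < 3 ? `${own} ${describeError(cause, depth + 1)}` : own
}

function isTransientTransportError(error: unknown): boolean {
  const text = describeError(error)
  return TRANSIENT_PATTERNS.some((pattern) => pattern.test(text))
}

async function withRetry<T>(
  operation: () => Promise<T>,
  options: Partial<RetryOptions> = {},
  isRetryable: (error: unknown) => boolean = isTransientTransportError
): Promise<T> {
  const { attempts, baseDelayMs, maxDelayMs } = { ...DEFAULTS, ...options }
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      // Non-transient errors and the final attempt surface right away.
      if (attempt >= attempts || !isRetryable(error)) throw error
      const delayMs = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs)
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
    }
  }
}
```

Inside CheqdLedgerService.resolve() this would wrap the existing call, roughly withRetry(() => sdk.queryDidDoc(did)), where the argument name is a guess. With the defaults above, a persistent transport failure surfaces after three attempts and about 600 ms of added waiting, so real outages are still reported quickly.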