lmos-operator issue resolved #149
Open
Fixes #51
Title
feat(operator): alert on unresolved channels in LMOS Operator
Description
What does this change do?
This PR introduces an alerting mechanism for unresolved Channels in the LMOS Operator.
When a Channel cannot resolve its required capabilities against the available Agent resources, the Operator already marks the Channel as UNRESOLVED, but until now it emitted no notification or alert. This change adds structured alerting so that unresolved states are visible to operators and monitoring systems.
The alert includes:
Namespace
Channel name
Unresolved capabilities (id, name, version)
Reason for unresolved state
Alerts are always logged and can optionally be sent to an external system via a configurable webhook.
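As an illustration of the fields listed above, the alert payload could be modelled roughly as follows. This is a minimal sketch, not the PR's actual code: the type names (UnresolvedCapability, UnresolvedChannelAlert), the JSON key names, and the hand-rolled serialisation are all assumptions made to keep the example free of dependencies.

```kotlin
// Hypothetical payload types; the fields mirror the list above, the names are assumptions.
data class UnresolvedCapability(val id: String, val name: String, val version: String)

data class UnresolvedChannelAlert(
    val namespace: String,
    val channelName: String,
    val unresolvedCapabilities: List<UnresolvedCapability>,
    val reason: String,
)

// Escape the characters that would otherwise break a JSON string literal.
fun jsonEscape(s: String): String = buildString {
    for (c in s) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
}

// Wrap an escaped value in double quotes.
fun quote(s: String): String = "\"" + jsonEscape(s) + "\""

// Render the alert as a single-line JSON object, usable for both a log entry and a webhook body.
fun UnresolvedChannelAlert.toJson(): String {
    val caps = unresolvedCapabilities.joinToString(",") {
        "{\"id\":${quote(it.id)},\"name\":${quote(it.name)},\"version\":${quote(it.version)}}"
    }
    return "{\"namespace\":${quote(namespace)},\"channel\":${quote(channelName)}," +
        "\"unresolvedCapabilities\":[$caps],\"reason\":${quote(reason)}}"
}
```

In a real operator a JSON library such as Jackson or kotlinx.serialization would likely replace the manual string building; it is spelled out here only so the sketch stands alone.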
Why is this change needed?
Previously:
Channels could remain unresolved silently
Operators had no easy way to detect capability mismatches
Debugging required manual inspection of Channel status
This led to:
Reduced observability
Delayed detection of misconfigurations
Potential runtime issues going unnoticed
This PR improves operational visibility without changing how Channels are resolved or reconciled.
How is this implemented?
Added a new AlertClient component responsible for:
Structured logging of unresolved Channel alerts
Optional webhook-based notification (LMOS_ALERT_WEBHOOK_URL)
Integrated alert emission into ChannelDependentResource when:
Capability resolution fails
Channel transitions into UNRESOLVED state
Alerting is non-blocking and best-effort
Failures in alert delivery do not affect reconciliation logic
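The behaviour described above (always log, optionally post to a webhook, never let delivery failures reach the reconciler) might look something like the sketch below. It is not the AlertClient from this PR: the class name AlertClientSketch, the injectable send function, the postJson helper, the timeouts, and the use of java.util.logging are all assumptions chosen to keep the example self-contained. Only the LMOS_ALERT_WEBHOOK_URL variable name comes from the description.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration
import java.util.concurrent.CompletableFuture
import java.util.logging.Logger

// Hypothetical sketch; the real AlertClient may differ in shape and naming.
class AlertClientSketch(
    // Webhook target; null or blank disables delivery.
    private val webhookUrl: String? = System.getenv("LMOS_ALERT_WEBHOOK_URL"),
    // Injectable sender (url, body) -> HTTP status, so delivery can be replaced in tests.
    private val send: (String, String) -> CompletableFuture<Int> = ::postJson,
) {
    private val log = Logger.getLogger("lmos.alert")

    // Always logs the alert. Posts to the webhook only when one is configured.
    // Returns a future that completes with true on a 2xx response and false otherwise;
    // it never throws and never completes exceptionally.
    fun emit(payloadJson: String): CompletableFuture<Boolean> {
        log.warning("Unresolved channel alert: $payloadJson")
        val url = webhookUrl?.takeIf { it.isNotBlank() }
            ?: return CompletableFuture.completedFuture(false)
        return try {
            send(url, payloadJson)
                .thenApply { status -> status in 200..299 }
                .exceptionally { e ->
                    log.warning("Alert webhook delivery failed: ${e.message}")
                    false
                }
        } catch (e: Exception) {
            // Covers synchronous failures such as a malformed URL.
            log.warning("Alert webhook delivery failed: ${e.message}")
            CompletableFuture.completedFuture(false)
        }
    }
}

// Shared JDK HTTP client, created on first use; timeout values are illustrative.
private val http: HttpClient by lazy {
    HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build()
}

// Asynchronous JSON POST; the returned future yields the HTTP status code.
fun postJson(url: String, body: String): CompletableFuture<Int> {
    val request = HttpRequest.newBuilder(URI.create(url))
        .timeout(Duration.ofSeconds(5))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
    return http.sendAsync(request, HttpResponse.BodyHandlers.discarding())
        .thenApply { it.statusCode() }
}
```

Because emit returns immediately and swallows every failure, a caller in the reconciler can invoke it on the UNRESOLVED path and ignore the result without risking a failed or delayed reconciliation.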
Files changed
src/main/kotlin/org/eclipse/lmos/operator/alert/AlertClient.kt
New alert client responsible for logging and webhook delivery
src/main/kotlin/org/eclipse/lmos/operator/reconciler/ChannelDependentResource.kt
Emit alerts when Channel capability resolution fails
Testing done
Manually tested reconciliation flow with:
Missing Agent capabilities
Incompatible capability versions
Verified:
Channel status is set to UNRESOLVED
Structured alert log entry is emitted
Webhook delivery succeeds when LMOS_ALERT_WEBHOOK_URL is configured
Operator behavior remains unchanged when webhook is not configured
Confirmed no impact on successful Channel resolution paths
Backward compatibility
✅ No breaking changes
✅ Existing Channel resolution logic unchanged
✅ Alerting is additive and optional
✅ No configuration required unless webhook notifications are desired
Security considerations
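The webhook URL is read from the LMOS_ALERT_WEBHOOK_URL environment variable
The payload contains only Channel metadata (namespace, Channel name, unresolved capabilities, reason); no other data is transmitted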
Failures in external communication do not affect core Operator logic
Screenshots
N/A – backend and operator logic change only.
Checklist
Code follows existing project conventions
Change is minimal and scoped
No behavioral regression introduced
Alerting is optional and non-blocking
Logging provides sufficient diagnostic context
Notes for reviewers
This PR focuses on observability only.
It does not alter scheduling, routing, or capability matching logic.
Future enhancements (out of scope here) could include:
Prometheus metrics for unresolved Channels
Alert deduplication or throttling
Pluggable notification backends