@Jayesh45-master

Fixes #51

Title
feat(operator): alert on unresolved channels in LMOS Operator

Description
What does this change do?
This PR introduces an alerting mechanism for unresolved Channels in the LMOS Operator.
When a Channel cannot resolve its required capabilities against the available Agent resources, the Operator already marks the Channel as UNRESOLVED but emits no notification or alert. This change adds structured alerting so that unresolved states are visible to operators and monitoring systems.

The alert includes:
Namespace
Channel name
Unresolved capabilities (id, name, version)
Reason for unresolved state

Alerts are always logged and can optionally be sent to an external system via a configurable webhook.
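
For illustration, the alert fields listed above could be modeled and rendered as a single structured log line roughly as follows. This is a minimal sketch, not the code in this PR: the type names `UnresolvedCapability` and `UnresolvedChannelAlert`, the function `formatAlert`, and the key=value layout are all assumptions.

```kotlin
// Hypothetical types for illustration; the PR's actual classes may differ.
data class UnresolvedCapability(val id: String, val name: String, val version: String)

data class UnresolvedChannelAlert(
    val namespace: String,
    val channelName: String,
    val capabilities: List<UnresolvedCapability>,
    val reason: String,
)

// Renders the alert as one key=value log line (layout is an assumption).
fun formatAlert(alert: UnresolvedChannelAlert): String {
    // Each capability becomes "id:name:version"; entries are joined with ", ".
    val caps = alert.capabilities.joinToString(prefix = "[", postfix = "]") {
        "${it.id}:${it.name}:${it.version}"
    }
    return "event=channel_unresolved namespace=${alert.namespace} " +
        "channel=${alert.channelName} capabilities=$caps reason=\"${alert.reason}\""
}
```

Keeping the formatting in a pure function like this would let the same payload feed both the log entry and the optional webhook body.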

Why is this change needed?
Previously:
Channels could remain unresolved silently
Operators had no easy way to detect capability mismatches
Debugging required manual inspection of Channel status

This led to:
Reduced observability
Delayed detection of misconfigurations
Potential runtime issues going unnoticed

This PR improves operational visibility while keeping runtime behavior unchanged.

How is this implemented?
Added a new AlertClient component responsible for:
Structured logging of unresolved Channel alerts
Optional webhook-based notification (LMOS_ALERT_WEBHOOK_URL)

Integrated alert emission into ChannelDependentResource when:
Capability resolution fails
Channel transitions into UNRESOLVED state

Alerting is non-blocking and best-effort:
Failures in alert delivery do not affect reconciliation logic
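
The non-blocking, best-effort behavior described above can be sketched as follows. This is one possible shape under stated assumptions, not the PR's actual implementation: `BestEffortNotifier` is a hypothetical name, `send` stands in for the real webhook call, and handing delivery to an `Executor` is an assumed way of keeping it off the reconciliation path.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executor
import java.util.function.Supplier

// Hypothetical sketch: `send` stands in for the real webhook delivery.
class BestEffortNotifier(
    private val send: (String) -> Unit,
    private val executor: Executor,
    private val onFailure: (Throwable) -> Unit = {},
) {
    // Runs delivery on the executor so the caller is not blocked.
    // Exceptions are caught and reported; the future completes with false
    // instead of propagating the error to the reconciler.
    fun notify(payload: String): CompletableFuture<Boolean> =
        CompletableFuture.supplyAsync(
            Supplier {
                try {
                    send(payload)
                    true
                } catch (e: Exception) {
                    onFailure(e)
                    false
                }
            },
            executor,
        )
}
```

A caller in the reconciler could invoke `notify(...)` and ignore the returned future, so a webhook outage only results in a logged failure.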

Files changed
src/main/kotlin/org/eclipse/lmos/operator/alert/AlertClient.kt
New alert client responsible for logging and webhook delivery

src/main/kotlin/org/eclipse/lmos/operator/reconciler/ChannelDependentResource.kt
Emit alerts when Channel capability resolution fails

Testing done
Manually tested reconciliation flow with:
Missing Agent capabilities
Incompatible capability versions

Verified:
Channel status is set to UNRESOLVED
Structured alert log entry is emitted
Webhook delivery succeeds when LMOS_ALERT_WEBHOOK_URL is configured
Operator behavior remains unchanged when webhook is not configured
Confirmed no impact on successful Channel resolution paths

Backward compatibility
✅ No breaking changes
✅ Existing Channel resolution logic unchanged
✅ Alerting is additive and optional
✅ No configuration required unless webhook notifications are desired

Security considerations
Webhook URL is read from environment variable
No sensitive data beyond Channel metadata is transmitted
Failures in external communication do not affect core Operator logic

Screenshots
N/A – backend and operator logic change only.

Checklist
Code follows existing project conventions
Change is minimal and scoped
No behavioral regression introduced
Alerting is optional and non-blocking
Logging provides sufficient diagnostic context

Notes for reviewers
This PR focuses on observability only.
It does not alter scheduling, routing, or capability matching logic.
Future enhancements (out of scope here) could include:
Prometheus metrics for unresolved Channels
Alert deduplication or throttling
Pluggable notification backends

import java.net.HttpURLConnection
import java.net.URL

object AlertClient {
Contributor

The Operator is using Spring Boot. This could be a Spring bean instead of an object.
Configuration should not be read directly via System.getenv.
Spring Boot provides better options. Please have a look at https://github.com/eclipse-lmos/lmos-operator/blob/main/src/main/kotlin/org/eclipse/lmos/operator/OperatorConfig.kt and

Author

Thanks for the review!

I've refactored AlertClient to be a Spring-managed bean and switched configuration
handling to Spring Boot property injection, avoiding direct use of System.getenv.

This aligns with the existing OperatorConfig approach and makes the alerting
logic easier to test and configure.

Please let me know if you'd prefer @ConfigurationProperties instead of @Value.
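
To illustrate why injected configuration is easier to test than a direct System.getenv call, here is a framework-free sketch. It is not the refactored class from this PR: the constructor parameter `configuredUrl` and the helper `targets()` are hypothetical, and in the Operator the value would be supplied by Spring (for example through @Value or a @ConfigurationProperties class) rather than passed by hand.

```kotlin
// Hypothetical sketch; in the Operator the constructor argument would be
// injected by Spring rather than constructed manually.
class AlertClient(configuredUrl: String?) {
    // Treat null or blank as "webhook not configured".
    private val webhookUrl: String? = configuredUrl?.trim()?.takeIf { it.isNotEmpty() }

    val webhookEnabled: Boolean
        get() = webhookUrl != null

    // Logging is always a target; the webhook is added only when configured.
    fun targets(): List<String> =
        listOfNotNull("log", webhookUrl?.let { "webhook:$it" })
}
```

Because the URL arrives through the constructor, a unit test can create `AlertClient(null)` or `AlertClient("http://localhost:9000/hook")` directly without touching the process environment.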

Development

Successfully merging this pull request may close these issues.

Implement Alerting for Unresolved Channels in LMOS Operator