-
Notifications
You must be signed in to change notification settings - Fork 2.4k
feat: implement phantom_threading to group email alerts into threads #4623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
dd8d44b to
ab70ee3
Compare
sysadmind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we also need to add a test to make sure that functionality doesn't break in the future. I think we need to find a decision on if we want daily threading only, or if we want the user to choose from a set of threading options before we move forward.
| if n.conf.PhantomThreading && len(as) > 0 { | ||
| // Add threading headers. All notifications for the same alert | ||
| // (identified by fingerprint) on the same day are threaded together. | ||
| // The thread root ID is a phantom Message-ID that doesn't correspond to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is where the name "Phantom Threading" comes from? I'm not sure people will understand that based on the name. Maybe "daily threading" would make more sense, although in that case what if we want to do a different threading strategy in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the term “phantom threading” is a creation of my own. I am open to a better suggestion. If we decide that “daily” is the only granularity we want, daily threading would be fine with me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we just call it threading? Or if that's too concrete, maybe attempt_threading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this is an interesting change! Threading for alertmanager notifications would be very useful for teams using email!
I have a couple questions:
- Did you consider using the
GroupKeyhas to source for the thread root id? If I understand the rest of this change correct, this would result in one thread per group. It's also consistent with how we typically group/dedup notifications. For example, the PagerDuty integration uses the hash of the group key as the PagerDuty dedup key: https://github.com/prometheus/alertmanager/blob/main/notify/pagerduty/pagerduty.go#L241. This would also make threading configured directly by routing config. - It looks like we end up just setting some headers on the email. Is this something that could be implemented by templated header? I'm also wondering if this has any strange interaction with the user's config if they set headers.
- The comment mentions email clients that use "
(commonly used) JWZ" - do you know how this change interacts with email clients that don't behave that way?
Some email clients such as Gmail apparently use their own heuristics for threading and already implement this behavior based on the subject. But for users of other email clients that only implement threading based on the relevant headers (e.g. notmuch), those users currently get one email thread for each newly firing alert. With phantom_threading enabled, all alert emails (of the same alert) on the same day are grouped into the same thread. Much nicer :) Signed-off-by: Michael Stapelberg <[email protected]>
ab70ee3 to
418578d
Compare
|
Thank you both for your review! Answers inline:
Thanks, that’s a great tip! Done.
I prototyped this idea, and if we expose the shortened GroupKeyHash and n.hostname to the template, then users would be able to configure a templated header like so: …but that’s pretty complicated for a user. My preference would be to stick with the high-level configuration option (phantom threading enabled/disabled). If we really think it’s required, we can make the date configurable, but realistically, I don’t see one-thread-per-month as an option that anyone would want. Maybe one-thread-per-calendar-week? But that seems rather unconventional, too.
The worst that can happen is that email clients keep threading the way they currently do (i.e. no threading for alerts). Email clients must cope with non-existing references (common case: somebody adds you to an email thread, so you don’t have the earlier messages). |
Spaceman1701
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making those changes!
I prototyped this idea, and if we expose the shortened GroupKeyHash and n.hostname to the template, then users would be able to configure a templated header like so:
I see what you mean. I agree with your conclusion - having explicit config is more clear.
If we really think it’s required, we can make the date configurable, but realistically, I don’t see one-thread-per-month as an option that anyone would want.
I'm of two minds on the one-thread-per-day behavior. One one hand, I understand wanting to have threads end after a while (even now that there should be one thread per-group). On the other hand, it feels a little arbitrary to me.
To me, it seems like the ideal behavior would be to make one thread specific alert group (e.g. when a group resolves, the thread would be ended). However, we'd need to expose the notification reason AND aribtrary metadata (like suggested here) to make that work.
I really don't want to block this change on that feature, since it might be a while before it gets merged. I also could see users wanting daily/monthly/hourly threads anyway.
So I think I'd prefer if the date was configurable (and optional). Would you be open to that? For now, we can support just "daily" or no date-based cutoff, but I'd prefer if the config was left extensible so we can iterate on it over time. Something like:
phantom_threading:
enabled: true
thread_by_date: daily
| if n.conf.PhantomThreading && len(as) > 0 { | ||
| // Add threading headers. All notifications for the same alert | ||
| // (identified by fingerprint) on the same day are threaded together. | ||
| // The thread root ID is a phantom Message-ID that doesn't correspond to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we just call it threading? Or if that's too concrete, maybe attempt_threading
Some email clients such as Gmail apparently use their own heuristics for threading and already implement this behavior based on the subject.
But for users of other email clients that only implement threading based on the relevant headers (e.g. notmuch), those users currently get one email thread for each newly firing alert.
With phantom_threading enabled, all alert emails (of the same alert) on the same day are grouped into the same thread. Much nicer :)
I have tested this manually and you can see the effect start to work in this screenshot:
(Monday morning, I got one thread per alert email notification; in the evening, the threading change was effective and emails are grouped into the daily thread.)