@stapelberg

Some email clients such as Gmail apparently use their own heuristics for threading and already implement this behavior based on the subject.

But users of other email clients that implement threading only via the relevant headers (e.g. notmuch) currently get one email thread for each newly firing alert.

With phantom_threading enabled, all alert emails (of the same alert) on the same day are grouped into the same thread. Much nicer :)


I have tested this manually and you can see the effect start to work in this screenshot:

[Screenshot: 2025-10-22-alertmanager-threading]

(Monday morning, I got one thread per alert email notification; in the evening, the threading change was effective and emails are grouped into the daily thread.)

@stapelberg force-pushed the threading branch 3 times, most recently from dd8d44b to ab70ee3 (October 22, 2025 06:55)

@sysadmind left a comment

I think we also need to add a test to make sure this functionality doesn't break in the future. Before we move forward, we should decide whether we want daily threading only, or whether the user should be able to choose from a set of threading options.

if n.conf.PhantomThreading && len(as) > 0 {
	// Add threading headers. All notifications for the same alert
	// (identified by fingerprint) on the same day are threaded together.
	// The thread root ID is a phantom Message-ID that doesn't correspond to

I assume this is where the name "Phantom Threading" comes from? I'm not sure people will understand that based on the name. Maybe "daily threading" would make more sense, although in that case what if we want to do a different threading strategy in the future?

@stapelberg, Nov 12, 2025

No, the term “phantom threading” is a creation of my own. I am open to a better suggestion. If we decide that “daily” is the only granularity we want, daily threading would be fine with me.

What if we just call it threading? Or, if that's too concrete, maybe attempt_threading.

@Spaceman1701 left a comment

Hi, this is an interesting change! Threading for alertmanager notifications would be very useful for teams using email!

I have a couple questions:

  1. Did you consider using the GroupKey hash as the source for the thread root ID? If I understand the rest of this change correctly, this would result in one thread per group. It's also consistent with how we typically group/dedup notifications. For example, the PagerDuty integration uses the hash of the group key as the PagerDuty dedup key: https://github.com/prometheus/alertmanager/blob/main/notify/pagerduty/pagerduty.go#L241. It would also mean that threading is configured directly by the routing config.
  2. It looks like we end up just setting some headers on the email. Is this something that could be implemented by a templated header? I'm also wondering if this has any strange interaction with the user's config if they set headers.
  3. The comment mentions email clients that use "(commonly used) JWZ" - do you know how this change interacts with email clients that don't behave that way?

Signed-off-by: Michael Stapelberg <[email protected]>
@stapelberg

Thank you both for your review! Answers inline:

  1. Did you consider using the GroupKey hash as the source for the thread root ID? If I understand the rest of this change correctly, this would result in one thread per group. It's also consistent with how we typically group/dedup notifications. For example, the PagerDuty integration uses the hash of the group key as the PagerDuty dedup key: https://github.com/prometheus/alertmanager/blob/main/notify/pagerduty/pagerduty.go#L241. It would also mean that threading is configured directly by the routing config.

Thanks, that’s a great tip! Done.

  2. It looks like we end up just setting some headers on the email. Is this something that could be implemented by a templated header? I'm also wondering if this has any strange interaction with the user's config if they set headers.

I prototyped this idea, and if we expose the shortened GroupKeyHash and n.hostname to the template, then users would be able to configure a templated header like so:

References: <alert-{{ .GroupKeyHash }}-{{ range $idx, $alert := .Alerts }}{{ if (eq $idx 0) }}{{ $alert.StartsAt.Format \"2006-01-02\" }}{{ end }}{{ end }}@{{ .Hostname }}>
In-Reply-To: <alert-{{ .GroupKeyHash }}-{{ range $idx, $alert := .Alerts }}{{ if (eq $idx 0) }}{{ $alert.StartsAt.Format \"2006-01-02\" }}{{ end }}{{ end }}@{{ .Hostname }}>

…but that’s pretty complicated for a user.

My preference would be to stick with the high-level configuration option (phantom threading enabled/disabled). If we really think it’s required, we can make the date configurable, but realistically, I don’t see one-thread-per-month as an option that anyone would want. Maybe one-thread-per-calendar-week? But that seems rather unconventional, too.

  3. The comment mentions email clients that use "(commonly used) JWZ" - do you know how this change interacts with email clients that don't behave that way?

The worst that can happen is that email clients keep threading the way they currently do (i.e. no threading for alerts). Email clients must cope with non-existing references (common case: somebody adds you to an email thread, so you don’t have the earlier messages).
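
As a rough illustration of why a phantom root works, the sketch below groups messages by the first ID in their References header. This is a deliberately simplified stand-in for header-based threading, not the JWZ algorithm itself, and the `message` type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// message holds only the two headers relevant to this sketch.
type message struct {
	ID         string // Message-ID header
	References string // References header: space-separated IDs, may be empty
}

// threadRoot returns the ID a message is grouped under: the first referenced
// ID if there is one, otherwise the message's own ID. The referenced message
// does not need to exist in the mailbox.
func threadRoot(m message) string {
	refs := strings.Fields(m.References)
	if len(refs) > 0 {
		return refs[0]
	}
	return m.ID
}

// groupByThread maps each thread root to the IDs of the messages in it.
func groupByThread(msgs []message) map[string][]string {
	threads := make(map[string][]string)
	for _, m := range msgs {
		root := threadRoot(m)
		threads[root] = append(threads[root], m.ID)
	}
	return threads
}

func main() {
	// The phantom root is never delivered as a real message.
	phantom := "<alert-abc123-2025-10-22@example.org>"
	msgs := []message{
		{ID: "<1@example.org>", References: phantom},
		{ID: "<2@example.org>", References: phantom},
		{ID: "<3@example.org>"}, // no threading headers: its own thread
	}
	for root, ids := range groupByThread(msgs) {
		fmt.Println(root, ids)
	}
}
```

A client that ignores References entirely would simply show three separate messages, which matches the current behavior.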

@Spaceman1701 left a comment

Thanks for making those changes!

I prototyped this idea, and if we expose the shortened GroupKeyHash and n.hostname to the template, then users would be able to configure a templated header like so:

I see what you mean. I agree with your conclusion - having explicit config is clearer.

If we really think it’s required, we can make the date configurable, but realistically, I don’t see one-thread-per-month as an option that anyone would want.

I'm of two minds on the one-thread-per-day behavior. On one hand, I understand wanting to have threads end after a while (even now that there should be one thread per group). On the other hand, it feels a little arbitrary to me.

To me, it seems like the ideal behavior would be to make one thread per specific alert group (e.g. when a group resolves, the thread would be ended). However, we'd need to expose the notification reason AND arbitrary metadata (as suggested here) to make that work.

I really don't want to block this change on that feature, since it might be a while before it gets merged. I also could see users wanting daily/monthly/hourly threads anyway.

So I think I'd prefer if the date was configurable (and optional). Would you be open to that? For now, we can support just "daily" or no date-based cutoff, but I'd prefer if the config was left extensible so we can iterate on it over time. Something like:

phantom_threading:
    enabled: true
    thread_by_date: daily
