Skip to content

Conversation

nielsbauman
Copy link
Contributor

@nielsbauman nielsbauman commented Sep 25, 2025

During component and composable template validation we check if any data streams become unreferenced if the composite template would be added to the project metadata. We currently do this regardless of what changed in the component or composable template. This check is quite expensive if the cluster has many index templates or many data streams, mostly because the regex matching on index patterns in findV2Template is expensive.

We know that only a few properties of the composite template can affect data stream references. Specifically, we know that the index patterns, the hidden index setting, the priority, and the presence of the data_stream field in the template are used to determine the template that a data stream is based on.

By checking if only one of those properties changed, we can avoid running the expensive checks in a great deal of template updates. For example, mapping updates, _meta updates, any index settings other than hidden are all irrelevant to this check, and we can thus skip it for those updates. Especially component template updates will benefit from this change, as they aren't even able to influence the index patterns, data stream template, or priority of the composite template.

The main downside of this change is that we have an implicit dependency from this new check on the implementation of findV2Template. We try to mitigate that risk by doing the expensive/original check in an assertion.

During component and composable template validation we check if any data
streams become unreferenced if the composite template would be added to
the project metadata. We currently do this regardless of what changed in
the component or composable template. This check is quite expensive if
the cluster has many index templates or many data streams, mostly
because the regex matching on index patterns in `findV2Template` is
expensive.

We know that only a few properties of the composite template can affect
data stream references. Specifically, we know that the index patterns,
the `hidden` index setting, the priority, and the presence of the
`data_stream` field in the template are used to determine the template
that a data stream is based on.

By checking if only one of those properties changed, we can avoid
running the expensive checks in a great deal of template updates. For
example, mapping updates, `_meta` updates, any index settings other than
`hidden` are all irrelevant to this check, and we can thus skip it for
those updates. Especially component template updates will benefit from
this change, as they aren't even able to influence the index patterns,
data stream template, or priority of the composite template.

The main downside of this change is that we have an implicit dependency
from this new check on the implementation of `findV2Template`. If we
were to modify `findV2Template` to depend on another property (e.g.
another setting), we'd have to remember to update this check to avoid us
skipping the reference check even though a relevant property was
updated.
@nielsbauman nielsbauman added :Data Management/Indices APIs APIs to create and manage indices and templates >refactoring labels Sep 25, 2025
@elasticsearchmachine elasticsearchmachine added Team:Data Management Meta label for data/management team v9.2.0 labels Sep 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

Copy link
Contributor

@joegallo joegallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a missing assert that will make us much happier here. Let's chat about it out of band.

@nielsbauman
Copy link
Contributor Author

@joegallo, great suggestion! I added some assertion logic in 346a82d.

// We check whether anything relevant has changed that could affect data stream coverage and return early if not.
// These checks are based on the implementation of findV2Template and the data stream template check in this method.
// If we're adding a new template, we do the full check in case this template's priority changes coverage.
if (existing != null
Copy link
Contributor

@joegallo joegallo Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This a nit, but it's throwing me off: the use of Objects.equals here is inconsistent. I'd rather it's either used exactly and only where it is 100% necessary for the code to be correct, or that it's just used for all the stanzas without thinking. As it is right now, the Objects.equals around the two != null checks seems unnecessary.

If you're truly neutral on the only-where-necessary versus always front, I slightly prefer always, because then instead of a mix of a.equals(b) and a == b and Object.equals(a, b), we'd just have only Object.equals(a, b) four times.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This crossed my mind too! I preferred Object.equals for the != null checks because I felt like (existingTemplate.getDataStreamTemplate() != null) == (newTemplate.getDataStreamTemplate() != null) would be annoying to read and I didn't know how our spotless configuration would format it. Although the boolean boxing is unfortunate, I figured it wasn't that problematic. I agree on the consistency, though. I'll push a commit to sync them.

String templateName,
ComposableIndexTemplate newTemplate
) {
final ComposableIndexTemplate existing = project.templatesV2().get(templateName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't believe I'm saying this (usually I'm a big fan of conserving precious characters), but perhaps existingTemplate for symmetry with newTemplate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, I renamed the variable in another place on another branch, but forgot this one. Agreed and will do!

Copy link
Contributor

@joegallo joegallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, but I have a couple minor suggestions so that the future will remember that I was here and totally contributed. (😆)

Copy link
Contributor

@joegallo joegallo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Niels!

@nielsbauman nielsbauman enabled auto-merge (squash) October 1, 2025 14:18
@nielsbauman nielsbauman merged commit 6b53351 into elastic:main Oct 1, 2025
34 checks passed
@nielsbauman nielsbauman deleted the skip-ds-reference-check branch October 1, 2025 15:21
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this pull request Oct 1, 2025
I realized that this might not be obvious and is worthy of an
explanation. Follow-up of elastic#135457.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Indices APIs APIs to create and manage indices and templates >refactoring Team:Data Management Meta label for data/management team v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants