-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Skip data stream reference check in some cases #135457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
During component and composable template validation we check if any data streams become unreferenced if the composite template would be added to the project metadata. We currently do this regardless of what changed in the component or composable template. This check is quite expensive if the cluster has many index templates or many data streams, mostly because the regex matching on index patterns in `findV2Template` is expensive. We know that only a few properties of the composite template can affect data stream references. Specifically, we know that the index patterns, the `hidden` index setting, the priority, and the presence of the `data_stream` field in the template are used to determine the template that a data stream is based on. By checking if only one of those properties changed, we can avoid running the expensive checks in a great deal of template updates. For example, mapping updates, `_meta` updates, any index settings other than `hidden` are all irrelevant to this check, and we can thus skip it for those updates. Especially component template updates will benefit from this change, as they aren't even able to influence the index patterns, data stream template, or priority of the composite template. The main downside of this change is that we have an implicit dependency from this new check on the implementation of `findV2Template`. If we were to modify `findV2Template` to depend on another property (e.g. another setting), we'd have to remember to update this check to avoid us skipping the reference check even though a relevant property was updated.
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a missing assert
that will make us much happier here. Let's chat about it out of band.
// We check whether anything relevant has changed that could affect data stream coverage and return early if not. | ||
// These checks are based on the implementation of findV2Template and the data stream template check in this method. | ||
// If we're adding a new template, we do the full check in case this template's priority changes coverage. | ||
if (existing != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This a nit, but it's throwing me off: the use of Objects.equals
here is inconsistent. I'd rather it's either used exactly and only where it is 100% necessary for the code to be correct, or that it's just used for all the stanzas without thinking. As it is right now, the Objects.equals
around the two != null
checks seems unnecessary.
If you're truly neutral on the only-where-necessary
versus always
front, I slightly prefer always
, because then instead of a mix of a.equals(b)
and a == b
and Object.equals(a, b)
, we'd just have only Object.equals(a, b)
four times.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This crossed my mind too! I preferred Object.equals
for the != null
checks because I felt like (existingTemplate.getDataStreamTemplate() != null) == (newTemplate.getDataStreamTemplate() != null)
would be annoying to read and I didn't know how our spotless configuration would format it. Although the boolean boxing is unfortunate, I figured it wasn't that problematic. I agree on the consistency, though. I'll push a commit to sync them.
String templateName, | ||
ComposableIndexTemplate newTemplate | ||
) { | ||
final ComposableIndexTemplate existing = project.templatesV2().get(templateName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't believe I'm saying this (usually I'm a big fan of conserving precious characters), but perhaps existingTemplate
for symmetry with newTemplate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, I renamed the variable in another place on another branch, but forgot this one. Agreed and will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, but I have a couple minor suggestions so that the future will remember that I was here and totally contributed. (😆)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Niels!
I realized that this might not be obvious and is worthy of an explanation. Follow-up of elastic#135457.
During component and composable template validation we check if any data streams become unreferenced if the composite template would be added to the project metadata. We currently do this regardless of what changed in the component or composable template. This check is quite expensive if the cluster has many index templates or many data streams, mostly because the regex matching on index patterns in
findV2Template
is expensive.We know that only a few properties of the composite template can affect data stream references. Specifically, we know that the index patterns, the
hidden
index setting, the priority, and the presence of thedata_stream
field in the template are used to determine the template that a data stream is based on.By checking if only one of those properties changed, we can avoid running the expensive checks in a great deal of template updates. For example, mapping updates,
_meta
updates, any index settings other thanhidden
are all irrelevant to this check, and we can thus skip it for those updates. Especially component template updates will benefit from this change, as they aren't even able to influence the index patterns, data stream template, or priority of the composite template.The main downside of this change is that we have an implicit dependency from this new check on the implementation of
findV2Template
. We try to mitigate that risk by doing the expensive/original check in an assertion.