@Kubik42 Kubik42 commented Aug 15, 2025

This is a small refactor plus a bug fix for 131282.

The refactor changes how text, match_only_text, and annotated_text fields use keyword multi fields for synthetic source. Currently, this is done via the hasSyntheticSourceCompatibleKeywordField argument, where we set a boolean flag to indicate whether there is a keyword multi field that is either stored or has doc values. This is not a good approach for addressing 131282 because we want to disable the following logic for multi fields. With that disabled, the parent fields will no longer have a multi field to use for synthetic source.

We could designate one of the keyword fields as some kind of "synthetic source provider" for the parent. This way the field will always create a StoredField when ignore_above is tripped. However, this is a poor approach since it exposes how text fields are implemented to the keyword field. If the parent field decides how and what is stored, it'll be a lot clearer in the code.

This is where this PR comes in. It aims to remove hasSyntheticSourceCompatibleKeywordField (kept for now for bwc) and rely on the syntheticSourceDelegate instead. A new method, canUseSyntheticSourceDelegateForSyntheticSource(), is called during indexing to determine whether a particular keyword multi field can support synthetic source. If it can't, the parent field explicitly creates a StoredField for the value.
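
A minimal sketch of that indexing-time decision, for illustration only. Apart from the method name canUseSyntheticSourceDelegateForSyntheticSource, which comes from the description above, every class, field, and method body here is a hypothetical stand-in and not the actual Elasticsearch code; in particular, a plain list stands in for the StoredField instances the parent would create.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a keyword multi field (not the real KeywordFieldMapper).
final class KeywordDelegateSketch {
    private final boolean stored;
    private final boolean hasDocValues;
    private final int ignoreAbove;

    KeywordDelegateSketch(boolean stored, boolean hasDocValues, int ignoreAbove) {
        this.stored = stored;
        this.hasDocValues = hasDocValues;
        this.ignoreAbove = ignoreAbove;
    }

    // True when the keyword field keeps the value in a form that can be read back later.
    // Values longer than ignoreAbove are treated as ignored by the keyword field.
    boolean retains(String value) {
        return (stored || hasDocValues) && value.length() <= ignoreAbove;
    }
}

// Hypothetical stand-in for the parent text field's indexing path.
final class ParentTextFieldSketch {
    private final KeywordDelegateSketch delegate; // null when there is no keyword multi field
    final List<String> storedByParent = new ArrayList<>(); // stands in for StoredField instances

    ParentTextFieldSketch(KeywordDelegateSketch delegate) {
        this.delegate = delegate;
    }

    // Name taken from the PR description; the body is an assumption.
    boolean canUseSyntheticSourceDelegateForSyntheticSource(String value) {
        return delegate != null && delegate.retains(value);
    }

    void index(String value) {
        // The parent decides what is stored: it only stores the value itself
        // when the delegate cannot supply it for synthetic source.
        if (!canUseSyntheticSourceDelegateForSyntheticSource(value)) {
            storedByParent.add(value);
        }
    }
}
```

With a delegate whose ignore_above is 5, indexing "short" would leave the value to the keyword field, while "much longer" would be stored by the parent. The keyword field needs no knowledge of how the text field is implemented.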

Note: there are a lot of changed files, but most of them are just constructor changes. The actual changes are pretty limited.

Kubik42 commented Aug 15, 2025

Accidentally nuked the previous PR while messing with branches. I did address all of the comments besides this one.

@Kubik42 Kubik42 force-pushed the 131282-2 branch 6 times, most recently from 1f84854 to 53ded56 Compare August 22, 2025 21:33
@Kubik42 Kubik42 marked this pull request as ready for review August 22, 2025 22:51
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:StorageEngine labels Aug 22, 2025
storedFieldInBinaryFormat
isWithinMultiField,
storedFieldInBinaryFormat,
TextFieldMapper.SyntheticSourceHelper.syntheticSourceDelegate(getFieldType(), multiFields)

@Kubik42 Kubik42 Aug 22, 2025

This was somewhat of a bug - the syntheticSourceDelegate was missing despite MatchOnlyTextMapper using it indirectly. With this change, we're now passing the delegate directly to TextFieldMapper.

@@ -249,59 +244,31 @@ private IOFunction<LeafReaderContext, CheckedIntFunction<List<Object>, IOExcepti
"Field [" + name() + "] of type [" + CONTENT_TYPE + "] cannot run positional queries since [_source] is disabled."
);
}
if (searchExecutionContext.isSourceSynthetic() && withinMultiField) {

@Kubik42 Kubik42 Aug 22, 2025

This block of code was broken down into three smaller functions for readability:

  • parentFieldFetcher
  • delegateFieldFetcher
  • sourceFieldFetcher
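
A rough sketch of how such a split might look. Only the three helper names and the first condition (synthetic source combined with being inside a multi field) come from the diff above; the remaining conditions, the signatures, and the returned values are assumptions made for illustration.

```java
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical illustration of splitting one large block into three named helpers.
final class ValueFetcherSketch {
    private ValueFetcherSketch() {}

    // Picks one fetcher. Only the first condition mirrors the diff; the rest is assumed.
    static IntFunction<List<String>> select(boolean sourceIsSynthetic, boolean withinMultiField, boolean hasDelegate) {
        if (sourceIsSynthetic && withinMultiField) {
            return parentFieldFetcher();
        }
        if (sourceIsSynthetic && hasDelegate) {
            return delegateFieldFetcher();
        }
        return sourceFieldFetcher();
    }

    // Each helper returns a function from a document id to that document's values.
    // The bodies are placeholders that tag the value with where it was read from.
    static IntFunction<List<String>> parentFieldFetcher() {
        return docId -> List.of("parent:" + docId);
    }

    static IntFunction<List<String>> delegateFieldFetcher() {
        return docId -> List.of("delegate:" + docId);
    }

    static IntFunction<List<String>> sourceFieldFetcher() {
        return docId -> List.of("source:" + docId);
    }
}
```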

@Kubik42 Kubik42 changed the title from "Abstracted how Text fields use Keyword fields inside of Text fields" to "Don't store keyword multi fields when they trip ignore_above" Aug 22, 2025
/**
* Returns a new {@link CompositeSyntheticFieldLoader} that merges this field loader with the given one.
*/
public CompositeSyntheticFieldLoader mergedWith(CompositeSyntheticFieldLoader other) {

@Kubik42 Kubik42 Aug 22, 2025

This is needed to merge two field loaders: one for loading values stored by the parent field (when ignore_above is tripped), and one for loading values stored by the keyword multi field (when ignore_above isn't tripped). Since the keyword multi field already produces a CompositeSyntheticFieldLoader, I'm just extending that class.
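
As a simplified illustration of the merge, assuming a composite loader is essentially an ordered list of layers: the class below is a hypothetical stand-in and does not reproduce the real CompositeSyntheticFieldLoader API beyond the mergedWith name shown in the diff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for a composite loader made of value-producing layers.
final class CompositeLoaderSketch {
    private final List<Supplier<List<String>>> layers;

    CompositeLoaderSketch(List<Supplier<List<String>>> layers) {
        this.layers = List.copyOf(layers);
    }

    // Returns a new loader that contains this loader's layers followed by the other's.
    // Neither input is modified.
    CompositeLoaderSketch mergedWith(CompositeLoaderSketch other) {
        List<Supplier<List<String>>> merged = new ArrayList<>(layers);
        merged.addAll(other.layers);
        return new CompositeLoaderSketch(merged);
    }

    // Collects the values from every layer, in layer order.
    List<String> load() {
        List<String> values = new ArrayList<>();
        for (Supplier<List<String>> layer : layers) {
            values.addAll(layer.get());
        }
        return values;
    }
}
```

Merging the keyword multi field's loader with the parent's loader this way yields a single loader that returns both the values the keyword field kept and the values the parent stored because ignore_above was tripped.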

@@ -818,6 +832,10 @@ public Builder builder(BlockFactory factory, int expectedCount) {
return new BlockSourceReader.BytesRefsBlockLoader(fetcher, sourceBlockLoaderLookup(blContext));
}

public boolean isIgnoreAboveSet() {

Helper function for better readability

@Kubik42 Kubik42 added Team:StorageEngine and removed needs:triage Requires assignment of a team area label labels Aug 23, 2025
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label and removed Team:StorageEngine labels Aug 23, 2025
Comment on lines +314 to +319
if (indexCreatedVersion.onOrAfter(IndexVersions.DISABLE_NORMS_BY_DEFAULT_FOR_LOGSDB_AND_TSDB)) {
// don't enable norms by default if the index is LOGSDB or TSDB based
return indexMode != IndexMode.LOGSDB && indexMode != IndexMode.TIME_SERIES;
}
// bwc - historically, norms were enabled by default on text fields regardless of which index mode was used
return true;

It's better to define these in the constructor to avoid any issues with calling this. before the constructor returns.
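
One way to read that suggestion, sketched with hypothetical names: the default is computed by a static helper from the constructor's arguments, so nothing reads a partially constructed object. The boolean parameter stands in for the indexCreatedVersion.onOrAfter(...) check in the diff, and the enum stands in for IndexMode; neither is the real type.

```java
// Hypothetical stand-in for the index mode enum.
enum IndexModeSketch { STANDARD, LOGSDB, TIME_SERIES }

// Hypothetical stand-in for a builder that needs a norms default.
final class TextBuilderSketch {
    private final boolean normsEnabledByDefault;

    TextBuilderSketch(boolean createdOnOrAfterNormsChange, IndexModeSketch indexMode) {
        // Derived from constructor arguments only, so no field of `this` is read before it is assigned.
        this.normsEnabledByDefault = defaultNorms(createdOnOrAfterNormsChange, indexMode);
    }

    private static boolean defaultNorms(boolean createdOnOrAfterNormsChange, IndexModeSketch indexMode) {
        if (createdOnOrAfterNormsChange) {
            // Newer indices: norms are not enabled by default for LOGSDB or TSDB based indices.
            return indexMode != IndexModeSketch.LOGSDB && indexMode != IndexModeSketch.TIME_SERIES;
        }
        // bwc: older indices enabled norms by default regardless of index mode.
        return true;
    }

    boolean normsEnabledByDefault() {
        return normsEnabledByDefault;
    }
}
```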

@Kubik42 Kubik42 requested a review from martijnvg August 27, 2025 14:57

@martijnvg martijnvg left a comment

This looks good!

I just realized that we are missing multi-field bwc tests. Would you be able to update MatchOnlyTextRollingUpgradeIT to test the multi-field aspect (including ignore_above)? I think we may also want a similar rolling upgrade test suite for the text field.

Update: This can be done in a separate PR and then we update this PR to include improved bwc test coverage.

Kubik42 commented Sep 5, 2025

Marked as blocked, pending #134097 and #134096

Development

Successfully merging this pull request may close these issues.

Avoid storing ignored source for multi-fields