Don't store keyword multi fields when they trip ignore_above #132962
base: main
Accidentally nuked the previous PR while messing with branches. I did address all of the comments besides this one.
1f84854 to 53ded56
storedFieldInBinaryFormat
isWithinMultiField,
storedFieldInBinaryFormat,
TextFieldMapper.SyntheticSourceHelper.syntheticSourceDelegate(getFieldType(), multiFields)
This was somewhat of a bug: the `syntheticSourceDelegate` was missing despite `MatchOnlyTextMapper` using it indirectly. With this change, we're now passing the delegate directly to `TextFieldMapper`.
@@ -249,59 +244,31 @@ private IOFunction<LeafReaderContext, CheckedIntFunction<List<Object>, IOExcepti
"Field [" + name() + "] of type [" + CONTENT_TYPE + "] cannot run positional queries since [_source] is disabled."
);
}
if (searchExecutionContext.isSourceSynthetic() && withinMultiField) {
This block of code was broken down into three smaller functions for readability:
parentFieldFetcher
delegateFieldFetcher
sourceFieldFetcher
/**
* Returns a new {@link CompositeSyntheticFieldLoader} that merges this field loader with the given one.
*/
public CompositeSyntheticFieldLoader mergedWith(CompositeSyntheticFieldLoader other) {
This is needed to merge two field loaders: one for loading values stored by the parent field (when ignore_above is tripped), and the second for loading values stored by the keyword multi field (when ignore_above isn't tripped). Since the keyword multi field already produces a `CompositeSyntheticFieldLoader`, I'm just extending that class.
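As a rough illustration of the merge described above, here is a minimal, self-contained sketch. It does not use Elasticsearch's actual `CompositeSyntheticFieldLoader`; `SketchLoader`, `Layer`, `fromKey`, and the storage keys in the usage note are hypothetical stand-ins. The only idea carried over from the PR is that merging yields a new loader that reads from the layers of both inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a composite loader: it reads one field's values from several "layers".
final class SketchLoader {
    // A layer reads zero or more values for a document from one storage location.
    interface Layer {
        List<String> load(Map<String, List<String>> doc);
    }

    private final List<Layer> layers;

    SketchLoader(List<Layer> layers) {
        this.layers = List.copyOf(layers);
    }

    // Returns a new loader holding this loader's layers followed by the other's.
    // Neither input loader is modified.
    SketchLoader mergedWith(SketchLoader other) {
        List<Layer> merged = new ArrayList<>(layers);
        merged.addAll(other.layers);
        return new SketchLoader(merged);
    }

    // Collects the values from every layer, in layer order.
    List<String> load(Map<String, List<String>> doc) {
        List<String> values = new ArrayList<>();
        for (Layer layer : layers) {
            values.addAll(layer.load(doc));
        }
        return values;
    }

    // Convenience layer that reads from a named (hypothetical) storage key.
    static Layer fromKey(String key) {
        return doc -> doc.getOrDefault(key, List.of());
    }
}
```

In this sketch, a loader built from a key such as `"body.kw"` (values the keyword multi field kept) can be merged with one built from `"body._fallback"` (values the parent stored), and the merged loader returns values from both locations.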
@@ -818,6 +832,10 @@ public Builder builder(BlockFactory factory, int expectedCount) {
return new BlockSourceReader.BytesRefsBlockLoader(fetcher, sourceBlockLoaderLookup(blContext));
}

public boolean isIgnoreAboveSet() {
Helper function for better readability
…ize Builder fields
if (indexCreatedVersion.onOrAfter(IndexVersions.DISABLE_NORMS_BY_DEFAULT_FOR_LOGSDB_AND_TSDB)) {
// don't enable norms by default if the index is LOGSDB or TSDB based
return indexMode != IndexMode.LOGSDB && indexMode != IndexMode.TIME_SERIES;
}
// bwc - historically, norms were enabled by default on text fields regardless of which index mode was used
return true;
It's better to define these in the constructor to avoid any issues with calling `this.` before the constructor returns.
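One way to read this comment is as a point about Java initialization order: field initializers run before the constructor body, so an initializer that calls an instance method can observe fields that have not been assigned yet. The sketch below is only an illustration under that assumption; `EagerBuilder`, `ConstructorBuilder`, and the string-valued `indexMode` are hypothetical and are not the Elasticsearch `Builder`.

```java
// Hypothetical builders illustrating initialization order; not the Elasticsearch Builder.
final class InitializerOrderSketch {

    static final class EagerBuilder {
        // This initializer runs before the constructor body, so indexMode is still null here.
        private final boolean norms = defaultNorms();
        private final String indexMode;

        EagerBuilder(String indexMode) {
            this.indexMode = indexMode;
        }

        private boolean defaultNorms() {
            return "logsdb".equals(indexMode) == false;
        }

        boolean norms() {
            return norms;
        }
    }

    static final class ConstructorBuilder {
        private final String indexMode;
        private final boolean norms;

        ConstructorBuilder(String indexMode) {
            this.indexMode = indexMode;
            // indexMode has been assigned by now, so the default is computed from the real value.
            this.norms = defaultNorms();
        }

        private boolean defaultNorms() {
            return "logsdb".equals(indexMode) == false;
        }

        boolean norms() {
            return norms;
        }
    }
}
```

With these definitions, `new EagerBuilder("logsdb").norms()` is `true` because the default was computed while `indexMode` was still `null`, whereas `new ConstructorBuilder("logsdb").norms()` is `false`.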
This looks good!
I only just realized that we are missing multi-field bwc tests. Would you be able to update `MatchOnlyTextRollingUpgradeIT` to test the multi-field aspect (including ignore_above)? And I think we may want a similar rolling upgrade test suite for text field as well.
Update: This can be done in a separate PR and then we update this PR to include improved bwc test coverage.
This is a small refactor + bug fix for 131282.
The refactor changes how `text`, `match_only_text`, and `annotated_text` fields use `keyword` multi fields for synthetic source. Currently, this is done via the hasSyntheticSourceCompatibleKeywordField argument, where we set a boolean flag to indicate whether there is a keyword multi field that is either stored or has doc values. This is not a good approach for addressing 131282 because we want to disable the following logic for multi fields. With that disabled, the parent fields will no longer have a multi field to use for synthetic source.

We could designate one of the keyword fields as some kind of "synthetic source provider" for the parent. This way the field will always create a `StoredField` when `ignore_above` is tripped. However, this is a poor approach since it exposes how text fields are implemented to the keyword field. If the parent field decides how and what is stored, it'll be a lot clearer in the code.

This is where this PR comes in. It aims to remove `hasSyntheticSourceCompatibleKeywordField` (although kept for now for bwc) and instead relies on the `syntheticSourceDelegate`. With the addition of a new method `canUseSyntheticSourceDelegateForSyntheticSource()`, which is called during indexing, we can determine whether a particular keyword multi field is a valid supporter of synthetic source. If it isn't, then the parent field will explicitly create a `StoredField` for that.

Note: there are a lot of changed files; that said, most of them are just constructor changes. The actual changes are pretty limited.
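To make the indexing-time decision above more concrete, here is a minimal sketch of the idea that the parent field decides what gets stored. It does not use any Elasticsearch classes: `KeywordDelegate`, `canUseDelegate`, `parentStoredFields`, and the `._original` suffix are all hypothetical names, and the length check via `String.length()` is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "the parent field decides what is stored"; not Elasticsearch code.
final class ParentStoresFallbackSketch {

    // Simplified stand-in for a keyword multi field.
    static final class KeywordDelegate {
        final boolean hasDocValuesOrStored;
        final int ignoreAbove;

        KeywordDelegate(boolean hasDocValuesOrStored, int ignoreAbove) {
            this.hasDocValuesOrStored = hasDocValuesOrStored;
            this.ignoreAbove = ignoreAbove;
        }

        // Assumption: a value trips ignore_above when its length exceeds the limit.
        boolean accepts(String value) {
            return value.length() <= ignoreAbove;
        }
    }

    // True when the delegate exists, keeps values, and does not ignore this particular value.
    static boolean canUseDelegate(KeywordDelegate delegate, String value) {
        return delegate != null && delegate.hasDocValuesOrStored && delegate.accepts(value);
    }

    // Returns the names of the stored fields the parent would write for this value.
    static List<String> parentStoredFields(String fieldName, KeywordDelegate delegate, String value) {
        List<String> stored = new ArrayList<>();
        if (canUseDelegate(delegate, value) == false) {
            // The delegate cannot supply this value, so the parent stores it itself.
            stored.add(fieldName + "._original"); // hypothetical fallback field name
        }
        return stored;
    }
}
```

In this sketch the keyword delegate never needs to know that a text field depends on it; the parent checks the delegate per value and only writes its own stored field when the delegate cannot supply that value.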