Skip to content

Conversation

gmarouli
Copy link
Contributor

@gmarouli gmarouli commented Sep 22, 2025

In #134790 we introduced a bug that was caught by our tests.;

The problem manifested itself when a multi field mapper would call getText() which would set _tokenIncomplete to false after the getOptimisedText(). This would evaluate the condition _currToken == JsonToken.VALUE_STRING && _tokenIncomplete && stringEnd > 0 to false and the lastOptimisedValue would not be reset.

We changed the code to always reset the lastOptimisedValue when a next*() method is called.

Furthermore, we introduced a randomised unit test that creates two XContentParsers and runs one as a baseline using none of the optimised code and the other one is accessing both optimised and non optimised in a random pattern. This test was able to catch both #134770 & #135256.

Fixes #135256

@gmarouli gmarouli requested a review from a team as a code owner September 22, 2025 11:33
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.2.0 labels Sep 22, 2025
@gmarouli gmarouli requested a review from martijnvg September 22, 2025 11:33
@gmarouli gmarouli added >bug :Core/Infra/Core Core issues without another label v9.1.5 v8.19.5 labels Sep 22, 2025
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team and removed needs:triage Requires assignment of a team area label labels Sep 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mary!

I think you also need to undo a part #135172 did: https://github.com/elastic/elasticsearch/pull/135172/files#diff-8ef1741dbe67a9d21e8154754d17238802a9b21344f997067befe4f618f985feR145 ?

I think it would be great if we can get a unit test that reproduces the issue. For example a json document that we know caused the logsdb randomized tests to fail. If we need a DotExpandingXContentParser to reproduce that should be ok. We should reproduce a the bug, which is that an incorrect value is returned for a specific field.

I think the unit test coverage needs to be improved. Maybe we should create a unit test that parses a random json document. First with a parser that never uses the uff-8 parsing optimization and secondly with a parser that uses the utf-8 parsing optimization. Both parsers should produce the exact same result. (I think we can let both parsers something like a map of maps). In order to make it more evil, we can randomly invoke text() or optimizedText() multiple times.

@gmarouli gmarouli marked this pull request as draft September 22, 2025 13:22
@gmarouli gmarouli changed the title Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. WIP - Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. Sep 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @gmarouli, I've updated the changelog YAML for you.

@gmarouli gmarouli added >non-issue and removed >bug labels Sep 25, 2025
@gmarouli
Copy link
Contributor Author

Changed to non-issue to remove the changelog.

@gmarouli gmarouli marked this pull request as ready for review September 25, 2025 08:05
@gmarouli gmarouli changed the title WIP - Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. Bug fix, last optimised value in ESUTF8StreamJsonParser kept old value. Sep 25, 2025
@gmarouli gmarouli requested a review from martijnvg September 25, 2025 08:19
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good @gmarouli! I left a few comments around testing.

Maybe also add a yaml test similar to reproduction in #135256?

@gmarouli
Copy link
Contributor Author

Thanks for the feedback @martijnvg , it is much simpler now and I learned a few extra things. I extended the existing test to capture both bugs we encountered.

@gmarouli gmarouli requested a review from martijnvg September 29, 2025 08:42
@gmarouli gmarouli added the auto-backport Automatically create backport pull requests when merged label Sep 29, 2025
Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one testing suggestion, otherwise LGTM.

@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 29, 2025
@justzh
Copy link

justzh commented Sep 29, 2025

Ok. First, you guys are making way too many changes in the span of 4 - 5 days. Second of all...

@justzh
Copy link

justzh commented Sep 29, 2025

#132306

@gmarouli gmarouli removed the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 29, 2025
@gmarouli
Copy link
Contributor Author

Ok. First, you guys are making way too many changes in the span of 4 - 5 days. Second of all...

Hi @justzh , indeed we have been very active on this part of the code, happy to walk you through this.

We are trying to ensure this will be in 9.2 so our users will not "miss" the optimisation's benefits. This is why we have been so active. Does it make sense?

If you have any feedback on our approach on handling this bug, I am very happy to hear.

@gmarouli
Copy link
Contributor Author

gmarouli commented Sep 29, 2025

#132306

This is unrelated.

@gmarouli gmarouli merged commit 5420780 into elastic:main Sep 29, 2025
31 of 34 checks passed
@gmarouli gmarouli deleted the bug-fix-utf8-optimiser branch September 29, 2025 12:46
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.1 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 135184

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged backport pending :Core/Infra/Core Core issues without another label >non-issue Team:Core/Infra Meta label for core/infra team v8.19.5 v9.1.5 v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ESUTF8StreamJsonParser does not reset correctly the last optimised value.
4 participants