Conversation

benchaplin (Contributor):

Resolves #134151, #130821.

Background

A bug was introduced in #121885 by the following code, which handles batched query exceptions arising from a batched partial reduction failure:

```java
@Override
public void handleException(TransportException e) {
    Exception cause = (Exception) ExceptionsHelper.unwrapCause(e);
    logger.debug("handling node search exception coming from [" + nodeId + "]", cause);
    if (e instanceof SendRequestTransportException || cause instanceof TaskCancelledException) {
        // two possible special cases here where we do not want to fail the phase:
        // failure to send out the request -> handle things the same way a shard would fail with unbatched execution
        // as this could be a transient failure and partial results we may have are still valid
        // cancellation of the whole batched request on the remote -> maybe we timed out or so, partial results may
        // still be valid
        onNodeQueryFailure(e, request, routing);
    } else {
        // Remote failure that wasn't due to networking or cancellation means that the data node was unable to reduce
        // its local results. Failure to reduce always fails the phase without exception so we fail the phase here.
        if (results instanceof QueryPhaseResultConsumer queryPhaseResultConsumer) {
            queryPhaseResultConsumer.failure.compareAndSet(null, cause);
        }
        onPhaseFailure(getName(), "", cause);
    }
}
```

Raising a phase failure in this way leads to a couple of issues:

  1. It can be called more than once (as seen in #134151: [Search] Exceptions in datanodes leading to assertFirstRun() failures).
  2. The subsequent freeing of contexts can miss concurrent in-flight queries, resulting in open contexts after the failure (as seen in #130821: [CI] SearchWithRejectionsIT testOpenContextsAfterRejections failing).

Solution

Problem 1 could be resolved with a simple flag, as proposed in #131085. Problem 2 could be resolved with some careful use of the same flag to clean up contexts upon receiving stale query results. However, in the interest of stability, I propose a solution that more closely resembles how a reduction failure is handled by a non-batched query phase. In non-batched execution, a reduction failure is held in the QueryPhaseResultConsumer until shard fanout is complete. Only later, during the final reduction at the beginning of the fetch phase, do we fail the search (sketched below).
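As an illustration of this deferred-failure pattern, here is a minimal sketch with simplified names (these are not the actual Elasticsearch classes):

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of deferring a reduction failure: the consumer records the
// first failure and keeps accepting shard results; the failure only surfaces
// when the fetch phase asks for the final reduce.
class DeferredFailureConsumer {

    // compareAndSet keeps only the first failure, mirroring
    // queryPhaseResultConsumer.failure.compareAndSet(null, cause) above
    private final AtomicReference<Exception> failure = new AtomicReference<>();

    void onPartialReduceFailure(Exception cause) {
        failure.compareAndSet(null, cause);
        // no phase failure here: shard fanout continues undisturbed
    }

    Object reduce() throws Exception {
        Exception f = failure.get();
        if (f != null) {
            throw f; // the search fails exactly once, at the start of the fetch phase
        }
        return mergeResults();
    }

    private Object mergeResults() {
        return new Object(); // placeholder for the real merge
    }
}
```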

Fast failure + proper task cancellation are worthy goals for the future. I am tracking these as follow-up improvements for after the release of batched query execution.

This PR:

  1. Alters the batched query request to respond with shard results even in the case of a reduction failure on the data node (the failure is now conditionally included in the NodeQueryResponse; see the write-side sketch after this list).
  2. Removes the early phase failure on the coordinating node. The coordinator's QueryPhaseResultConsumer will hold onto the failure and eventually fail during the fetch phase, same as non-batched execution.
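
For illustration, the write side implied by the new read path (quoted later in this thread) might look roughly like this; the field names and the Writeable counterparts are assumptions, not the PR's exact code:

```java
@Override
public void writeTo(StreamOutput out) throws IOException {
    // each slot is either a QuerySearchResult or a per-shard exception,
    // tagged with a boolean so the reader knows which branch to take
    out.writeArray((o, result) -> {
        if (result instanceof QuerySearchResult querySearchResult) {
            o.writeBoolean(true);
            querySearchResult.writeTo(o);
        } else {
            o.writeBoolean(false);
            o.writeException((Exception) result);
        }
    }, results);
    mergeResult.writeTo(out);
    topDocsStats.writeTo(out);
    // presence flag: only serialize a reduction failure when one occurred
    Exception reductionFailure = this.reductionFailure;
    out.writeBoolean(reductionFailure != null);
    if (reductionFailure != null) {
        out.writeException(reductionFailure);
    }
}
```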

@benchaplin added the >bug, Team:Search Foundations, :Search Foundations/Search, and v9.1.7 labels on Oct 21, 2025
elasticsearchmachine (Collaborator):

Pinging @elastic/es-search-foundations (Team:Search Foundations)

elasticsearchmachine (Collaborator):

Hi @benchaplin, I've created a changelog YAML for you.

The review comments below are anchored on the new deserialization code:

```java
// each array slot is either a QuerySearchResult or a per-shard exception,
// distinguished by a boolean tag
this.results = in.readArray(i -> i.readBoolean() ? new QuerySearchResult(i) : i.readException(), Object[]::new);
this.mergeResult = QueryPhaseResultConsumer.MergeResult.readFrom(in);
this.topDocsStats = SearchPhaseController.TopDocsStats.readFrom(in);
// new presence flag: a reduction failure is only included when one occurred
boolean hasReductionFailure = in.readBoolean();
```
A reviewer (Contributor) asked:
Since we're changing the shape of this message, do we need to create a new transport version or is that taken care of for us?

benchaplin (Contributor, Author) replied:
Yes I believe I do, once I learn how 😂
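
For context, the usual Elasticsearch pattern is to register a new constant in TransportVersions and gate the new field on it, so that older nodes never try to read bytes that were never sent. A sketch of the read side (the constant name here is invented for illustration):

```java
// hypothetical version gate; the real constant would be registered in TransportVersions
if (in.getTransportVersion().onOrAfter(TransportVersions.BATCHED_QUERY_REDUCTION_FAILURE)) {
    boolean hasReductionFailure = in.readBoolean();
    this.reductionFailure = hasReductionFailure ? in.readException() : null;
} else {
    this.reductionFailure = null;
}
```

The write side would check `out.getTransportVersion()` symmetrically before writing the flag and the exception.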

@chrisparrinello (Contributor) reviewed and left a comment:

LGTM

@benchaplin added the auto-backport and v9.2.1 labels on Oct 22, 2025
@benchaplin marked this pull request as draft on October 22, 2025, 21:50

Labels

auto-backport, >bug, :Search Foundations/Search, Team:Search Foundations, v9.1.7, v9.2.1, v9.3.0
