Sibling errors should not be added after propagation #1184

benjie · 2025-07-10T18:05:07Z

This PR is built on top of:

Fix "response position" definition; clarify sibling errors on propagation #1183

GraphQL.js output is not (currently) stable after an operation terminates: more errors may be added to the result after the promise has resolved!

Reproduction with `graphql` module `test.mts`

import type { ExecutionResult } from "graphql";
import {
  graphql,
  GraphQLInt,
  GraphQLNonNull,
  GraphQLObjectType,
  GraphQLSchema,
} from "graphql";

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

const Test = new GraphQLObjectType({
  name: "Test",
  fields: {
    a: {
      type: GraphQLInt,
      async resolve() {
        await sleep(0);
        throw new Error(`a`);
      },
    },
    b: {
      type: new GraphQLNonNull(GraphQLInt),
      async resolve() {
        await sleep(10);
        throw new Error(`b`);
      },
    },
    c: {
      type: GraphQLInt,
      async resolve() {
        await sleep(20);
        throw new Error(`c`);
      },
    },
  },
});

const Query = new GraphQLObjectType({
  name: "Query",
  fields: {
    test: {
      type: Test,
      resolve() {
        return {};
      },
    },
  },
});
const schema = new GraphQLSchema({
  query: Query,
});

const result = await graphql({
  schema,
  source: `{ test { a b c } }`,
});

console.log("Result:");
console.log();
console.log(JSON.stringify(result, null, 2));
await sleep(100);
console.log();
console.log("Exact same object 100ms later:");
console.log();
console.log(JSON.stringify(result, null, 2));

$ node test.mts 
Result:

{
  "errors": [
    { "message": "a", "path": ["test", "a"] },
    { "message": "b", "path": ["test", "b"] }
  ],
  "data": { "test": null }
}

Exact same object 100ms later:

{
  "errors": [
    { "message": "a", "path": ["test", "a"] },
    { "message": "b", "path": ["test", "b"] },
    { "message": "c", "path": ["test", "c"] }
  ],
  "data": { "test": null }
}

(I've formatted this output for brevity)

The reason for this: though we note in the spec that you may cancel sibling execution positions, we don't do that in GraphQL.js; and furthermore, we even process errors from the result and add them to the errors list!

This is particularly problematic for client-side "throw on error". Given this schema:

type Query {
  test: Test
}
type Test {
  a: Int  # Throws immediately
  b: Int! # Throws after 10ms
  c: Int  # Throws after 20ms
}

And the same spec-valid result as above:

{
  "errors": [
    { "message": "a", "path": ["test", "a"] },
    { "message": "b", "path": ["test", "b"] },
    { "message": "c", "path": ["test", "c"] }
  ],
  "data": { "test": null }
}

Technically the Test.b field is the field that caused data.test to be null - it's non-nullable, so it triggered error propagation - but without looking at the schema we can't determine this.

Solution: recommend that servers don't keep adding to errors after error propagation has occurred. This would mean:

GraphQL.js won't keep adding to errors after the operation has "completed"
We can throw the last error received that relates to the associated field, and trust that for an implementation following the recommendations it's going to be the one either from the field itself or from the field that triggered error propagation to this level.
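A minimal sketch of the client-side selection described above, assuming errors arrive in the order the server recorded them. `ResponseError`, `startsWithPath` and `errorForNulledPath` are hypothetical names for illustration, not part of graphql-js or graphql-toe:

```typescript
// Hypothetical minimal shape of an error entry in a GraphQL response.
interface ResponseError {
  message: string;
  path?: ReadonlyArray<string | number>;
}

// True when every segment of `prefix` matches the start of `path`.
function startsWithPath(
  path: ReadonlyArray<string | number>,
  prefix: ReadonlyArray<string | number>,
): boolean {
  if (prefix.length > path.length) return false;
  for (let i = 0; i < prefix.length; i++) {
    if (path[i] !== prefix[i]) return false;
  }
  return true;
}

// Returns the last error at or below `nulledPath`, or undefined if none.
function errorForNulledPath(
  errors: ReadonlyArray<ResponseError>,
  nulledPath: ReadonlyArray<string | number>,
): ResponseError | undefined {
  for (let i = errors.length - 1; i >= 0; i--) {
    const path = errors[i].path;
    if (path && startsWithPath(path, nulledPath)) return errors[i];
  }
  return undefined;
}

// Errors as a server following the recommendation would report them.
const stableErrors: ResponseError[] = [
  { message: "a", path: ["test", "a"] },
  { message: "b", path: ["test", "b"] },
];
// Errors when a sibling error is still appended after propagation.
const lateErrors: ResponseError[] = [
  ...stableErrors,
  { message: "c", path: ["test", "c"] },
];

const picked = errorForNulledPath(stableErrors, ["test"]);
const mispicked = errorForNulledPath(lateErrors, ["test"]);
console.log(picked?.message, mispicked?.message); // prints "b c"
```

With the stable list, the helper picks "b", the error that triggered propagation; with the late sibling error appended, the same rule picks "c", which is the misattribution this PR aims to avoid.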

yaacovCR · 2025-07-15T14:04:50Z

Took a stab at the implementation in graphql-js within our 16.x.x line:

graphql/graphql-js#4458

Although part of me feels that an implementer with deep knowledge of the expected relative ordering of their resolvers could, in theory, be confused by the missing errors, so this might belong in v17 instead.

Thoughts?

martinbonnin

+1 to this 👍

That being said, I think this problem highlights that the current algorithms are not 100% clear on the semantics of raising errors and cancellation.

I get that resolvers might not be cancellable, but it's surprising to me that graphql-js code is still executed after the execution result is received by the caller.

Ideally, there is a prompt cancellation guarantee that every callback checks for cancellation and stops processing if cancelled. Doing so in the language-neutral spec sounds like a terrible headache though 😄 Problem for another day!

Never mind, it's an issue even (and especially) in the absence of cancellation. Well, that sucks

martinbonnin · 2025-07-18T09:58:24Z

Putting down my thoughts from yesterday's wg before I forget everything about them.

This is a significant issue but the ultimate fix is onError: NULL IMO. I'm currently leaning towards declaring bankruptcy on this specific issue:

If users can update their servers, they should add proper support for onError: NULL. If they can't, they won't be able to fix this issue anyways 🤷

This means when an error bubbles, there is no way to know which error triggered the bubbling. It's the existing behaviour. It's unfortunate but it is what it is. If you want to do better, migrate to onError: NULL.

Note: I would still change the graphql-js behaviour to not have the response change after the promise is resolved, this feels very surprising to me.

yaacovCR · 2025-07-18T10:25:35Z

Relevant, in terms of cancellation, merged to v17:

stop resolvers after execution ends graphql-js#4263

yaacovCR · 2025-07-18T10:28:01Z

@martinbonnin could you elaborate a bit more on the scenarios in which

Sibling errors should not be added after propagation graphql-js#4458

And relying on the final error on that nulled path does not work?

martinbonnin · 2025-07-18T14:32:16Z

Relevant, in terms of cancellation, merged to v17

I can still reproduce @benjie's behaviour, where the result changes after the promise has been resolved, even using 17.0.0-alpha.9:

$ node test.mts 
Result:

{
  "errors": [
    { "message": "a", "path": ["test", "a"] },
    { "message": "b", "path": ["test", "b"] }
  ],
  "data": { "test": null }
}

Exact same object 100ms later:

{
  "errors": [
    { "message": "a", "path": ["test", "a"] },
    { "message": "b", "path": ["test", "b"] },
    { "message": "c", "path": ["test", "c"] }
  ],
  "data": { "test": null }
}

This seems surprising to me. Is that expected?

could you elaborate a bit more on the scenarios in which graphql/graphql-js#4458 and relying on the final error on that nulled path does not work?

My understanding is that the problem we are trying to solve is allowing clients using graphql-toe to determine what error caused the null-bubbling without schema knowledge?

#4458 is indeed a solution to that problem.

My point is that it is an inferior solution to onError: null. It is potentially a breaking change (the same query now returns a different result) and also requires updating your server (same as onError: null) while not allowing fine-grained error-handling.

I'd rather focus our efforts and messaging on onError: null.

yaacovCR · 2025-07-18T15:50:20Z

I can still reproduce @benjie's behaviour, where the result changes after the promise has been resolved, even using 17.0.0-alpha.9:

This seems surprising to me. Is that expected?

Yes, it needs graphql/graphql-js#4458 to solve that issue. What has been merged is eventual cancellation of the resolver cascade (in addition to triggering of passed abort signal merged separately). Just adding that your (and my) prompt cancellation aspirations, while not fulfilled in v17, have been pushed forward a bit.

Creating many fine grained abort controllers to immediately cancel turned out to be too much of a performance hit… and we didn’t think enough to care about the issue of spooky additional errors after completion.

My point is that it is an inferior solution to onError: null. It is potentially a breaking change (the same query now returns a different result) and also requires updating your server (same as onError: null) while not allowing fine-grained error-handling.

Got it.

I'd rather focus our efforts and messaging on onError: null.

Shouldn’t we fix this behavior for all onError modes?

martinbonnin · 2025-07-18T16:04:30Z

Yes, it needs graphql/graphql-js#4458 to solve that issue

Gotcha 👍 . If I may nitpick the terminology a bit here, my point is that:

Sibling errors should not be added after **cancellation** => this is needed.
Sibling errors should not be added after **propagation** => this is probably not needed.

Creating many fine grained abort controllers to immediately cancel turned out to be too much of a performance hit…

Apologies in advance for the naive question but since JS is ultimately single threaded, shouldn't checking for cancellation be reading a single per-field flag? Or is this what is actually slow?

Shouldn’t we fix this behavior for all onError modes?

I'm fine and happy to let it go for the current onError: PROPAGATE mode.

Fixing it means that some current queries will see a different response (some errors will disappear). As with every change, it might break someone's workflow. It's more work for us, new entries to process in the graphql-js changelog for everyone, all of that for something that IMO should become the "legacy" error mode.

I say it's not worth the tradeoff.

benjie · 2025-07-22T07:57:38Z

It is potentially a breaking change

Any change is potentially a breaking change. This one does not break the spec, or expectations, and is incredibly unlikely to break anything. Most people don't handle any errors, let alone assert that two different errors are thrown in the same spot. If you can demonstrate any application in the wild that this would break, I would be amazed.

Creating many fine grained abort controllers to immediately cancel turned out to be too much of a performance hit… and we didn’t think enough to care about the issue of spooky additional errors after completion.

Yeah, I think abort controllers are excessive for this, it just needs a boolean somewhere related to the path that you can say wasCancelled=true and if so you stop accumulating. Given we already have the data the resolvers return, it seems to me like it could be tracked in parallel to that?

For solving the spookiness it's trivial, just slice the array before returning it:

 // Pseudo-diff ;)
 return {
-  errors: exeContext.errors,
+  errors: exeContext.errors.slice(),
   data,
 }
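To make the difference concrete, here is a small sketch of returning the live array versus a sliced copy. `SketchContext`, `buildLiveResult` and `buildSnapshotResult` are hypothetical stand-ins and do not mirror the actual graphql-js internals:

```typescript
// Hypothetical stand-ins for the executor's internal state.
interface SketchError {
  message: string;
}
interface SketchContext {
  errors: SketchError[];
}

// Returns the same array instance the executor keeps appending to.
function buildLiveResult(ctx: SketchContext) {
  return { errors: ctx.errors, data: { test: null } };
}

// Returns a shallow copy taken at the moment the result is built.
function buildSnapshotResult(ctx: SketchContext) {
  return { errors: ctx.errors.slice(), data: { test: null } };
}

const ctx: SketchContext = { errors: [{ message: "a" }, { message: "b" }] };
const liveResult = buildLiveResult(ctx);
const snapshotResult = buildSnapshotResult(ctx);

// A sibling resolver rejects after the result was handed to the caller.
ctx.errors.push({ message: "c" });

console.log(liveResult.errors.length, snapshotResult.errors.length); // prints "3 2"
```

The copy only stops the returned object from changing; the late error is still collected internally, so this addresses the "spookiness" but not which errors get recorded before the result is built.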

Shouldn’t we fix this behavior for all onError modes?

Yes. Though it doesn't exist in onError: NULL mode, and onError: ABORT mode just throws the first error no matter where it occurs, so doesn't need solving. What we should do though is solve it in the main, default mode, that everyone uses by default, and will probably continue to do so for the next 3-10 years whilst we slowly transition people to onError: NULL.

I say it's not worth the tradeoff.

If it turns out to be not too challenging to implement, I think it's worthwhile because it helps people move towards the new way of doing things even before they can move their server. Easing the adoption story is really important IMO, in general we need the frontend engineers to put pressure on the backend engineers, and if they aren't going to get the same behavior after this change as before it makes it challenging to build a convincing demo.

What makes a really convincing demo is:

try {
  return renderData(data);
} catch (e) {
  return renderError(e.code ?? 'E_UNKNOWN');
}

"Look - those errors you've been carefully throwing - they're useful now!"

We can't do this if the wrong error is thrown - we're going to render the wrong error message and mislead users.

If you throw an aggregate error then you're going to need to do something like:

/**
 * List of relevant error codes to this action; the last entry in this list is
 * the highest priority match (since multiple errors may occur, and we don't know
 * which of those is specifically the one that caused this blow up due to
 * limitations in GraphQL's error handling).
 */
const KNOWN_ERRORS = [
  'E_FORBIDDEN',
  'E_RATE_LIMIT',
  'E_SERVICE_UNAVAILABLE',
];

try {
  return renderData(data);
} catch (e) {
  const codes = [];
  if (e.code) {
    codes.push(e.code);
  }
  if (e.errors) {
    // Aggregate error; look at the underlying errors
    for (const err of e.errors) {
      if (err.code) {
        codes.push(err.code);
      }
    }
  }
  codes.sort((a, z) => {
    return KNOWN_ERRORS.indexOf(z) - KNOWN_ERRORS.indexOf(a);
  });
  const code = codes[0];
  return renderError(code ?? "E_UNKNOWN");
}

and even then, you might be rendering the wrong code (e.g. a forbidden occurred inside one of the nullable fields, but that's not what caused the entire thing to blow up, so rendering "Forbidden" would be unexpected).

benjie · 2025-07-22T08:31:41Z

I think we have a related issue from the other side of the fence to think about too, have raised a related discussion here:

Indicating likely error positions / which error to render? nullability-wg#112

(The above relates to onError: NULL so is out of scope for this current discussion.)

yaacovCR · 2025-07-22T12:55:44Z

If it turns out to be not too challenging to implement,

Not adding sibling errors after propagation is implemented in graphql/graphql-js#4458 => it was not so challenging!

Fine-grained cancellation of cancellable async work in JavaScript is more challenging! Ideally, as soon as any async resolver for a non-null field errors, all of its cancellable async sibling resolvers will immediately stop all work. This means passing an abortSignal to those async resolvers so they can stop work immediately, but requires creating a new abort controller per parent.
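A rough sketch of the "one abort controller per parent" idea, using only the standard `AbortController` API. `runSiblingsSketch` and the logged strings are hypothetical and this is not how graphql-js wires up its resolvers:

```typescript
// Simulates one parent object whose sibling resolvers share an AbortSignal.
function runSiblingsSketch(): string[] {
  // One controller per parent: aborting it notifies every sibling at once.
  const parentController = new AbortController();
  const log: string[] = [];

  // Hypothetical sibling resolver "c" registers cleanup for its pending work.
  parentController.signal.addEventListener("abort", () => {
    log.push("c: stopped");
  });

  // Hypothetical non-null sibling "b" fails, so the parent will be nulled.
  log.push("b: threw");
  parentController.abort();

  log.push(`aborted=${parentController.signal.aborted}`);
  return log;
}

console.log(runSiblingsSketch());
```

The abort listeners run synchronously inside `abort()`, so sibling work can stop before the result is built; the cost mentioned above comes from allocating a controller for every parent object.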

shouldn't checking for cancellation be reading a single per-field flag? Or is this what is actually slow?

Yeah, I think abort controllers are excessive for this, it just needs a boolean somewhere related to the path that you can say wasCancelled=true and if so you stop accumulating.

If we don't want to actually pass the abortSignal to the resolver, we can use the trick @benjie describes here, i.e. within part of our completion chain, we can check the entire path of every field to make sure a parent has not been nulled. We have so far opted not to do that (in the name of performance/simplicity), and instead do the same thing for just one global value, i.e. operation completion. [Although I am not sure I have actually tested the performance differential of checking the entire path; if I do, I can report back.] The background of my thinking is that the goal would be to get a performant-enough fine-grained abortSignal option working, so I have not moved forward on additional work around this in the hopes that it would materialize.
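The path check could look roughly like the sketch below, assuming the executor records each response path that was nulled by propagation. `PathAwareErrorCollector` and its methods are hypothetical names, not graphql-js APIs:

```typescript
type FieldPath = ReadonlyArray<string | number>;

// Collects errors but ignores any that arrive at or below a nulled path.
class PathAwareErrorCollector {
  readonly errors: Array<{ message: string; path: FieldPath }> = [];
  // Keys are JSON-encoded paths that were set to null by propagation.
  private readonly nulledKeys: Record<string, true> = {};

  markNulled(path: FieldPath): void {
    this.nulledKeys[JSON.stringify(path)] = true;
  }

  // Walks every prefix of `path`, so the cost grows with path depth.
  private isAtOrBelowNulledPath(path: FieldPath): boolean {
    for (let len = 0; len <= path.length; len++) {
      if (this.nulledKeys[JSON.stringify(path.slice(0, len))] === true) {
        return true;
      }
    }
    return false;
  }

  // Returns false when the error was dropped because a parent was nulled.
  add(message: string, path: FieldPath): boolean {
    if (this.isAtOrBelowNulledPath(path)) return false;
    this.errors.push({ message, path });
    return true;
  }
}

const collector = new PathAwareErrorCollector();
collector.add("a", ["test", "a"]);
collector.add("b", ["test", "b"]);
collector.markNulled(["test"]); // b is non-null, so "test" becomes null
const acceptedLate = collector.add("c", ["test", "c"]);
console.log(acceptedLate, collector.errors.length); // prints "false 2"
```

The per-prefix lookup is the part whose performance impact is in question above; checking a single global "operation completed" flag avoids it but cannot distinguish nulled subtrees from live ones.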

I hope this is somewhat clearer.

Sibling errors should not be added after propagation

e199ebb

benjie added the 💭 Strawman (RFC 0) RFC Stage 0 (See CONTRIBUTING.md) label Jul 10, 2025

yaacovCR mentioned this pull request Jul 15, 2025

Sibling errors should not be added after propagation graphql/graphql-js#4458

Open

martinbonnin approved these changes Jul 17, 2025


yaacovCR mentioned this pull request Jul 17, 2025

Fix "response position" definition; clarify sibling errors on propagation #1183

Open
