Conversation

@najuna-brian najuna-brian commented Oct 17, 2025

Description

Fixes #4505

Fixed fatal error propagation in async operations so fatal errors crash the JVM instead of being caught and wrapped.

Changes

kernel/jvm/src/main/scala/cats/effect/kernel/AsyncPlatform.scala: Added fatal error detection in fromCompletableFuture to re-throw fatal errors instead of wrapping them
core/shared/src/main/scala/cats/effect/IO.scala: Removed IO.delay() wrapper in IO.async_ to allow fatal errors to propagate during callback registration

Why both changes?

  • fromCompletableFuture uses CompletableFuture.handle() which catches all exceptions, including fatal ones
  • async_ had an IO.delay() wrapper that caught fatal errors during callback registration
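
A minimal sketch of the first point, using only the JDK and the Scala standard library rather than Cats Effect itself. It shows that a callback passed to CompletableFuture.handle() receives a fatal error as an ordinary value instead of letting it propagate. The object name HandleSeesFatal and the helpers classify and observe are hypothetical, and scala.util.control.NonFatal is assumed here as a stand-in for whatever fatal-error check the runtime applies.

```scala
import java.util.concurrent.CompletableFuture
import java.util.function.BiFunction
import scala.util.control.NonFatal

// Hypothetical demo object; not part of Cats Effect.
object HandleSeesFatal {
  // Classify a Throwable with the standard library's NonFatal extractor.
  def classify(t: Throwable): String = t match {
    case NonFatal(_) => "non-fatal"
    case _           => "fatal"
  }

  // Complete a future exceptionally and report what handle() is given.
  def observe(t: Throwable): String = {
    val fut = new CompletableFuture[String]()
    fut.completeExceptionally(t)
    val handled = fut.handle[String](new BiFunction[String, Throwable, String] {
      def apply(value: String, err: Throwable): String =
        if (err == null) "no error" else classify(err)
    })
    // handle() turned the failure into a normal result, so get() does not throw.
    handled.get()
  }

  def main(args: Array[String]): Unit = {
    println(observe(new OutOfMemoryError("simulated"))) // prints "fatal"
    println(observe(new RuntimeException("simulated"))) // prints "non-fatal"
  }
}
```

In this sketch the OutOfMemoryError is only simulated, and the handler merely labels it. Any code that forwards the value from such a handler would need an explicit check to treat it differently from a recoverable failure.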

Testing

  • Added tests for both fromCompletableFuture and async_ fatal error scenarios
  • All tests pass (JVM tests, kernel tests)
  • Eliminates the need for the .onError(_ => IO.unit) workaround mentioned in the issue

@najuna-brian
Author

Hi @armanbilge, @durban

Would you mind taking a look at this when you have a moment?

Thanks!

@najuna-brian najuna-brian marked this pull request as ready for review October 17, 2025 12:35
Member

@armanbilge armanbilge left a comment

Thanks for the PR!

  • Since this is a bug fix, you can target the series/3.6.x branch
  • Can you add a test for a minimized version of the issue? You may need to add it to IOAppSuite because this deals with fatal errors.

* @param fut
* The `java.util.concurrent.CompletableFuture` to suspend in `F[_]`
*/
def fromCompletableFuture[A](fut: F[CompletableFuture[A]]): F[A] = cont {
Member

Are changes to this method necessary? #4505 (comment) suggests that the problem is actually in async and not directly in fromCompletableFuture.

Author

You're right that async_ is the core issue, but I think both changes are needed:
fromCompletableFuture: Uses CompletableFuture.handle(), which catches all exceptions, including fatal ones. Without the fix, an OutOfMemoryError gets wrapped instead of crashing the JVM.

async_: The IO.delay() wrapper was catching fatal errors during callback registration, causing the same problem.
Both mechanisms need the fix so that fatal errors consistently crash the JVM.

I added more insight into how the error resurfaces from onError here: #4505 (comment)

Author

Thanks @tpetillot

I added more insight into how the error resurfaces from onError here: #4505 (comment)

It looks like the code attached to that comment does not detect the original fatal error. Instead, it:

  • Detects errors thrown by the error handler (f(error)), but not the original fatal error that triggered onError

The fatal error detection happens after the onError machinery, not during it.
Please let me know whether that is right 😊

Yep, I definitely missed that. I tried to trace the route of the error outcome for an OOM with IO.cont here: #4505 (comment)

What do you think about handling the fatal failure (onFatalFailure) in failed, to ensure the error is treated properly on consumption?

@najuna-brian najuna-brian changed the base branch from series/3.x to series/3.6.x October 21, 2025 14:43
@najuna-brian najuna-brian force-pushed the fix/fatal-error-propagation branch from 4b68700 to ee0f110 Compare October 21, 2025 15:01
@najuna-brian
Author

Since this is a bug fix, you can target the series/3.6.x branch

Thanks for the review and guidance. Just learnt about all these upstream branches now 😊

@najuna-brian
Author

Hello @armanbilge
I guess I will have to wait for this PR to be merged, and then I can proceed.
Thank you

@armanbilge
Member

@najuna-brian #4518 is merged now, thanks for your patience! You can merge the latest series/3.6.x into your branch.

@armanbilge armanbilge closed this Oct 28, 2025
@armanbilge armanbilge reopened this Oct 28, 2025
Member

@armanbilge armanbilge left a comment

@najuna-brian it looks like one of the tests you added is hanging in the CI. Are you able to run the tests locally?

h.stderr() must contain("Boom from async!")
h.stdout() must not(contain("sadness"))
} else {
// Fatal error testing is JVM-only
Member

Hmm, are you sure? I am pretty sure we support fatal errors on all platforms.

@najuna-brian
Author

@najuna-brian it looks like one of the tests you added is hanging in the CI. Are you able to run the tests locally?

Yes, true, they keep hanging, though I am not sure how to solve that.

@najuna-brian
Author

@najuna-brian #4518 is merged now, thanks for your patience! You can merge the latest series/3.6.x into your branch.

Thank you @armanbilge
I will proceed now.

@armanbilge
Member

Yes, true, they keep hanging, though I am not sure how to solve that.

You should start by identifying which test is hanging (a low-tech way to do this is to comment out tests until you figure out which one it is). Then, once you know which test it is, you can try reverting some of your changes to identify which change you made may cause it to hang.

Development

Successfully merging this pull request may close these issues.

OutOfMemoryError not propagated when IO originates from CompletableFuture

3 participants