Fix timing race condition in AwaitAssertAsync causing premature timeouts by Aaronontheweb · Pull Request #7976 · akkadotnet/akka.net

Aaronontheweb · 2025-12-22T21:06:32Z

Summary

Fixed a 6-year-old check-then-act race condition in TestKit's AwaitAssertAsync that could cause tests to time out prematurely after only 1 attempt instead of the expected number of retries within the timeout window.

Problem

The bug was introduced in PR #4075 (Dec 2019) when the async API was added to TestKit. The timeout check occurred BEFORE Task.Delay(), creating a timing window where thread scheduling delays, GC pauses, or system load could push the actual elapsed time past the timeout even though the pre-check indicated a retry should occur.

This manifested as flaky test failures like:

Expected cluster.ReadView.Members.Count(m => m.Status == MemberStatus.Up) to be 1, but found 0.
AwaitAssert failed, timeout [00:00:03] is over after [1] attempts and [00:00:04.6625130] elapsed time

Only 1 attempt in 4.6 seconds instead of ~30 attempts in 3 seconds.
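
To see why a single slow first attempt was enough to end the loop, the removed pre-delay check can be modelled as a pure function. This is a simplified sketch rather than the Akka.TestKit source: the class and method names are hypothetical, and the 100 ms retry interval used in the note below is an assumed value that does not come from this PR.

```csharp
using System;

// Hypothetical model of the removed check; not the Akka.TestKit source.
public static class PreDelayCheckSketch
{
    // Mirrors the shape of the removed lines:
    //   var stopped = Now + t;
    //   if (stopped >= stop) -> give up
    // expressed in terms of elapsed time instead of absolute timestamps.
    public static bool GivesUpBeforeDelay(TimeSpan elapsed, TimeSpan interval, TimeSpan timeout)
        => elapsed + interval >= timeout;
}
```

With the elapsed time from the log above (00:00:04.6625130), a 3-second timeout, and the assumed 100 ms interval, `GivesUpBeforeDelay` returns true right after the first failed attempt, so no retry is ever scheduled. With 50 ms elapsed it returns false and the loop continues.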

Changes

Moved the timeout check to AFTER Task.Delay() in both AwaitAssertAsync overloads
Changed the boundary condition from >= to > to allow a final retry when exactly at the timeout
Re-ran the assertion on timeout to propagate the actual exception instead of a generic timeout
Added explanatory comments documenting the race condition
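
The reordering described in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code merged into Akka.TestKit: `AwaitAssertSketch` and `RetryUntilAsync` are hypothetical names, a `Stopwatch` stands in for the TestKit clock, and logging is left out. It assumes the intended behaviour is "check the deadline after the delay, then run the assertion one last time and let its own exception propagate".

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper illustrating the post-delay timeout check; not the Akka.TestKit source.
public static class AwaitAssertSketch
{
    // Retries `assertion` until it passes or `max` has elapsed.
    // Returns the number of attempts made. On timeout, the exception thrown by
    // the final attempt propagates to the caller.
    public static async Task<int> RetryUntilAsync(
        Func<Task> assertion, TimeSpan max, TimeSpan interval,
        CancellationToken cancellationToken = default)
    {
        var clock = Stopwatch.StartNew();
        var attempts = 0;
        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();
            attempts++;
            try
            {
                await assertion();
                return attempts; // success: stop immediately, never run again
            }
            catch (Exception)
            {
                // Swallow the failure for now; the deadline is checked after the delay.
            }

            // Sleep for the interval, capped by whatever time is left in the window.
            var remaining = max - clock.Elapsed;
            var delay = remaining < interval ? remaining : interval;
            if (delay > TimeSpan.Zero)
                await Task.Delay(delay, cancellationToken);

            // Deadline check AFTER sleeping: one final attempt, whose result is definitive.
            if (clock.Elapsed > max)
            {
                attempts++;
                await assertion(); // throws the real assertion failure, if any
                return attempts;
            }
        }
    }
}
```

In this sketch, an assertion that is slow and fails on its first run still gets a second, final attempt. Returning directly after that final attempt also keeps a late success from being followed by another loop iteration.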

Impact

This benefits all tests across the entire Akka.NET suite that use AwaitAssert(), preventing false timeout failures under load.

Testing

Verified that StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately now passes with the default 3-second timeout (previously required 10 seconds as a workaround).

Fixed a check-then-act race condition in TestKit's AwaitAssertAsync that could cause tests to timeout prematurely with only 1 retry attempt instead of the expected number of retries within the timeout window.

The bug was introduced in PR akkadotnet#4075 (Dec 2019) when async API was added to TestKit. The timeout check occurred BEFORE Task.Delay(), creating a timing window where thread scheduling delays, GC pauses, or system load could cause the actual elapsed time to exceed the timeout even though the pre-check indicated a retry should occur.

Changes:
- Move timeout check to AFTER Task.Delay() in both AwaitAssertAsync overloads
- Changed boundary condition from >= to > to allow final retry when at exact timeout
- Re-run assertion on timeout to propagate the actual exception instead of generic timeout
- Added explanatory comments documenting the race condition

This fixes flaky test failures in cluster tests where initialization under load would take longer than expected, causing false timeout failures.

Fixes StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately

Aaronontheweb

Detailed my changes

Aaronontheweb · 2025-12-29T15:28:28Z

src/core/Akka.TestKit/TestKitBase_AwaitAssert.cs

                catch(Exception)
                {
-                    var stopped = Now + t;
-                    if (stopped >= stop)


The problem here is that we don't get 1 last chance to test the assertion if the first attempt failed and took too long

Aaronontheweb · 2025-12-29T15:28:48Z

src/core/Akka.TestKit/TestKitBase_AwaitAssert.cs

                await Task.Delay(t, cancellationToken);
+
+                // Check if we've exceeded the timeout AFTER sleeping
+                if (Now > stop)


Now, after the delay has been completed, even if we're overdue on time we still check one final time on exit.

Arkatufus

The code smells a bit

Arkatufus · 2025-12-29T16:26:26Z

src/core/Akka.TestKit/TestKitBase_AwaitAssert.cs

+                if (Now > stop)
+                {
+                    Sys.Log.Warning("AwaitAssert failed, timeout [{0}] is over after [{1}] attempts and [{2}] elapsed time", max, attempts, Now - start);
+                    // Re-run the assertion one final time to get the actual exception
+                    await assertion();
+                }


My only problem is that this code logs a failure warning regardless of whether the extra assertion invocation failed or not.

Yeah we should remove that

There were some other errors with this code too, like if the final assertion succeeded we'd end up running again

Aaronontheweb added the akka-testkit Akka.NET Testkit issues label Dec 22, 2025

Aaronontheweb force-pushed the claude-wt-StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately branch from de11f30 to 5a234b6 Compare December 23, 2025 15:54

Aaronontheweb enabled auto-merge (squash) December 23, 2025 15:54

Aaronontheweb commented Dec 29, 2025

Merge branch 'dev' into claude-wt-StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately

09b82d7

Arkatufus requested changes Dec 29, 2025

Aaronontheweb and others added 3 commits December 29, 2025 12:57

fix issues with AwaitAssert

181cf76

Merge branch 'dev' into claude-wt-StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately

d63e738

Merge branch 'dev' into claude-wt-StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately

2c177e2

Arkatufus mentioned this pull request Jan 9, 2026

Fix AwaitAssertAsync logic causing premature timeout #7986

Merged

Aaronontheweb closed this in #7986 Jan 9, 2026

auto-merge was automatically disabled January 9, 2026 17:23
Pull request was closed

Aaronontheweb deleted the claude-wt-StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately branch January 9, 2026 17:23

Aaronontheweb wants to merge 5 commits into akkadotnet:dev from Aaronontheweb:claude-wt-StartEntitySpec.StartEntity_while_the_entity_is_waiting_for_restart_should_restart_it_immediately