Enable safe recursion with trampoline #54
base: master
Conversation
This is just a proof of concept. Most probably I'll rework it quite a lot before it's ready.
This currently is blocked by an issue that possibly needs fixing elsewhere: when you queue a lot of work items concurrently (for example with Task.Yield() in all the "for in ..." tests here), the following starts to throw in resumable code:

```fsharp
if sm.Data.Finished then
    if not sm.Data.Finished then failwith "corrupted state machine"
```

I'll try to make a minimal repro later. This is probably the same as dotnet/fsharp#18853, with the difference being TaskSeq uses a reference type for state machine Data, hence the null there and the zeroed Data struct here.
Yeah, the bug is in my code after all :D The thing is,
Ok, the major things are solved and this seems to be holding up when I test it locally. I'll try to run some benchmarks later.
[Benchmark results: AsyncCompletion benchmark, master vs. trampoline (this PR)]
Now there is yet another problem. The general idea of how this PR works: once synchronous recursion on the current thread gets deep enough, continuations are bounced onto a synchronous single-thread trampoline instead of growing the stack. This generally works well within a computation where tasks are started when bound (recursive binds). Consider:

```fsharp
let rec foo n =
    async {
        if n = 0 then
            return 42
        else
            let x = foo (n - 1) |> Async.RunSynchronously
            return x
    }

foo 10_000 |> Async.RunSynchronously
```

This will SO because each Async.RunSynchronously starts on a new trampoline. There's nothing we can do about it, and that's OK. However, with this PR,

```fsharp
let rec foo n =
    coldTask {
        if n = 0 then
            return 42
        else
            let x = foo (n - 1) () |> _.Result
            return x
    }

foo 10_000 () |> _.Result
```

will hang instead. This is bad. The issue is, given that a cold task is just a `unit -> Task<'T>` function, there is no way to tell starting it from the outside (which should push a fresh trampoline) apart from starting it when it is bound inside a computation. OTOH I can easily make a type like this:

```fsharp
type Async2<'T>(start: unit -> Task<'T>) =
    member _.Start() =
        Trampoline.PushNewTrampoline()
        start()

    member _.StartBound() = start()
```

then in the builder:

```fsharp
member inline _.Source(code: Async2<_>) = code.StartBound() |> _.GetAwaiter()
```

and it will behave just like async in the same scenario:

```fsharp
let rec foo n =
    async2 {
        if n = 0 then
            return 42
        else
            let x = foo (n - 1) |> _.Start() |> _.Result
            return x
    }

foo 10_000 |> Async2.run
```

So, the big TODO now is to come up with some way of doing this with tasks / cold tasks. This seems doable for the cold task family at least, without changes to the API.
So I removed hot tasks from the picture. This is currently only for the cold start variants.
Unfortunately I can't find a clean way to make it work with
It should be similar to ColdTasks. There was a lot of shared code between the various builders, and it was moved into CancellableTaskBuilderBase. I didn't use the alias here for some reason.

IcedTasks/src/IcedTasks/CancellableTaskBuilderBase.fs, lines 735 to 738 in 5af1568
Review comment on this test snippet:

```fsharp
if n = 0 then
    return false
else
    return! evenC (n - 1) CancellationToken.None
```
Yeah, it seems this won't work for dissimilar CEs involving cancellableTask, like here. The problem is, the `CancellationToken -> Task<'t>` function call happens before the resulting `Task<'t>` is passed to Source, so there is no way to intercept it.
Right, you would have to do

```fsharp
return! fun () -> evenC (n - 1) CancellationToken.None
```
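To make that workaround concrete, here is a sketch of how the recursive test above might look with the change. This is an illustration only: it assumes evenC/oddC are the mutually recursive cancellableTask functions from the test, and that the builder can bind a `unit -> Task<'t>` thunk (a cold task) so the trampoline gets a chance to intercept the start.

```fsharp
open System.Threading
open IcedTasks

// Sketch: returning a thunk keeps the inner task cold. The builder's Source
// sees the function before anything runs, instead of an already-started Task<'t>.
let rec oddC (n: int) : CancellableTask<bool> =
    cancellableTask {
        if n = 0 then
            return false
        else
            return! fun () -> evenC (n - 1) CancellationToken.None
    }

and evenC (n: int) : CancellableTask<bool> =
    cancellableTask {
        if n = 0 then
            return true
        else
            return! fun () -> oddC (n - 1) CancellationToken.None
    }
```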
Wonder if we can get an LLM to write an analyzer that checks whether a task is being returned in a TCO spot.
:) OTOH this is not a big deal; there just needs to be clear documentation that this does not work for hot tasks, and in the test we effectively bind a hot task.
It's a pity; this could work for all tasks, hot and cold, if not for the possibility of the sync-over-async antipattern.
> :) OTOH this is not a big deal; there just needs to be clear documentation that this does not work for hot tasks, and in the test we effectively bind a hot task.

Feels like such an easy footgun though 😢
> It's a pity; this could work for all tasks, hot and cold, if not for the possibility of the sync-over-async antipattern.

What do you mean here? I'm familiar with the pattern, but not with why it applies here. Is this something we could handle, but if someone calls .Result it's their problem?
Yes, that's what I've been fighting with for almost a week.
In practice, we could just count tasks started synchronously on the thread and bounce every 50th or so, and we're safe from SO.
But if someone calls _.Result on a bounced task (from inside the CE), it's an instant deadlock. Because the trampoline is synchronous, it's not unlike a synchronization-context deadlock, I guess.
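As an illustration of the counting idea (not the PR's actual implementation; SyncStartCounter, bindLimit, and shouldBounce are made-up names):

```fsharp
open System

/// Sketch: a per-thread counter of tasks started synchronously. Every time the
/// count crosses a multiple of the limit, the caller is expected to bounce the
/// continuation onto the trampoline instead of continuing on the current stack.
type private SyncStartCounter() =
    [<ThreadStatic; DefaultValue>]
    static val mutable private count: int

    static member Next() =
        SyncStartCounter.count <- SyncStartCounter.count + 1
        SyncStartCounter.count

    static member Reset() = SyncStartCounter.count <- 0

let bindLimit = 50

/// True when the current chain of synchronous starts is deep enough that the
/// next continuation should go through the trampoline.
let shouldBounce () = SyncStartCounter.Next() % bindLimit = 0
```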
I feel like the “sync over async” pattern is documented and taught about enough that we can rely on that footgun being user error.
Yes, but I feel it happens in the wild a lot, in non-essential places like tests, for example. If this started deadlocking, it would break a lot of things.
BTW I reduced the bind limit from 100 to 50; that's probably why the macOS leg passes now.
Is this something we need to be a bit more dynamic about, or allow someone to set? I know you can set the stack size as an environment variable. Doubt we can handle it if someone uses
Classic. The internal limit, I think, could eventually be made configurable for this PR by a utility function; the whole thing could also be opt-in, if there is no good solution for the problematic edge cases. As an aside, there's also a mistake in this PR currently that I need to fix/revert: the limit should be checked each time there's a possible yield, not just once for each started task.
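Purely as an illustration of the "configurable by a utility function / opt-in" idea (hypothetical names; nothing like this exists in the PR yet):

```fsharp
/// Sketch of a possible opt-in configuration surface for the trampoline.
module TrampolineConfig =
    /// How many synchronous binds are allowed before bouncing. Mutable so a
    /// host application could tune it to its stack size.
    let mutable bindLimit = 50

    /// Whether trampolining is enabled at all (opt-in).
    let mutable enabled = false

    let enable limit =
        bindLimit <- limit
        enabled <- true
```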
A bind limit of 300 does pass the tests on Windows but, IIUC, Macs usually have a shorter stack. This is also very much dependent on the complexity of the compiled state machine. Another open question is how to do a seamless tail-call hand-over with
I'm not giving up on this. I think this could work in general. I have a few more ideas I need to try and some improvements I need to integrate, for example a working prototype of
Proposed Changes

- Use synchronous single-thread trampolines whenever recursion is deep enough.
- Cache exceptions in an ExceptionDispatchInfo to cut short deep stack traces and enable fast recovery from deep call stacks (see the sketch after this list).

This builds and the test passes without SO.
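For readers unfamiliar with the mechanism, a minimal sketch of the ExceptionDispatchInfo caching idea (illustrative only, not the PR's actual code):

```fsharp
open System
open System.Runtime.ExceptionServices

/// Sketch: capture an exception once, close to where it was originally thrown,
/// so the stored stack trace stays short; rethrowing later via Throw() restores
/// that original trace instead of accumulating one frame per recursion level.
type ExceptionCache() =
    let mutable cached: ExceptionDispatchInfo voption = ValueNone

    member _.Capture(e: exn) =
        cached <- ValueSome(ExceptionDispatchInfo.Capture e)

    member _.ThrowIfAny() =
        match cached with
        | ValueSome edi -> edi.Throw()
        | ValueNone -> ()
```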
Types of changes

Checklist

Many todos here:

- SetResult/SetException to make this work for old .NET Framework

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...