feat(createAgent): Add code execution middleware #9209
Draft
Why?
In order to support data analysis using the new v1 agent / middleware tooling, we need to be able to do the following:
What?
This PR adds code execution middleware, which does the following:
Currently, the middleware is implemented to be agnostic to both the sandbox provider and the file storage mechanism.
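As a rough illustration of what "agnostic to both the sandbox provider and the file storage mechanism" can mean in practice, here is a minimal sketch. Only the interface names `ContainerProvider` and `FileProvider` and the method names `startContainer`, `uploadFileToContainer` and `modifyModelRequest` come from this PR; every signature, plus `StoredFile`, `ModelRequest`, `extractGeneratedFiles`, `InMemoryFileProvider`, `FakeContainerProvider` and `runWithProviders`, is an assumption made for illustration and may not match the actual code.

```typescript
// Hypothetical shapes. The real interfaces in this PR may differ.
interface StoredFile {
  path: string;
  content: string;
}

interface ModelRequest {
  messages: string[];
}

interface FileProvider {
  addFile(file: StoredFile): Promise<void>;
  getFile(path: string): Promise<StoredFile | undefined>;
}

interface ContainerProvider {
  tools: string[]; // names of the tools exposed to the model (assumed shape)
  startContainer(): Promise<string>; // resolves to a container id (assumed)
  uploadFileToContainer(containerId: string, file: StoredFile): Promise<void>;
  modifyModelRequest(request: ModelRequest): ModelRequest;
  extractGeneratedFiles(containerId: string): Promise<StoredFile[]>;
}

// Simplest possible file storage: a map held in memory.
class InMemoryFileProvider implements FileProvider {
  private files = new Map<string, StoredFile>();

  async addFile(file: StoredFile): Promise<void> {
    this.files.set(file.path, file);
  }

  async getFile(path: string): Promise<StoredFile | undefined> {
    return this.files.get(path);
  }
}

// Stand-in sandbox that records uploads and always "generates" one file.
class FakeContainerProvider implements ContainerProvider {
  tools: string[] = ["code_execution"];
  uploaded: string[] = [];

  async startContainer(): Promise<string> {
    return "container-1";
  }

  async uploadFileToContainer(_containerId: string, file: StoredFile): Promise<void> {
    this.uploaded.push(file.path);
  }

  modifyModelRequest(request: ModelRequest): ModelRequest {
    return request; // no-op for providers that need no request changes
  }

  async extractGeneratedFiles(_containerId: string): Promise<StoredFile[]> {
    return [{ path: "out.txt", content: "result" }];
  }
}

// Provider-agnostic flow: copy known input files into the container,
// then persist whatever the container generated back into file storage.
async function runWithProviders(
  container: ContainerProvider,
  files: FileProvider,
  inputPaths: string[],
): Promise<string[]> {
  const containerId = await container.startContainer();
  for (const path of inputPaths) {
    const file = await files.getFile(path);
    if (file !== undefined) {
      await container.uploadFileToContainer(containerId, file);
    }
  }
  const generated = await container.extractGeneratedFiles(containerId);
  for (const file of generated) {
    await files.addFile(file);
  }
  return generated.map((file) => file.path);
}
```

The point of the sketch is that `runWithProviders` only talks to the two interfaces, so either side can be swapped without touching the flow itself.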
A sandbox provider must implement the `ContainerProvider` interface, which exposes a list of tools, as well as methods for creating containers, uploading files, modifying requests and extracting generated files. This PR includes an implementation for both Anthropic and OpenAI code execution environments.
A file storage mechanism must implement the `FileProvider` interface, which exposes methods for adding and retrieving files.
What's missing?
There are a few things that need to be discussed and/or implemented before this is ready to merge:
The `ContainerProvider` abstraction
As part of implementing the OpenAI and Anthropic container providers, I realized that the `ContainerProvider` abstraction doesn't quite fit either service.
There is a lot of machinery in the `ContainerProvider` interface that is only relevant to Anthropic, because Anthropic doesn't have a first-class container API; containers are created automatically when the LLM calls the Anthropic server-side code execution tool, and file upload happens via special user messages. This means that the `ContainerProvider` interface has methods like `startContainer` and `uploadFileToContainer`, which are more or less no-ops for the Anthropic provider, while methods like `modifyModelRequest` will probably be no-ops for many other container providers.
OpenAI, by contrast, has a first-class container API, but its container expiration model is based on inactivity, which means I had to do some hacking because the `ContainerProvider` interface assumes that containers expire after a fixed amount of time.
All of these knobs also mean that the `codeExecution` middleware is more complex than it needs to be for either provider, because it has to handle all of these different cases.
File system middleware
Much of the machinery in this PR is very similar to the proposed deep agents filesystem middleware. We need to understand what, if any, the relationship is between the two, and whether we can consolidate them in any way.
One possibility that would address both this and the `ContainerProvider` abstraction issues is to let the file system middleware keep track of files, and then remove the generic `codeExecution` middleware entirely in favor of directly implementing Anthropic and OpenAI code execution middleware that uses the file system middleware to manage files. Depending on how much code duplication we see, we could build some utility functions to keep track of which files have been uploaded to the container.
The `FileProvider` abstraction
This can probably be replaced by `BaseStore`, and if no store is passed in, we should probably just store the file contents directly on the graph, the way we do for the deep agents filesystem middleware. But as mentioned above, this problem might go away entirely if we can consolidate with the filesystem middleware.
Hiding container providers
To give us the flexibility to implement the approach above of creating specialized code execution middleware for each provider, we might want to hide the `ContainerProvider` interface and implementations from the public API. We could keep today's code, and just do something like the following:
That would still bind us to the `FileProvider` abstraction, so we'd need to think about that too.
Circular dependencies
This PR has the same circular dependency issues that we tried to address when moving the prompt caching middleware into the Anthropic provider, and for which we still don't have a good solution.
Tests for container re-creation
There are no tests for the machinery that re-creates containers when they expire. That should probably be a unit test, as it would be hard to control in an integration test.
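One way such a unit test could be made controllable is to inject the clock, so the test decides when a container counts as expired. The sketch below is an assumption about how that might look, not the PR's actual machinery: `ContainerSession`, `Clock`, the fixed TTL and the `create` callback are all hypothetical names introduced here.

```typescript
// Hypothetical sketch: none of these names come from the PR.
type Clock = () => number; // current time in milliseconds

class ContainerSession {
  private containerId: string | undefined = undefined;
  private createdAt = 0;
  private ttlMs: number;
  private now: Clock;
  private create: () => string;
  createCount = 0;

  constructor(ttlMs: number, now: Clock, create: () => string) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.create = create;
  }

  // Returns a live container id, re-creating the container once the
  // fixed TTL has elapsed since it was created.
  getContainer(): string {
    const current = this.containerId;
    if (current !== undefined && this.now() - this.createdAt < this.ttlMs) {
      return current;
    }
    const fresh = this.create();
    this.containerId = fresh;
    this.createdAt = this.now();
    this.createCount += 1;
    return fresh;
  }
}

// A fake clock that the test advances by hand, plus a fake container factory.
let fakeNow = 0;
let nextId = 0;
const session = new ContainerSession(
  60000,
  () => fakeNow,
  () => {
    nextId += 1;
    return "container-" + nextId;
  },
);

const first = session.getContainer(); // nothing exists yet, so one is created
fakeNow += 30000;
const second = session.getContainer(); // still inside the TTL, so it is reused
fakeNow += 30000;
const third = session.getContainer(); // TTL has elapsed, so it is re-created
```

An inactivity-based expiry like the one described for OpenAI would need the timestamp refreshed on every use rather than only at creation; the same fake clock would still drive that test.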
Issues with path handling for Anthropic container provider
In the `codeExecutionExtraTools` example, we occasionally see issues where the model generates paths that don't match the actual paths of files in the container, e.g. generating `/tmp/file.txt` instead of `file.txt`. We need to investigate this further and see if there's a way to make this more robust. See e.g. https://smith.langchain.com/public/79d05f34-2924-4707-b0ae-73b8da3d9a9d/r
Note that this is not a problem with simpler examples that don't use the extra tools, because in those cases the Anthropic code execution tools are smart enough to use the right paths.
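One possible mitigation to explore, purely as a sketch and not something this PR implements, is to resolve a model-generated path against the list of files known to be in the container, falling back to a unique file-name match. The helper name `resolveContainerPath` and its behavior are assumptions made for illustration.

```typescript
// Hypothetical helper, not part of this PR.
// Maps a path produced by the model onto a path known to exist in the
// container. Returns undefined when there is no match or the match is
// ambiguous, so the caller can surface an error instead of guessing.
function resolveContainerPath(
  requested: string,
  knownPaths: string[],
): string | undefined {
  // An exact match always wins.
  if (knownPaths.indexOf(requested) !== -1) {
    return requested;
  }
  // Otherwise compare file names only, e.g. "/tmp/file.txt" -> "file.txt".
  const baseName = requested.split("/").filter((part) => part.length > 0).pop();
  if (baseName === undefined) {
    return undefined;
  }
  const matches = knownPaths.filter((known) => known.split("/").pop() === baseName);
  return matches.length === 1 ? matches[0] : undefined;
}
```

Returning `undefined` for ambiguous names is a deliberate choice in this sketch: silently picking one of several files with the same name would hide the mismatch rather than fix it.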
Extra
We have a draft PR to add a Daytona container provider to see if it works well with this abstraction. It was written mostly by Claude Code. I took the chance to review it (see inline comments on that PR), and while it needs work, the abstraction does seem to work well enough for it, though I would be curious to see whether it would get simpler if we custom-built the middleware as proposed for OpenAI and Anthropic above. FWIW, in reality we'd probably not support Daytona first-class, but if we can simplify that example enough it could go into the docs. The experiment was more about understanding how the container provider abstraction works for a third-party container provider.