@pokey pokey commented Oct 16, 2025

Why?

In order to support data analysis using the new v1 agent / middleware tooling, we need to be able to do the following:

  • Upload user files to a sandboxed code execution environment
  • Allow the model to run multiple steps of code in a persistent environment
  • Download any files generated by the model during code execution

What?

This PR adds code execution middleware, which does the following:

  • Exposes the tools necessary to run code in a sandboxed container
  • Makes it easy to upload files to the container
  • Automatically extracts files generated by the model
  • Manages sandbox lifecycle, which entails the following:
    • Creating a new container if one doesn't exist and uploading the necessary files to it
    • Reusing an existing container across steps if one does exist
    • Handling container expiration and re-creation
    • Detecting and extracting files generated by the model both for user access and for re-uploading to a new container if the old one expires

Currently, the middleware is implemented to be agnostic to both the sandbox provider and the file storage mechanism.
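As a rough illustration of how this is meant to be wired into an agent, usage might look something like the sketch below. This assumes the two-argument constructor shape shown in the "Hiding container providers" section further down, a hypothetical OpenAIContainerProvider class, and the v1 createAgent middleware option; imports for the middleware pieces are omitted because the final package layout is still open:

import { createAgent } from "langchain";

// Names and option shapes are approximate; see the sections below for the
// actual interfaces this PR proposes.
const middleware = codeExecutionMiddleware(
  new OpenAIContainerProvider(), // hypothetical name for the OpenAI provider
  new MemoryFileProvider()
);

const agent = createAgent({
  model: "openai:gpt-4o",
  tools: [],
  middleware: [middleware],
});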

A sandbox provider must implement the ContainerProvider interface, which exposes a list of tools, as well as methods for creating containers, uploading files, modifying requests and extracting generated files. This PR includes an implementation for both Anthropic and OpenAI code execution environments.
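For reference, here is a rough sketch of the shape of that interface. The method names startContainer, uploadFileToContainer, and modifyModelRequest come from the discussion later in this description; the remaining names, the placeholder types, and all signatures are illustrative assumptions rather than the actual PR code:

// Placeholder types for the sketch; the real PR defines its own.
type FileEntry = { path: string; data: Uint8Array };
type Tool = unknown;          // a server-side or client-side tool definition
type ModelRequest = unknown;  // the per-step model request the middleware can rewrite

interface ContainerHandle {
  id: string;
  expiresAt?: Date; // the current abstraction assumes fixed-time expiry
}

interface ContainerProvider {
  // Tools the middleware exposes so the model can run code in the sandbox.
  getTools(): Tool[];

  // Lifecycle: create a container and upload files into it.
  startContainer(files: FileEntry[]): Promise<ContainerHandle>;
  uploadFileToContainer(container: ContainerHandle, file: FileEntry): Promise<void>;

  // Rewrite the outgoing request, e.g. to attach file-upload user messages.
  modifyModelRequest(request: ModelRequest, container: ContainerHandle): ModelRequest;

  // Extract any files the model generated during code execution.
  extractGeneratedFiles(container: ContainerHandle): Promise<FileEntry[]>;
}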

A file storage mechanism must implement the FileProvider interface, which exposes methods for adding and retrieving files.
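Again as a sketch only, since the description here only commits to "adding and retrieving files", such an interface and an in-memory implementation (like the MemoryFileProvider used further down) might look roughly like this; the method names are assumptions:

// Illustrative sketch; method names and signatures are assumptions.
interface FileProvider {
  addFile(path: string, data: Uint8Array): Promise<void>;
  getFile(path: string): Promise<Uint8Array | undefined>;
  listFiles(): Promise<string[]>;
}

class MemoryFileProvider implements FileProvider {
  private files = new Map<string, Uint8Array>();

  async addFile(path: string, data: Uint8Array) {
    this.files.set(path, data);
  }

  async getFile(path: string) {
    return this.files.get(path);
  }

  async listFiles() {
    return [...this.files.keys()];
  }
}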

What's missing?

There are a few things that need to be discussed and/or implemented before this is ready to merge:

The ContainerProvider abstraction

As part of implementing OpenAI and Anthropic container providers, I realized that the ContainerProvider abstraction doesn't quite fit either service perfectly.

There is a lot of machinery in the ContainerProvider interface that is only relevant to Anthropic, because Anthropic doesn't have a first-class container API: containers are created automatically when the LLM calls Anthropic's server-side code execution tool, and file upload happens via special user messages. As a result, the ContainerProvider interface has methods like startContainer and uploadFileToContainer that are more or less no-ops for the Anthropic provider, while methods like modifyModelRequest will probably be no-ops for many other container providers.
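To make the asymmetry concrete, an Anthropic provider ends up looking roughly like the sketch below, reusing the placeholder types from the interface sketch above. None of this is the actual PR code; the tool type string is Anthropic's published identifier for its server-side code execution tool.

// Illustrative only: shows which methods do real work and which are no-ops.
class AnthropicContainerProvider implements ContainerProvider {
  getTools(): Tool[] {
    // Anthropic's server-side code execution tool; the container is created
    // implicitly the first time the model calls it.
    return [{ type: "code_execution_20250522", name: "code_execution" }];
  }

  async startContainer(): Promise<ContainerHandle> {
    // More or less a no-op: there is no first-class container API to call.
    return { id: "anthropic-implicit" };
  }

  async uploadFileToContainer(): Promise<void> {
    // No-op: uploads happen via special user messages instead.
  }

  modifyModelRequest(request: ModelRequest): ModelRequest {
    // This is where the real work happens: pending files get attached as
    // file-upload content on a user message so the server-side tool sees them.
    return request; // sketch only; the real code rewrites the messages here
  }

  async extractGeneratedFiles(): Promise<FileEntry[]> {
    // Parse code-execution tool results for the ids of generated files.
    return [];
  }
}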

On the other hand, OpenAI has a first-class container API, but their container expiration model is based on inactivity, which means I had to do some hacking because the ContainerProvider interface assumes that containers expire after a fixed amount of time.
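Roughly, the workaround amounts to tracking a sliding last-activity timestamp and treating it as a synthetic expiry time; the constant and names below are illustrative, not the PR's actual code:

// OpenAI containers expire after a period of inactivity (roughly 20 minutes),
// so a fixed expiry time can only be approximated by refreshing it on each use.
const OPENAI_INACTIVITY_TIMEOUT_MS = 20 * 60 * 1000;

interface TrackedContainer {
  id: string;
  lastUsedAt: Date;
}

function isProbablyExpired(container: TrackedContainer): boolean {
  // Best effort: assume expiry once the inactivity window has elapsed, and be
  // prepared to re-create the container if the API reports it expired anyway.
  return Date.now() - container.lastUsedAt.getTime() > OPENAI_INACTIVITY_TIMEOUT_MS;
}

function touch(container: TrackedContainer): void {
  // Refresh the sliding window whenever we upload a file or run code.
  container.lastUsedAt = new Date();
}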

All of these knobs also mean that the codeExecution middleware ends up more complex than it needs to be for either provider, since it has to handle every one of these cases.

File system middleware

Much of the machinery in this PR is very similar to the proposed deep agents filesystem middleware. We need to understand what relationship, if any, exists between the two, and whether we can consolidate them in any way.

One possibility that would address both this and the ContainerProvider abstraction issues is to let the file system middleware keep track of files, and then remove the generic codeExecution middleware entirely in favor of directly implementing Anthropic and OpenAI code execution middleware that use the file system middleware to manage files. Depending on how much code duplication we see, we could build some utility functions to take care of tracking which files have been uploaded to the container.

The FileProvider abstraction

This can probably be replaced by BaseStore; if no provider is passed in, we should probably just store the file contents directly on the graph, the way we do for the deep agents filesystem middleware. But as mentioned above, this problem might go away entirely if we can consolidate with the filesystem middleware.
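If we do go the BaseStore route, a minimal sketch of a store-backed file provider, assuming the namespace/key put-and-get shape of BaseStore and the hypothetical FileProvider methods sketched earlier, might look like this:

// Sketch only: the store parameter is typed structurally rather than importing
// BaseStore, and listFiles is left as a stub (the store's namespace search
// could back it).
class StoreFileProvider implements FileProvider {
  constructor(
    private store: {
      put(ns: string[], key: string, value: Record<string, unknown>): Promise<void>;
      get(ns: string[], key: string): Promise<{ value: Record<string, unknown> } | null>;
    },
    private namespace: string[] = ["code-execution", "files"]
  ) {}

  async addFile(path: string, data: Uint8Array) {
    // Store file bytes as base64 so the value stays JSON-serializable.
    await this.store.put(this.namespace, path, {
      data: Buffer.from(data).toString("base64"),
    });
  }

  async getFile(path: string) {
    const item = await this.store.get(this.namespace, path);
    if (!item) return undefined;
    return new Uint8Array(Buffer.from(item.value.data as string, "base64"));
  }

  async listFiles() {
    return []; // sketch: would use the store's search/list over the namespace
  }
}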

Hiding container providers

In order to give ourselves the flexibility to implement the approach described above of creating specialized code execution middleware for each provider, we might want to hide the ContainerProvider interface and its implementations from the public API. We could keep today's code and just do something like the following:

export const codeExecutionMiddlewareAnthropic = (fileProvider) =>
  codeExecutionMiddleware(new AnthropicContainerProvider(), fileProvider);

// ...then clients do:

const middleware = codeExecutionMiddlewareAnthropic(new MemoryFileProvider());

That would still bind us to the FileProvider abstraction, so we'd need to think about that too.

Circular dependencies

This PR has the same circular dependency issues that we tried to address when moving the prompt caching middleware into the Anthropic provider, and we still don't have a good solution for them.

Tests for container re-creation

There are no tests for the machinery that re-creates containers when they expire. That should probably be a unit test, as it would be hard to control in an integration test.

Issues with path handling for Anthropic container provider

In the codeExecutionExtraTools example, we occasionally see issues where the model generates paths that don't match the actual paths of files in the container, e.g. generating /tmp/file.txt instead of file.txt. We need to investigate this further and see if there's a way to make it more robust. See, e.g., https://smith.langchain.com/public/79d05f34-2924-4707-b0ae-73b8da3d9a9d/r

Note that this is not a problem with simpler examples that don't use the extra tools, because in those cases the Anthropic code execution tools are smart enough to use the right paths.

Extra

We have a draft PR that adds a Daytona container provider, mainly to see whether it works well with this abstraction. It was written mostly by Claude Code. I took a chance to review it (see the inline comments on that PR), and while it needs work, the abstraction does seem to work well enough for it, though I'd be curious whether it would simplify if we custom-built the middleware as proposed for OpenAI and Anthropic above. FWIW, in practice we'd probably not support Daytona first-class, but if we can simplify that example enough it could go into the docs; the experiment was mostly about understanding how the container provider abstraction holds up for a third-party container provider.

changeset-bot bot commented Oct 16, 2025

⚠️ No Changeset found

Latest commit: 8eff2cb

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.


@zhoushaw

@pokey Any progress on this? This feature would be super useful!
