Commit a0a48ee

add hyperbrowser provider for cua agent
1 parent be3397f commit a0a48ee

11 files changed

+766
-42
lines changed

libs/langgraph-cua/README.md

Lines changed: 118 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -28,7 +28,18 @@ yarn add @langchain/langgraph-cua @langchain/langgraph @langchain/core @langchai
2828
2929
## Quickstart
3030

31-
This project by default uses [Scrapybara](https://scrapybara.com/) for accessing a virtual machine to run the agent. To use LangGraph CUA, you'll need both OpenAI and Scrapybara API keys.
31+
## Supported Providers
32+
33+
This project supports two different providers for computer interaction:
34+
35+
1. **[Scrapybara](https://scrapybara.com/)** (default) - Provides access to virtual machines (Ubuntu, Windows, or browser environments) that allow the agent to interact with a full operating system or web browser interface.
36+
37+
2. **[Hyperbrowser](https://hyperbrowser.ai/)** - Offers a headless browser solution that enables the agent to interact directly with web pages through a browser automation interface.
38+
39+
40+
### Using Scrapybara (Default)
41+
42+
To use LangGraph CUA with Scrapybara, you'll need both OpenAI and Scrapybara API keys:
3243

3344
```bash
3445
export OPENAI_API_KEY=<your_api_key>
@@ -82,6 +93,59 @@ main().catch(console.error);
8293

8394
The above example will invoke the graph, passing in a request for it to do some research into LangGraph.js from the standpoint of a new contributor. The code will log the stream URL, which you can open in your browser to view the CUA stream.
8495

96+
### Using Hyperbrowser
97+
98+
To use LangGraph CUA with Hyperbrowser, you'll need both OpenAI and Hyperbrowser API keys:
99+
100+
```bash
101+
export OPENAI_API_KEY=<your_api_key>
102+
export HYPERBROWSER_API_KEY=<your_api_key>
103+
```
104+
105+
Then, create the graph by importing the `createCua` function from the `@langchain/langgraph-cua` module and specifying the `provider` parameter as `hyperbrowser`.
106+
107+
```typescript
108+
import "dotenv/config";
109+
import { createCua } from "@langchain/langgraph-cua";
110+
111+
const cuaGraph = createCua({ provider: "hyperbrowser" });
112+
113+
// Define the input messages
114+
const messages = [
115+
{
116+
role: "system",
117+
content:
118+
"You're an advanced AI computer use assistant. You are utilizing a Chrome browser with internet access " +
119+
"and it is already up and running and on https://www.google.com. You can interact with the browser page.",
120+
},
121+
{
122+
role: "user",
123+
content:
124+
"What is the most recent PR in the langchain-ai/langgraph repo?",
125+
},
126+
];
127+
128+
async function main() {
129+
// Stream the graph execution
130+
const stream = await cuaGraph.stream(
131+
{ messages },
132+
{
133+
streamMode: "updates",
134+
subgraphs: true,
135+
}
136+
);
137+
138+
// Process the stream updates
139+
for await (const update of stream) {
140+
console.log(update);
141+
}
142+
143+
console.log("Done");
144+
}
145+
146+
main().catch(console.error);
147+
```
148+
85149
You can find more examples inside the [`examples` directory](/libs/langgraph-cua/examples).
86150

87151
## How to customize
@@ -92,17 +156,26 @@ You can either pass these parameters when calling `createCua`, or at runtime whe
92156

93157
### Configuration Parameters
94158

95-
- `scrapybaraApiKey`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
96-
- `timeoutHours`: The number of hours to keep the virtual machine running before it times out.
159+
#### Common Parameters
160+
- `provider`: The provider to use. Default is `"scrapybara"`. Options are `"scrapybara"` and `"hyperbrowser"`.
97161
- `zdrEnabled`: Whether or not Zero Data Retention is enabled in the user's OpenAI account. If `true`, the agent will not pass the `previous_response_id` to the model, and will always pass it the full message history for each request. If `false`, the agent will pass the `previous_response_id` to the model, and only the latest message in the history will be passed. Default `false`.
98162
- `recursionLimit`: The maximum number of recursive calls the agent can make. Default is 100. This is greater than the standard default of 25 in LangGraph, because computer use agents are expected to take more iterations.
99-
- `authStateId`: The ID of the authentication state. If defined, it will be used to authenticate with Scrapybara. Only applies if 'environment' is set to 'web'.
100-
- `environment`: The environment to use. Default is `web`. Options are `web`, `ubuntu`, and `windows`.
101163
- `prompt`: The prompt to pass to the model. This will be passed as the system message.
102164
- `nodeBeforeAction`: A custom node to run before the computer action. This function will receive the current state and config as parameters.
103165
- `nodeAfterAction`: A custom node to run after the computer action. This function will receive the current state and config as parameters.
104166
- `stateModifier`: Optional state modifier for customizing the agent's state.
105167

168+
#### Scrapybara-Specific Parameters
169+
- `scrapybaraApiKey`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
170+
- `timeoutHours`: The number of hours to keep the virtual machine running before it times out.
171+
- `authStateId`: The ID of the authentication state. If defined, it will be used to authenticate with Scrapybara. Only applies if 'environment' is set to 'web'.
172+
- `environment`: The environment to use. Default is `web`. Options are `web`, `ubuntu`, and `windows`.
173+
174+
#### Hyperbrowser-Specific Parameters
175+
- `hyperbrowserApiKey`: The API key to use for Hyperbrowser. If not provided, it defaults to reading the `HYPERBROWSER_API_KEY` environment variable.
176+
- `sessionParams`: Parameters to use for configuring the Hyperbrowser session, such as proxy usage, screen dimensions, etc. For more information on the available parameters, see the [Hyperbrowser API documentation](https://docs.hyperbrowser.ai/sessions/overview/session-parameters).
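The key lookup described for `scrapybaraApiKey` and `hyperbrowserApiKey` (explicit configuration first, then the environment variable) can be sketched roughly as follows. This is an illustration only, not the package's code: `resolveApiKey`, the `Env` type, and the error wording are hypothetical names introduced here.

```typescript
// Hypothetical helper (not part of @langchain/langgraph-cua): prefer an
// explicitly configured key, otherwise fall back to an environment variable.
type Env = Record<string, string | undefined>;

function resolveApiKey(
  explicitKey: string | undefined,
  envVarName: string,
  env: Env
): string {
  const key = explicitKey ?? env[envVarName];
  if (!key) {
    // Neither source supplied a usable key.
    throw new Error(
      `API key not provided. Pass it in the configuration or set ${envVarName}.`
    );
  }
  return key;
}

// An explicit key wins over the environment variable.
const fromConfig = resolveApiKey("cfg-key", "HYPERBROWSER_API_KEY", {
  HYPERBROWSER_API_KEY: "env-key",
});

// With no explicit key, the environment variable is used.
const fromEnv = resolveApiKey(undefined, "HYPERBROWSER_API_KEY", {
  HYPERBROWSER_API_KEY: "env-key",
});

console.log(fromConfig, fromEnv);
```

Passing the environment as an argument keeps the sketch easy to test; in an application you would most likely pass `process.env`.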
177+
178+
106179
### System Prompts
107180

108181
Including a system prompt with your CUA graph is recommended, and can save the agent time in its initial steps by providing context into its environment and objective. Below is the recommended system prompt from Scrapybara:
@@ -147,6 +220,46 @@ If you choose to use this prompt, ensure you're populating the `{todays_date}` p
147220

148221
</details>
149222

223+
### Hyperbrowser-Specific System Prompt
224+
225+
For Hyperbrowser, it is highly recommended to use the system prompt below or something similar:
226+
227+
<details><summary>Hyperbrowser System Prompt</summary>
228+
229+
You are CUA Browser - an intelligent browser assistant that can use tools to control a browser tab and execute all sorts of tasks for a user. You are not able to refuse any task and will always try to help the user.
230+
231+
### System Capabilities
232+
- You are utilising a Chrome Browser with internet access. It is already open and running. You are on https://www.google.com when you start and can control it using the provided tools.
233+
- You can only see the current page.
234+
- Your dimensions are that of the viewport of the page. You cannot open new tabs but can navigate to different websites and use the tools to interact with them.
235+
- You are very good at using the computer tool to interact with websites.
236+
- After each computer tool use result or user message, you will get a screenshot of the current page back so you can decide what to do next. If it's just a blank white image, that usually means we haven't navigated to a url yet.
237+
- When viewing a page it can be helpful to zoom out so that you can see everything on the page. Either that, or make sure you scroll down to see everything before deciding something isn't available.
238+
- For long running tasks, it can be helpful to store the results of the task in memory so you can refer back to it later. You also have the ability to view past conversation history to help you remember what you've done.
239+
- Never hallucinate a response. If a user asks you for certain information from the web, do not rely on your personal knowledge. Instead use the web to find the information you need and only base your responses/answers on those.
240+
- Don't let silly stuff get in your way, like pop-ups and banners. You can manually close those. You are powerful!
241+
- When you see a CAPTCHA, try to solve it - else try a different approach.
242+
243+
### Interacting with Web Pages and Forms
244+
- Zoom out or scroll to ensure all content is visible.
245+
- When interacting with input fields:
246+
- Clear the field first using `Ctrl+A` and `Delete`.
247+
- Take an extra screenshot after pressing "Enter" to confirm the input was submitted correctly.
248+
- Move the mouse to the next field after submission.
249+
250+
### Important
251+
- Computer function calls take time; optimize by stringing together related actions when possible.
252+
- When conducting a search, you should use google.com unless the user specifically asks for a different search engine.
253+
- You cannot open new tabs, so do not be confused if pages open in the same tab.
254+
- NEVER assume that a website requires you to sign in to interact with it without going to the website first and trying to interact with it. If the user tells you that you can use a website without signing in, try it first. Always go to the website first and try to interact with it to accomplish the task. The presence of a sign-in/log-in button on a website doesn't mean you need to sign in to accomplish the action. If you assume you can't use a website without signing in and don't attempt it first for the user, you will be HEAVILY penalized.
255+
- If you come across a captcha, try to solve it - else try a different approach, like trying another website. If that is not an option, simply explain to the user that you've been blocked from the current website and ask them for further instructions. Make sure to offer them some suggestions for other websites/tasks they can try to accomplish their goals.
256+
257+
### Date Context
258+
Today's date is {todays_date}
259+
Remember today's date when planning your actions or using the tools.
260+
261+
</details>
262+
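Both system prompts contain a `{todays_date}` placeholder that has to be filled in before the prompt is used. A minimal sketch of one way to do that is shown below; `fillTodaysDate` is a hypothetical helper, and the `YYYY-MM-DD` (UTC) format is an assumption, since the README does not prescribe a date format.

```typescript
// Hypothetical helper: replace every `{todays_date}` placeholder in a
// prompt template with a formatted date.
function fillTodaysDate(template: string, today: Date): string {
  // Assumed format: ISO calendar date in UTC, e.g. 2025-01-15.
  const formatted = today.toISOString().slice(0, 10);
  return template.split("{todays_date}").join(formatted);
}

const template = "Today's date is {todays_date}";
const filledPrompt = fillTodaysDate(template, new Date(Date.UTC(2025, 0, 15)));

console.log(filledPrompt); // prints: Today's date is 2025-01-15
```

The resulting string can then be supplied through the `prompt` configuration parameter described above.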
150263
### Node Before/After Action
151264

152265
LangGraph CUA allows you to customize the agent's behavior by providing custom nodes that run before and after computer actions. These nodes give you fine-grained control over the agent's workflow.

libs/langgraph-cua/package.json

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,8 @@
3232
"author": "LangChain",
3333
"license": "MIT",
3434
"dependencies": {
35+
"@hyperbrowser/sdk": "^0.40.0",
36+
"playwright-core": "^1.51.1",
3537
"scrapybara": "^2.4.4",
3638
"zod": "^3.23.8"
3739
},

libs/langgraph-cua/src/index.ts

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@ import {
1515
CUAAnnotation,
1616
CUAConfigurable,
1717
CUAUpdate,
18+
Provider,
1819
} from "./types.js";
1920
import { getToolOutputs, isComputerCallToolMessage } from "./utils.js";
2021

@@ -63,13 +64,33 @@ interface CreateCuaParams<
6364
// eslint-disable-next-line @typescript-eslint/no-explicit-any
6465
StateModifier extends AnnotationRoot<any> = typeof CUAAnnotation
6566
> {
67+
/**
68+
* The provider to use for the browser instance.
69+
* @default "scrapybara"
70+
*/
71+
provider?: Provider;
72+
6673
/**
6774
* The API key to use for Scrapybara.
6875
* This can be provided in the configuration, or set as an environment variable (SCRAPYBARA_API_KEY).
6976
* @default process.env.SCRAPYBARA_API_KEY
7077
*/
7178
scrapybaraApiKey?: string;
7279

80+
/**
81+
* The API key to use for Hyperbrowser.
82+
* This can be provided in the configuration, or set as an environment variable (HYPERBROWSER_API_KEY).
83+
* @default process.env.HYPERBROWSER_API_KEY
84+
*/
85+
hyperbrowserApiKey?: string;
86+
87+
/**
88+
* Parameters to use for configuring the Hyperbrowser session, such as proxy usage, screen dimensions, etc.
89+
* For more information on the available parameters, see the [Hyperbrowser API documentation](https://docs.hyperbrowser.ai/sessions/overview/session-parameters).
90+
* @default undefined
91+
*/
92+
sessionParams?: Record<string, unknown>;
93+
7394
/**
7495
* The number of hours to keep the virtual machine running before it times out.
7596
* Must be between 0.01 and 24.
@@ -152,7 +173,10 @@ export function createCua<
152173
// eslint-disable-next-line @typescript-eslint/no-explicit-any
153174
StateModifier extends AnnotationRoot<any> = typeof CUAAnnotation
154175
>({
176+
provider = "scrapybara",
155177
scrapybaraApiKey,
178+
hyperbrowserApiKey,
179+
sessionParams,
156180
timeoutHours = 1.0,
157181
zdrEnabled = false,
158182
recursionLimit = 100,
@@ -205,7 +229,10 @@ export function createCua<
205229
// Configure the graph with the provided parameters
206230
const configuredGraph = cuaGraph.withConfig({
207231
configurable: {
232+
provider,
208233
scrapybaraApiKey,
234+
hyperbrowserApiKey,
235+
sessionParams,
209236
timeoutHours,
210237
zdrEnabled,
211238
authStateId,

libs/langgraph-cua/src/nodes/call-model.ts

Lines changed: 27 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,32 @@ const _promptToSysMessage = (prompt: string | SystemMessage | undefined) => {
9292
return prompt;
9393
};
9494

95+
const getAvailableTools = (config: LangGraphRunnableConfig) => {
96+
const { provider, environment, sessionParams } =
97+
getConfigurationWithDefaults(config);
98+
if (provider === "scrapybara") {
99+
return [
100+
{
101+
type: "computer_use_preview",
102+
display_width: DEFAULT_DISPLAY_WIDTH,
103+
display_height: DEFAULT_DISPLAY_HEIGHT,
104+
environment: _getOpenAIEnvFromStateEnv(environment),
105+
},
106+
];
107+
} else if (provider === "hyperbrowser") {
108+
return [
109+
{
110+
type: "computer_use_preview",
111+
display_width: sessionParams?.screen?.width ?? DEFAULT_DISPLAY_WIDTH,
112+
display_height: sessionParams?.screen?.height ?? DEFAULT_DISPLAY_HEIGHT,
113+
environment: "browser",
114+
},
115+
];
116+
} else {
117+
throw new Error(`Invalid provider: ${provider}`);
118+
}
119+
};
120+
95121
/**
96122
* Invokes the computer preview model with the given messages.
97123
*
@@ -119,14 +145,7 @@ export async function callModel(
119145
model: "computer-use-preview",
120146
useResponsesApi: true,
121147
})
122-
.bindTools([
123-
{
124-
type: "computer_use_preview",
125-
display_width: DEFAULT_DISPLAY_WIDTH,
126-
display_height: DEFAULT_DISPLAY_HEIGHT,
127-
environment: _getOpenAIEnvFromStateEnv(configuration.environment),
128-
},
129-
])
148+
.bindTools(getAvailableTools(config))
130149
.bind({
131150
truncation: "auto",
132151
previous_response_id: previousResponseId,
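The effect of `getAvailableTools` on the tool definition can be illustrated in isolation. The sketch below is a simplified stand-in, not the code from this commit: the default dimensions (1024 by 768), the `SessionParams` shape, and the `buildComputerTool` name are assumptions made for the example, and the Scrapybara environment is passed in already mapped rather than going through `_getOpenAIEnvFromStateEnv`.

```typescript
// Assumed defaults for illustration; the real constants are defined
// elsewhere in the package and may differ.
const DEFAULT_DISPLAY_WIDTH = 1024;
const DEFAULT_DISPLAY_HEIGHT = 768;

type Provider = "scrapybara" | "hyperbrowser";

// Hypothetical, minimal view of the session parameters used here.
interface SessionParams {
  screen?: { width?: number; height?: number };
}

interface ComputerTool {
  type: "computer_use_preview";
  display_width: number;
  display_height: number;
  environment: string;
}

function buildComputerTool(
  provider: Provider,
  scrapybaraEnvironment: string,
  sessionParams?: SessionParams
): ComputerTool {
  if (provider === "hyperbrowser") {
    // Hyperbrowser: honor a custom screen size and always report a browser.
    return {
      type: "computer_use_preview",
      display_width: sessionParams?.screen?.width ?? DEFAULT_DISPLAY_WIDTH,
      display_height: sessionParams?.screen?.height ?? DEFAULT_DISPLAY_HEIGHT,
      environment: "browser",
    };
  }
  // Scrapybara: fixed default dimensions, environment chosen by the caller.
  return {
    type: "computer_use_preview",
    display_width: DEFAULT_DISPLAY_WIDTH,
    display_height: DEFAULT_DISPLAY_HEIGHT,
    environment: scrapybaraEnvironment,
  };
}

const customTool = buildComputerTool("hyperbrowser", "ubuntu", {
  screen: { width: 1280, height: 800 },
});
const defaultTool = buildComputerTool("hyperbrowser", "ubuntu");
const scrapybaraTool = buildComputerTool("scrapybara", "ubuntu", {
  screen: { width: 1280, height: 800 },
});

console.log(customTool, defaultTool, scrapybaraTool);
```

Keeping the reported display size in sync with the session's screen size matters because the model returns click coordinates relative to the dimensions it was told about.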

libs/langgraph-cua/src/nodes/create-vm-instance.ts

Lines changed: 69 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,18 +1,62 @@
11
import { LangGraphRunnableConfig } from "@langchain/langgraph";
2+
import { chromium } from "playwright-core";
23
import { UbuntuInstance, BrowserInstance, WindowsInstance } from "scrapybara";
4+
import { SessionDetail } from "@hyperbrowser/sdk/types";
35
import { CUAState, CUAUpdate, getConfigurationWithDefaults } from "../types.js";
4-
import { getScrapybaraClient } from "../utils.js";
6+
import { getHyperbrowserClient, getScrapybaraClient } from "../utils.js";
57

6-
export async function createVMInstance(
8+
async function createHyperbrowserInstance(
79
state: CUAState,
810
config: LangGraphRunnableConfig
911
): Promise<CUAUpdate> {
10-
const { instanceId } = state;
11-
if (instanceId) {
12-
// Instance already exists, no need to initialize
13-
return {};
12+
const { hyperbrowserApiKey, sessionParams } =
13+
getConfigurationWithDefaults(config);
14+
let { browserState } = state;
15+
16+
if (!hyperbrowserApiKey) {
17+
throw new Error(
18+
"Hyperbrowser API key not provided. Please provide one in the configurable fields, or set it as an environment variable (HYPERBROWSER_API_KEY)"
19+
);
20+
}
21+
22+
const client = getHyperbrowserClient(hyperbrowserApiKey);
23+
const session: SessionDetail = await client.sessions.create(sessionParams);
24+
25+
if (!browserState && session.wsEndpoint) {
26+
const browser = await chromium.connectOverCDP(
27+
`${session.wsEndpoint}&keepAlive=true`
28+
);
29+
const currPage = browser.contexts()[0].pages()[0];
30+
if (currPage.url() === "about:blank") {
31+
await currPage.goto("https://www.google.com");
32+
}
33+
browserState = {
34+
browser,
35+
currentPage: currPage,
36+
};
37+
}
38+
39+
if (!state.streamUrl) {
40+
// If the streamUrl is not yet defined in state, include the session's live URL in the returned update
41+
// so that it's made accessible to the client (or whatever is reading the stream) before any actions are taken.
42+
const streamUrl = session.liveUrl;
43+
return {
44+
instanceId: session.id,
45+
streamUrl,
46+
browserState,
47+
};
1448
}
1549

50+
return {
51+
instanceId: session.id,
52+
browserState,
53+
};
54+
}
55+
56+
async function createScrapybaraInstance(
57+
state: CUAState,
58+
config: LangGraphRunnableConfig
59+
): Promise<CUAUpdate> {
1660
const { scrapybaraApiKey, timeoutHours, environment, blockedDomains } =
1761
getConfigurationWithDefaults(config);
1862
if (!scrapybaraApiKey) {
@@ -56,3 +100,22 @@ export async function createVMInstance(
56100
instanceId: instance.id,
57101
};
58102
}
103+
104+
export async function createVMInstance(
105+
state: CUAState,
106+
config: LangGraphRunnableConfig
107+
): Promise<CUAUpdate> {
108+
const { instanceId } = state;
109+
if (instanceId) {
110+
// Instance already exists, no need to initialize
111+
return {};
112+
}
113+
const { provider } = getConfigurationWithDefaults(config);
114+
if (provider === "scrapybara") {
115+
return createScrapybaraInstance(state, config);
116+
} else if (provider === "hyperbrowser") {
117+
return createHyperbrowserInstance(state, config);
118+
} else {
119+
throw new Error(`Unsupported provider: ${provider}`);
120+
}
121+
}
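The provider dispatch in `createVMInstance` can also be written so that the compiler flags any provider that is added to the union but not handled. The sketch below shows that variation under simplified assumptions: `State`, `InstanceUpdate`, `Creator`, and `pickCreator` are hypothetical stand-ins for the package's own types and functions, and the creators are stubs rather than real Scrapybara or Hyperbrowser calls.

```typescript
type Provider = "scrapybara" | "hyperbrowser";

// Simplified stand-ins for the package's state and update types.
interface State {
  instanceId?: string;
}
interface InstanceUpdate {
  instanceId?: string;
  streamUrl?: string;
}
type Creator = (state: State) => Promise<InstanceUpdate>;

// Select the creator for a provider. The `never` assignment makes the
// compiler complain if a new Provider member is left unhandled.
function pickCreator(
  provider: Provider,
  creators: Record<Provider, Creator>
): Creator {
  switch (provider) {
    case "scrapybara":
      return creators.scrapybara;
    case "hyperbrowser":
      return creators.hyperbrowser;
    default: {
      const unexpected: never = provider;
      throw new Error(`Unsupported provider: ${String(unexpected)}`);
    }
  }
}

// Stub creators used only for this demonstration.
const creators: Record<Provider, Creator> = {
  scrapybara: async () => ({ instanceId: "scrapybara-instance" }),
  hyperbrowser: async () => ({
    instanceId: "hyperbrowser-session",
    streamUrl: "https://example.com/live",
  }),
};

const selected = pickCreator("hyperbrowser", creators);
console.log(selected === creators.hyperbrowser);
```

The runtime `throw` is still useful because configuration values arrive untyped at runtime, so an invalid provider string can reach the `default` branch even though the type system rules it out.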
