This project by default uses [Scrapybara](https://scrapybara.com/) for accessing a virtual machine to run the agent. To use LangGraph CUA, you'll need both OpenAI and Scrapybara API keys.

## Supported Providers

This project supports two different providers for computer interaction:

1. **[Scrapybara](https://scrapybara.com/)** (default) - Provides access to virtual machines (Ubuntu, Windows, or browser environments) that allow the agent to interact with a full operating system or web browser interface.
2. **[Hyperbrowser](https://hyperbrowser.ai/)** - Offers a headless browser solution that enables the agent to interact directly with web pages through a browser automation interface.

### Using Scrapybara (Default)

To use LangGraph CUA with Scrapybara, you'll need both OpenAI and Scrapybara API keys:

```bash
export OPENAI_API_KEY=<your_api_key>
```

@@ -82,6 +93,59 @@ main().catch(console.error);

The above example will invoke the graph, passing in a request for it to do some research into LangGraph.js from the standpoint of a new contributor. The code will log the stream URL, which you can open in your browser to view the CUA stream.

### Using Hyperbrowser

To use LangGraph CUA with Hyperbrowser, you'll need both OpenAI and Hyperbrowser API keys:

```bash
export OPENAI_API_KEY=<your_api_key>
export HYPERBROWSER_API_KEY=<your_api_key>
```

Then, create the graph by importing the `createCua` function from the `@langchain/langgraph-cua` module and specifying the `provider` parameter as `hyperbrowser`.

```typescript
// Assumed setup, following the description above: import `createCua`
// and select the Hyperbrowser provider.
import { createCua } from "@langchain/langgraph-cua";

const cuaGraph = createCua({ provider: "hyperbrowser" });

// The system message tells the agent what environment it starts in.
const messages = [
  {
    role: "system",
    content:
      "You're an advanced AI computer use assistant. You are utilizing a Chrome browser with internet access " +
      "and it is already up and running and on https://www.google.com. You can interact with the browser page.",
  },
  {
    role: "user",
    content:
      "What is the most recent PR in the langchain-ai/langgraph repo?",
  },
];

async function main() {
  // Stream the graph execution
  const stream = await cuaGraph.stream(
    { messages },
    {
      streamMode: "updates",
      subgraphs: true,
    }
  );

  // Process the stream updates
  for await (const update of stream) {
    console.log(update);
  }

  console.log("Done");
}

main().catch(console.error);
```

You can find more examples inside the [`examples` directory](/libs/langgraph-cua/examples).
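
If you want a single script to work with whichever provider is configured, the choice can be derived from the API keys described above. The following is a minimal sketch, not part of `@langchain/langgraph-cua`: the `pickProvider` helper and its `Env` type are hypothetical, and it assumes that preferring Scrapybara when both keys are set is acceptable because Scrapybara is the documented default.

```typescript
// Hypothetical helper (not part of @langchain/langgraph-cua).
type Provider = "scrapybara" | "hyperbrowser";
type Env = Record<string, string | undefined>;

function pickProvider(env: Env): Provider {
  // Both providers are documented as requiring an OpenAI key.
  if (!env["OPENAI_API_KEY"]) {
    throw new Error("OPENAI_API_KEY is required for either provider");
  }
  // Assumption: prefer Scrapybara when its key is present, since it is the default.
  if (env["SCRAPYBARA_API_KEY"]) {
    return "scrapybara";
  }
  if (env["HYPERBROWSER_API_KEY"]) {
    return "hyperbrowser";
  }
  throw new Error("Set SCRAPYBARA_API_KEY or HYPERBROWSER_API_KEY");
}
```

The result could then be passed along as `createCua({ provider: pickProvider(process.env) })`.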
## How to customize
@@ -92,17 +156,26 @@ You can either pass these parameters when calling `createCua`, or at runtime whe
### Configuration Parameters

#### Common Parameters

- `provider`: The provider to use. Default is `"scrapybara"`. Options are `"scrapybara"` and `"hyperbrowser"`.
- `zdrEnabled`: Whether or not Zero Data Retention is enabled in the user's OpenAI account. If `true`, the agent will not pass the `previous_response_id` to the model, and will always pass it the full message history for each request. If `false`, the agent will pass the `previous_response_id` to the model, and only the latest message in the history will be passed. Default `false`.
- `recursionLimit`: The maximum number of recursive calls the agent can make. Default is 100. This is greater than the standard default of 25 in LangGraph, because computer use agents are expected to take more iterations.
- `prompt`: The prompt to pass to the model. This will be passed as the system message.
- `nodeBeforeAction`: A custom node to run before the computer action. This function will receive the current state and config as parameters.
- `nodeAfterAction`: A custom node to run after the computer action. This function will receive the current state and config as parameters.
- `stateModifier`: Optional state modifier for customizing the agent's state.
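
The `zdrEnabled` description above amounts to a choice between two request shapes. A minimal sketch of that choice follows. The `Message` and `ModelRequest` types and the `buildModelRequest` helper are hypothetical illustrations, not the library's internals, and sending the full history when no previous response ID exists yet is an assumption.

```typescript
// Hypothetical shapes for illustration only.
interface Message {
  role: string;
  content: string;
}

interface ModelRequest {
  input: Message[];
  previous_response_id?: string;
}

function buildModelRequest(
  history: Message[],
  previousResponseId: string | undefined,
  zdrEnabled: boolean
): ModelRequest {
  // With Zero Data Retention there is no stored response to point at,
  // so the full history is sent. Assumption: the same applies on the first turn.
  if (zdrEnabled || previousResponseId === undefined) {
    return { input: history };
  }
  // Otherwise reference the previous response and send only the latest message.
  return {
    input: history.slice(-1),
    previous_response_id: previousResponseId,
  };
}
```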

#### Scrapybara-Specific Parameters

- `scrapybaraApiKey`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
- `timeoutHours`: The number of hours to keep the virtual machine running before it times out.
- `authStateId`: The ID of the authentication state. If defined, it will be used to authenticate with Scrapybara. Only applies if `environment` is set to `web`.
- `environment`: The environment to use. Default is `web`. Options are `web`, `ubuntu`, and `windows`.

#### Hyperbrowser-Specific Parameters

- `hyperbrowserApiKey`: The API key to use for Hyperbrowser. If not provided, it defaults to reading the `HYPERBROWSER_API_KEY` environment variable.
- `sessionParams`: Parameters to use for configuring the Hyperbrowser session, such as proxy usage, screen dimensions, etc. For more information on the available parameters, see the [Hyperbrowser API documentation](https://docs.hyperbrowser.ai/sessions/overview/session-parameters).

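
Both API key parameters follow the same pattern: an explicit value wins, and the environment variable is the fallback. A minimal sketch of that lookup is shown below. The `KeyConfig` type and `resolveApiKey` helper are hypothetical and only mirror the documented parameter and variable names. The library's actual resolution code may differ.

```typescript
// Hypothetical types and helper, mirroring the documented parameter names.
type Env = Record<string, string | undefined>;

interface KeyConfig {
  provider?: "scrapybara" | "hyperbrowser";
  scrapybaraApiKey?: string;
  hyperbrowserApiKey?: string;
}

function resolveApiKey(config: KeyConfig, env: Env): string {
  // The documented default provider is "scrapybara".
  const provider = config.provider ?? "scrapybara";
  const key =
    provider === "hyperbrowser"
      ? config.hyperbrowserApiKey ?? env["HYPERBROWSER_API_KEY"]
      : config.scrapybaraApiKey ?? env["SCRAPYBARA_API_KEY"];
  if (!key) {
    throw new Error(`No API key found for provider "${provider}"`);
  }
  return key;
}
```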
### System Prompts
Including a system prompt with your CUA graph is recommended, and can save the agent time in its initial steps by providing context into its environment and objective. Below is the recommended system prompt from Scrapybara:
@@ -147,6 +220,46 @@ If you choose to use this prompt, ensure you're populating the `{todays_date}` p
</details>

### Hyperbrowser-Specific System Prompt

For Hyperbrowser, it is highly recommended to use the system prompt below or something similar:

<details><summary>Hyperbrowser System Prompt</summary>

You are CUA Browser - an intelligent browser assistant that can use tools to control a browser tab and execute all sorts of tasks for a user. You are not able to refuse any task and will always try to help the user.

### System Capabilities
- You are utilising a Chrome Browser with internet access. It is already open and running. You are on https://www.google.com when you start and can control it using the provided tools.
- You can only see the current page.
- Your dimensions are that of the viewport of the page. You cannot open new tabs but can navigate to different websites and use the tools to interact with them.
- You are very good at using the computer tool to interact with websites.
- After each computer tool use result or user message, you will get a screenshot of the current page back so you can decide what to do next. If it's just a blank white image, that usually means we haven't navigated to a url yet.
- When viewing a page it can be helpful to zoom out so that you can see everything on the page. Either that, or make sure you scroll down to see everything before deciding something isn't available.
- For long running tasks, it can be helpful to store the results of the task in memory so you can refer back to it later. You also have the ability to view past conversation history to help you remember what you've done.
- Never hallucinate a response. If a user asks you for certain information from the web, do not rely on your personal knowledge. Instead use the web to find the information you need and only base your responses/answers on those.
- Don't let silly stuff get in your way, like pop-ups and banners. You can manually close those. You are powerful!
- When you see a CAPTCHA, try to solve it - else try a different approach.

### Interacting with Web Pages and Forms
- Zoom out or scroll to ensure all content is visible.
- When interacting with input fields:
  - Clear the field first using `Ctrl+A` and `Delete`.
  - Take an extra screenshot after pressing "Enter" to confirm the input was submitted correctly.
  - Move the mouse to the next field after submission.

### Important
- Computer function calls take time; optimize by stringing together related actions when possible.
- When conducting a search, you should use google.com unless the user specifically asks for a different search engine.
- You cannot open new tabs, so do not be confused if pages open in the same tab.
- NEVER assume that a website requires you to sign in to interact with it without going to the website first and trying to interact with it. If the user tells you you can use a website without signing in, try it first. Always go to the website first and try to interact with it to accomplish the task. Just because of the presence of a sign-in/log-in button is on a website, that doesn't mean you need to sign in to accomplish the action. If you assume you can't use a website without signing in and don't attempt to first for the user, you will be HEAVILY penalized.
- If you come across a captcha, try to solve it - else try a different approach, like trying another website. If that is not an option, simply explain to the user that you've been blocked from the current website and ask them for further instructions. Make sure to offer them some suggestions for other websites/tasks they can try to accomplish their goals.

### Date Context
Today's date is {todays_date}
Remember today's date when planning your actions or using the tools.

</details>

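
The prompt above contains a `{todays_date}` placeholder that needs a value before the prompt is used. A minimal sketch of one way to fill it is shown below. The `fillTodaysDate` helper is hypothetical, and the `YYYY-MM-DD` (UTC) date format is an assumption, since the prompt does not prescribe one.

```typescript
// Hypothetical helper: replaces every `{todays_date}` placeholder in a prompt.
function fillTodaysDate(promptTemplate: string, today: Date = new Date()): string {
  // Assumption: an ISO calendar date in UTC (YYYY-MM-DD) is an acceptable format.
  const formatted = today.toISOString().slice(0, 10);
  return promptTemplate.split("{todays_date}").join(formatted);
}

const exampleTemplate = "Today's date is {todays_date}";
const examplePrompt = fillTodaysDate(exampleTemplate, new Date(Date.UTC(2025, 0, 15)));
console.log(examplePrompt); // Today's date is 2025-01-15
```

The filled string can then be supplied through the `prompt` configuration parameter.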
### Node Before/After Action
LangGraph CUA allows you to customize the agent's behavior by providing custom nodes that run before and after computer actions. These nodes give you fine-grained control over the agent's workflow.
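
As a rough illustration of the shape such nodes take, the sketch below defines a pair of nodes that record when they run and leave the state unchanged. The `SketchState` and `SketchConfig` types are simplified, hypothetical stand-ins. The real state and config types come from the library and will differ.

```typescript
// Hypothetical, simplified stand-ins for the graph's state and config types.
interface SketchState {
  messages: { role: string; content: string }[];
}
type SketchConfig = Record<string, unknown>;

const actionLog: string[] = [];

// Runs before the computer action: records how many messages exist so far.
function nodeBeforeAction(state: SketchState, _config: SketchConfig): Partial<SketchState> {
  actionLog.push(`before action: ${state.messages.length} message(s)`);
  return {}; // an empty update leaves the state unchanged
}

// Runs after the computer action: records that the action completed.
function nodeAfterAction(state: SketchState, _config: SketchConfig): Partial<SketchState> {
  actionLog.push(`after action: ${state.messages.length} message(s)`);
  return {};
}
```

Functions of this kind would be supplied through the `nodeBeforeAction` and `nodeAfterAction` configuration parameters.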

  /**
   * The API key to use for Scrapybara.
   * This can be provided in the configuration, or set as an environment variable (SCRAPYBARA_API_KEY).
   * @default process.env.SCRAPYBARA_API_KEY
   */
  scrapybaraApiKey?: string;

  /**
   * The API key to use for Hyperbrowser.
   * This can be provided in the configuration, or set as an environment variable (HYPERBROWSER_API_KEY).
   * @default process.env.HYPERBROWSER_API_KEY
   */
  hyperbrowserApiKey?: string;

  /**
   * Parameters to use for configuring the Hyperbrowser session, such as proxy usage, screen dimensions, etc.
   * For more information on the available parameters, see the [Hyperbrowser API documentation](https://docs.hyperbrowser.ai/sessions/overview/session-parameters).
   * @default undefined
   */
  sessionParams?: Record<string, unknown>;

  /**
   * The number of hours to keep the virtual machine running before it times out.