Commit 15302a3

support browser automation (#36043)
Co-authored-by: bobogogo1990 <[email protected]>
1 parent e6ecf12 commit 15302a3

9 files changed (+492 -7 lines changed)

sdk/ai/ai-agents/CHANGELOG.md

Lines changed: 2 additions & 6 deletions
@@ -1,14 +1,10 @@
 # Release History
 
-## 1.2.0-beta.2 (Unreleased)
+## 1.2.0-beta.2 (2025-09-26)
 
 ### Features Added
 
-### Breaking Changes
-
-### Bugs Fixed
-
-### Other Changes
+- Add `ToolUtility.createBrowserAutomationTool` to support browser automation tool in agent
 
 ## 1.2.0-beta.1 (2025-09-18)
 

sdk/ai/ai-agents/review/ai-agents-node.api.md

Lines changed: 3 additions & 0 deletions
@@ -1799,6 +1799,9 @@ export class ToolUtility {
     static createBingGroundingTool(searchConfigurations: BingGroundingSearchConfiguration[]): {
         definition: BingGroundingToolDefinition;
     };
+    static createBrowserAutomationTool(connectionId: string): {
+        definition: BrowserAutomationToolDefinition;
+    };
     static createCodeInterpreterTool(fileIds?: string[], dataSources?: Array<VectorStoreDataSource>): {
         definition: CodeInterpreterToolDefinition;
         resources: ToolResources;
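
The API report gives only the signature, so here is a rough sketch of how the returned `{ definition }` object might be built and handed to an agent. `createBrowserAutomationToolSketch` and the fields inside `BrowserAutomationToolDefinitionSketch` are assumptions made for illustration. Only the `connectionId: string` parameter, the `definition` property, and the `tools: [browserAutomation.definition]` usage are taken from this commit.

```typescript
// Hypothetical stand-in for BrowserAutomationToolDefinition. The field layout
// is an assumption; check the package's API report for the real shape.
interface BrowserAutomationToolDefinitionSketch {
  type: "browser_automation";
  browserAutomation: { connection: { id: string } };
}

// Hypothetical local equivalent of ToolUtility.createBrowserAutomationTool:
// wraps a connection ID in a tool definition object.
function createBrowserAutomationToolSketch(connectionId: string): {
  definition: BrowserAutomationToolDefinitionSketch;
} {
  // Reject blank IDs early instead of sending them to the service.
  if (connectionId.trim() === "") {
    throw new Error("connectionId must be a non-empty string");
  }
  return {
    definition: {
      type: "browser_automation",
      browserAutomation: { connection: { id: connectionId } },
    },
  };
}

// Mirrors how the samples in this commit attach the tool to an agent.
const tool = createBrowserAutomationToolSketch("my-playwright-connection");
const agentOptions = { name: "my-agent", tools: [tool.definition] };
```

The blank-ID check is an addition of this sketch; the commit does not show whether the real utility validates its argument.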
Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+// Copyright (c) Microsoft Corporation.
+// Licensed under the MIT License.
+
+/**
+ * This sample demonstrates how to use agent operations with the Browser Automation tool from
+ * the Azure Agents service.
+ *
+ * @summary demonstrates how to use agent operations with the Browser Automation tool.
+ */
+
+import type {
+  MessageTextContent,
+  ThreadMessage,
+  RunStepToolCallDetails,
+  RunStepBrowserAutomationToolCall,
+  MessageTextUrlCitationAnnotation,
+} from "@azure/ai-agents";
+import { AgentsClient, isOutputOfType, ToolUtility } from "@azure/ai-agents";
+import { DefaultAzureCredential } from "@azure/identity";
+import "dotenv/config";
+
+const projectEndpoint = process.env["PROJECT_ENDPOINT"] || "<project endpoint>";
+const modelDeploymentName = process.env["MODEL_DEPLOYMENT_NAME"] || "gpt-4o";
+const azurePlaywrightConnectionId =
+  process.env["AZURE_PLAYWRIGHT_CONNECTION_ID"] || "<connection id>";
+
+export async function main(): Promise<void> {
+  const connectionId = azurePlaywrightConnectionId;
+
+  // Initialize Browser Automation tool and add the connection id
+  const browserAutomation = ToolUtility.createBrowserAutomationTool(connectionId);
+
+  // Create an Azure AI Agents Client
+  const client = new AgentsClient(projectEndpoint, new DefaultAzureCredential());
+
+  // Create a new Agent that has the Browser Automation tool attached.
+  const agent = await client.createAgent(modelDeploymentName, {
+    name: "my-agent",
+    instructions: `
+      You are an Agent helping with browser automation tasks.
+      You can answer questions, provide information, and assist with various tasks
+      related to web browsing using the Browser Automation tool available to you.
+    `,
+    tools: [browserAutomation.definition],
+  });
+
+  console.log(`Created agent, ID: ${agent.id}`);
+
+  // Create thread for communication
+  const thread = await client.threads.create();
+  console.log(`Created thread, ID: ${thread.id}`);
+
+  // Create message to thread
+  const message = await client.messages.create(
+    thread.id,
+    "user",
+    `
+      Your goal is to report the percent of Microsoft year-to-date stock price change.
+      To do that, go to the website finance.yahoo.com.
+      At the top of the page, you will find a search bar.
+      Enter the value 'MSFT', to get information about the Microsoft stock price.
+      At the top of the resulting page you will see a default chart of Microsoft stock price.
+      Click on 'YTD' at the top of that chart, and report the percent value that shows up just below it.
+    `,
+  );
+  console.log(`Created message, ID: ${message.id}`);
+
+  // Create and process agent run in thread with tools
+  console.log("Waiting for Agent run to complete. Please wait...");
+  const run = await client.runs.createAndPoll(thread.id, agent.id, {
+    pollingOptions: {
+      intervalInMs: 2000,
+    },
+  });
+
+  console.log(`Run finished with status: ${run.status}`);
+
+  if (run.status === "failed") {
+    console.log(`Run failed: ${JSON.stringify(run.lastError)}`);
+  }
+
+  // Fetch run steps to get the details of the agent run
+  const runStepsIterator = client.runSteps.list(thread.id, run.id);
+  console.log("\nRun Steps:");
+
+  for await (const step of runStepsIterator) {
+    console.log(`Step ${step.id} status: ${step.status}`);
+
+    if (isOutputOfType<RunStepToolCallDetails>(step.stepDetails, "tool_calls")) {
+      console.log(" Tool calls:");
+      const toolCalls = step.stepDetails.toolCalls;
+
+      for (const call of toolCalls) {
+        console.log(` Tool call ID: ${call.id}`);
+        console.log(` Tool call type: ${call.type}`);
+
+        if (isOutputOfType<RunStepBrowserAutomationToolCall>(call, "browser_automation")) {
+          console.log(` Browser automation input: ${call.browserAutomation.input}`);
+          console.log(` Browser automation output: ${call.browserAutomation.output}`);
+
+          console.log(" Steps:");
+          for (const toolStep of call.browserAutomation.steps) {
+            console.log(` Last step result: ${toolStep.lastStepResult}`);
+            console.log(` Current state: ${toolStep.currentState}`);
+            console.log(` Next step: ${toolStep.nextStep}`);
+            console.log(); // add an extra newline between tool steps
+          }
+        }
+
+        console.log(); // add an extra newline between tool calls
+      }
+    }
+
+    console.log(); // add an extra newline between run steps
+  }
+
+  // Optional: Delete the agent once the run is finished.
+  // Comment out this line if you plan to reuse the agent later.
+  await client.deleteAgent(agent.id);
+  console.log("Deleted agent");
+
+  // Print the Agent's response message with optional citation
+  const messagesIterator = client.messages.list(thread.id);
+  const messages: ThreadMessage[] = [];
+
+  for await (const msg of messagesIterator) {
+    messages.unshift(msg); // Add to beginning to maintain chronological order
+  }
+
+  // Find the last assistant message
+  const responseMessage = messages.find(
+    (msg) => msg.role === "assistant" && msg.content.length > 0,
+  );
+
+  if (responseMessage) {
+    // Display URL citations if any
+    for (const content of responseMessage.content) {
+      if (isOutputOfType<MessageTextContent>(content, "text")) {
+        console.log(`Agent response: ${content.text.value}`);
+        for (const annotation of content.text.annotations || []) {
+          if (isOutputOfType<MessageTextUrlCitationAnnotation>(annotation, "url_citation")) {
+            console.log(
+              `URL Citation: [${annotation.urlCitation.title}](${annotation.urlCitation.url})`,
+            );
+          }
+        }
+      }
+    }
+  }
+}
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+  process.exit(1);
+});
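
One detail in the sample above is worth spelling out. Per its own comment, `messages.unshift(msg)` leaves the array in chronological order, and `Array.prototype.find` returns the first matching element, so the lookup under "Find the last assistant message" actually yields the earliest assistant message. For the single exchange in this sample the two coincide. The sketch below shows the difference on a longer thread; `MessageSketch` and `findLastAssistantMessage` are hypothetical names introduced here, not part of the SDK.

```typescript
// Hypothetical stand-in for the few ThreadMessage fields the lookup reads.
interface MessageSketch {
  id: string;
  role: "user" | "assistant";
  content: unknown[];
}

// Walk a chronologically ordered list from the end and return the most recent
// assistant message that has content, or undefined if there is none.
function findLastAssistantMessage(chronological: MessageSketch[]): MessageSketch | undefined {
  for (let i = chronological.length - 1; i >= 0; i--) {
    const msg = chronological[i];
    if (msg !== undefined && msg.role === "assistant" && msg.content.length > 0) {
      return msg;
    }
  }
  return undefined;
}

// A thread with two exchanges, oldest first.
const history: MessageSketch[] = [
  { id: "m1", role: "user", content: ["question"] },
  { id: "m2", role: "assistant", content: ["first answer"] },
  { id: "m3", role: "user", content: ["follow-up"] },
  { id: "m4", role: "assistant", content: ["second answer"] },
];

// find() stops at the first match (m2); the reverse walk returns the latest (m4).
const firstMatch = history.find((m) => m.role === "assistant" && m.content.length > 0);
const lastMatch = findLastAssistantMessage(history);
```

A reverse loop is used rather than `Array.prototype.findLast` so the sketch does not depend on an ES2023 library target.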

sdk/ai/ai-agents/samples/v1-beta/javascript/README.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ These sample programs show how to use the JavaScript client libraries for Azure
 | [agentsBasics.js][agentsbasics] | demonstrates how to use basic agent operations. |
 | [agentsBingGrounding.js][agentsbinggrounding] | demonstrates how to use agent operations with the Grounding with Bing Search tool. |
 | [agentsBingGroundingWithStreaming.js][agentsbinggroundingwithstreaming] | demonstrates how to use agent operations with the Grounding with Bing Search tool using streaming. |
+| [agentsBrowserAutomation.js][agentsbrowserautomation] | demonstrates how to use agent operations with the Browser Automation tool. |
 | [agentsConnectedAgents.js][agentsconnectedagents] | This sample demonstrates how to use Agent operations with the Connected Agent tool from the Azure Agents service. |
 | [agentsImageInputWithBase64.js][agentsimageinputwithbase64] | This sample demonstrates how to use basic agent operations with image input (base64 encoded) for the Azure Agents service. |
 | [agentsImageInputWithFile.js][agentsimageinputwithfile] | This sample demonstrates how to use basic agent operations using image file input for the Azure Agents service. |
@@ -90,6 +91,7 @@ Take a look at our [API Documentation][apiref] for more information about the AP
 [agentsbasics]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/javascript/agentsBasics.js
 [agentsbinggrounding]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/javascript/agentsBingGrounding.js
 [agentsbinggroundingwithstreaming]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/javascript/agentsBingGroundingWithStreaming.js
+[agentsbrowserautomation]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/javascript/agentsBrowserAutomation.js
 [agentsconnectedagents]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/javascript/agentsConnectedAgents.js
 [agentsimageinputwithbase64]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/javascript/agentsImageInputWithBase64.js
 [agentsimageinputwithfile]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/javascript/agentsImageInputWithFile.js
Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
+// Copyright (c) Microsoft Corporation.
+// Licensed under the MIT License.
+
+/**
+ * This sample demonstrates how to use agent operations with the Browser Automation tool from
+ * the Azure Agents service.
+ *
+ * @summary demonstrates how to use agent operations with the Browser Automation tool.
+ */
+
+const { AgentsClient, isOutputOfType, ToolUtility } = require("@azure/ai-agents");
+const { DefaultAzureCredential } = require("@azure/identity");
+require("dotenv/config");
+
+const projectEndpoint = process.env["PROJECT_ENDPOINT"] || "<project endpoint>";
+const modelDeploymentName = process.env["MODEL_DEPLOYMENT_NAME"] || "gpt-4o";
+const azurePlaywrightConnectionId =
+  process.env["AZURE_PLAYWRIGHT_CONNECTION_ID"] || "<connection id>";
+
+async function main() {
+  const connectionId = azurePlaywrightConnectionId;
+
+  // Initialize Browser Automation tool and add the connection id
+  const browserAutomation = ToolUtility.createBrowserAutomationTool(connectionId);
+
+  // Create an Azure AI Agents Client
+  const client = new AgentsClient(projectEndpoint, new DefaultAzureCredential());
+
+  // Create a new Agent that has the Browser Automation tool attached.
+  const agent = await client.createAgent(modelDeploymentName, {
+    name: "my-agent",
+    instructions: `
+      You are an Agent helping with browser automation tasks.
+      You can answer questions, provide information, and assist with various tasks
+      related to web browsing using the Browser Automation tool available to you.
+    `,
+    tools: [browserAutomation.definition],
+  });
+
+  console.log(`Created agent, ID: ${agent.id}`);
+
+  // Create thread for communication
+  const thread = await client.threads.create();
+  console.log(`Created thread, ID: ${thread.id}`);
+
+  // Create message to thread
+  const message = await client.messages.create(
+    thread.id,
+    "user",
+    `
+      Your goal is to report the percent of Microsoft year-to-date stock price change.
+      To do that, go to the website finance.yahoo.com.
+      At the top of the page, you will find a search bar.
+      Enter the value 'MSFT', to get information about the Microsoft stock price.
+      At the top of the resulting page you will see a default chart of Microsoft stock price.
+      Click on 'YTD' at the top of that chart, and report the percent value that shows up just below it.
+    `,
+  );
+  console.log(`Created message, ID: ${message.id}`);
+
+  // Create and process agent run in thread with tools
+  console.log("Waiting for Agent run to complete. Please wait...");
+  const run = await client.runs.createAndPoll(thread.id, agent.id, {
+    pollingOptions: {
+      intervalInMs: 2000,
+    },
+  });
+
+  console.log(`Run finished with status: ${run.status}`);
+
+  if (run.status === "failed") {
+    console.log(`Run failed: ${JSON.stringify(run.lastError)}`);
+  }
+
+  // Fetch run steps to get the details of the agent run
+  const runStepsIterator = client.runSteps.list(thread.id, run.id);
+  console.log("\nRun Steps:");
+
+  for await (const step of runStepsIterator) {
+    console.log(`Step ${step.id} status: ${step.status}`);
+
+    if (isOutputOfType(step.stepDetails, "tool_calls")) {
+      console.log(" Tool calls:");
+      const toolCalls = step.stepDetails.toolCalls;
+
+      for (const call of toolCalls) {
+        console.log(` Tool call ID: ${call.id}`);
+        console.log(` Tool call type: ${call.type}`);
+
+        if (isOutputOfType(call, "browser_automation")) {
+          console.log(` Browser automation input: ${call.browserAutomation.input}`);
+          console.log(` Browser automation output: ${call.browserAutomation.output}`);
+
+          console.log(" Steps:");
+          for (const toolStep of call.browserAutomation.steps) {
+            console.log(` Last step result: ${toolStep.lastStepResult}`);
+            console.log(` Current state: ${toolStep.currentState}`);
+            console.log(` Next step: ${toolStep.nextStep}`);
+            console.log(); // add an extra newline between tool steps
+          }
+        }
+
+        console.log(); // add an extra newline between tool calls
+      }
+    }
+
+    console.log(); // add an extra newline between run steps
+  }
+
+  // Optional: Delete the agent once the run is finished.
+  // Comment out this line if you plan to reuse the agent later.
+  await client.deleteAgent(agent.id);
+  console.log("Deleted agent");
+
+  // Print the Agent's response message with optional citation
+  const messagesIterator = client.messages.list(thread.id);
+  const messages = [];
+
+  for await (const msg of messagesIterator) {
+    messages.unshift(msg); // Add to beginning to maintain chronological order
+  }
+
+  // Find the last assistant message
+  const responseMessage = messages.find(
+    (msg) => msg.role === "assistant" && msg.content.length > 0,
+  );
+
+  if (responseMessage) {
+    // Display URL citations if any
+    for (const content of responseMessage.content) {
+      if (isOutputOfType(content, "text")) {
+        console.log(`Agent response: ${content.text.value}`);
+        for (const annotation of content.text.annotations || []) {
+          if (isOutputOfType(annotation, "url_citation")) {
+            console.log(
+              `URL Citation: [${annotation.urlCitation.title}](${annotation.urlCitation.url})`,
+            );
+          }
+        }
+      }
+    }
+  }
+}
+
+main().catch((err) => {
+  console.error("The sample encountered an error:", err);
+  process.exit(1);
+});
+
+module.exports = { main };
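
Both samples rely on `isOutputOfType` to decide which branch of a tagged union they are looking at (`"tool_calls"`, `"browser_automation"`, `"text"`, `"url_citation"`). As a rough illustration of how a discriminator-based guard of this kind can work, here is a minimal sketch. `hasType`, `BrowserCallSketch`, and `OtherCallSketch` are hypothetical names, and this is not the SDK's implementation of `isOutputOfType`.

```typescript
// Hypothetical tool-call shapes, reduced to the fields the samples print.
interface BrowserCallSketch {
  type: "browser_automation";
  id: string;
  browserAutomation: { input: string; output: string };
}

interface OtherCallSketch {
  type: "code_interpreter";
  id: string;
}

// Generic type guard: compares the `type` discriminator and, when it matches,
// lets TypeScript narrow `value` to T inside the guarded branch.
function hasType<T extends { type: string }>(
  value: { type: string },
  type: T["type"],
): value is T {
  return value.type === type;
}

const calls: Array<BrowserCallSketch | OtherCallSketch> = [
  { type: "code_interpreter", id: "call_1" },
  {
    type: "browser_automation",
    id: "call_2",
    browserAutomation: { input: "open page", output: "page opened" },
  },
];

// Collect the inputs of browser automation calls only.
const browserInputs: string[] = [];
for (const call of calls) {
  if (hasType<BrowserCallSketch>(call, "browser_automation")) {
    // `call.browserAutomation` is accessible here because of the narrowing.
    browserInputs.push(call.browserAutomation.input);
  }
}
```

The explicit type argument mirrors the TypeScript sample's `isOutputOfType<RunStepBrowserAutomationToolCall>(...)` calls; the JavaScript sample omits it because there is no static narrowing to drive.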

sdk/ai/ai-agents/samples/v1-beta/typescript/README.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ These sample programs show how to use the TypeScript client libraries for Azure
 | [agentsBasics.ts][agentsbasics] | demonstrates how to use basic agent operations. |
 | [agentsBingGrounding.ts][agentsbinggrounding] | demonstrates how to use agent operations with the Grounding with Bing Search tool. |
 | [agentsBingGroundingWithStreaming.ts][agentsbinggroundingwithstreaming] | demonstrates how to use agent operations with the Grounding with Bing Search tool using streaming. |
+| [agentsBrowserAutomation.ts][agentsbrowserautomation] | demonstrates how to use agent operations with the Browser Automation tool. |
 | [agentsConnectedAgents.ts][agentsconnectedagents] | This sample demonstrates how to use Agent operations with the Connected Agent tool from the Azure Agents service. |
 | [agentsImageInputWithBase64.ts][agentsimageinputwithbase64] | This sample demonstrates how to use basic agent operations with image input (base64 encoded) for the Azure Agents service. |
 | [agentsImageInputWithFile.ts][agentsimageinputwithfile] | This sample demonstrates how to use basic agent operations using image file input for the Azure Agents service. |
@@ -102,6 +103,7 @@ Take a look at our [API Documentation][apiref] for more information about the AP
 [agentsbasics]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/typescript/src/agentsBasics.ts
 [agentsbinggrounding]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/typescript/src/agentsBingGrounding.ts
 [agentsbinggroundingwithstreaming]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/typescript/src/agentsBingGroundingWithStreaming.ts
+[agentsbrowserautomation]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/typescript/src/agentsBrowserAutomation.ts
 [agentsconnectedagents]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/typescript/src/agentsConnectedAgents.ts
 [agentsimageinputwithbase64]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/typescript/src/agentsImageInputWithBase64.ts
 [agentsimageinputwithfile]: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-agents/samples/v1-beta/typescript/src/agentsImageInputWithFile.ts