diff --git a/docs.json b/docs.json index fdcf122..dea7e7c 100644 --- a/docs.json +++ b/docs.json @@ -118,7 +118,14 @@ "integrations/notte", "integrations/stagehand", "integrations/valtown", - "integrations/vercel" + { + "group": "Vercel", + "pages": [ + "integrations/vercel/overview", + "integrations/vercel/marketplace", + "integrations/vercel/ai-sdk" + ] + } ] }, { diff --git a/integrations/vercel/ai-sdk.mdx b/integrations/vercel/ai-sdk.mdx new file mode 100644 index 0000000..9ca1a13 --- /dev/null +++ b/integrations/vercel/ai-sdk.mdx @@ -0,0 +1,446 @@ +--- +title: "Vercel AI SDK Tool" +description: "Build AI agents with browser automation using the @onkernel/ai-sdk package" +--- + +## Overview + +The `@onkernel/ai-sdk` package provides Vercel AI SDK-compatible tools for browser automation powered by Kernel. This package exposes a Playwright execution tool that allows Large Language Models (LLMs) to browse the web, interact with websites, and perform automation tasks through natural language instructions. + +With this tool, AI agents can execute Playwright code on Kernel's remote browsers, enabling powerful browser automation capabilities in your AI-powered applications. + +## Installation + +Install the package along with its peer dependencies: + +```bash +npm install @onkernel/ai-sdk zod +npm install ai @onkernel/sdk +npm install @ai-sdk/openai # model provider used in the examples on this page +``` + + + The `@onkernel/sdk` and `ai` packages are peer dependencies that must be installed separately. + + +## Prerequisites + +Before using the AI SDK tool, you'll need: + +1. **Kernel API Key** - Obtain from the [Kernel Dashboard](https://dashboard.onkernel.com) or through the [Vercel Marketplace integration](/integrations/vercel/marketplace) +2. **AI Model Provider** - An API key for your chosen LLM provider (OpenAI, Anthropic, etc.) +3. **Kernel Browser Session** - A running browser session created via the Kernel SDK + +## How It Works + +The `playwrightExecuteTool` creates a Vercel AI SDK tool that: + +1.
 Receives Playwright code written by the LLM (plus an optional `timeout_sec`) as its input +2. Executes that code on the Kernel browser session you pass in +3. Returns the execution result, or the error, to the LLM + +The tool does not translate natural language itself: the model turns your instructions into Playwright code, and the tool runs it. This enables AI agents to autonomously browse websites, extract data, and perform complex automation tasks. + +## Usage with `generateText()` + +The simplest way to use the AI SDK tool is with Vercel's `generateText()` function. By default, `generateText()` stops after a single step, so set `stopWhen` to let the model call the tool and then write a final answer: + + +```typescript Basic Example +import { openai } from '@ai-sdk/openai'; +import { playwrightExecuteTool } from '@onkernel/ai-sdk'; +import { Kernel } from '@onkernel/sdk'; +import { generateText, stepCountIs } from 'ai'; + +// 1) Create Kernel client and start a browser session +const client = new Kernel({ + apiKey: process.env.KERNEL_API_KEY, +}); + +const browser = await client.browsers.create({}); + +const sessionId = browser.session_id; +console.log('Browser session started:', sessionId); + +// 2) Create the Playwright execution tool +const playwrightTool = playwrightExecuteTool({ + client, + sessionId +}); + +// 3) Use with Vercel AI SDK +const result = await generateText({ + model: openai('gpt-5.1'), + prompt: 'Open example.com and click the first link', + tools: { + playwright_execute: playwrightTool, + }, + stopWhen: stepCountIs(10), // allow tool calls followed by a final text answer +}); + +console.log('Result:', result.text); + +// 4) Clean up +await client.browsers.deleteByID(sessionId); +``` + +```typescript Advanced Example +import * as fs from 'node:fs/promises'; +import { openai } from '@ai-sdk/openai'; +import { playwrightExecuteTool } from '@onkernel/ai-sdk'; +import { Kernel } from '@onkernel/sdk'; +import { generateText, stepCountIs } from 'ai'; + +const client = new Kernel({ + apiKey: process.env.KERNEL_API_KEY +}); + +const browser = await client.browsers.create({ + stealth: true, // Enable bot detection evasion +}); + +const playwrightTool = playwrightExecuteTool({ + client, + sessionId: browser.session_id, +}); + +try { + const result = await generateText({ + model: openai('gpt-5.1'), + stopWhen: stepCountIs(10), // allow several tool calls before the summary + prompt: 'Navigate to example.com, read the H1 text, and summarize the page
 content', + tools: { + playwright_execute: playwrightTool, + }, + }); + + console.log('Summary:', result.text); + + // Capture a screenshot of the final state + const screenshot = await client.browsers.computer.captureScreenshot( + browser.session_id + ); + + // Save screenshot + await fs.writeFile( + 'screenshot.png', + Buffer.from(await screenshot.arrayBuffer()) + ); + + // Log tool results from every step (result.toolResults only holds the last step's) + const allToolResults = result.steps.flatMap((step) => step.toolResults); + if (allToolResults.length > 0) { + console.log('Executed actions:', allToolResults); + } +} finally { + await client.browsers.deleteByID(browser.session_id); +} +``` + + +## Usage with `Agent()` Class + +For more complex, multi-step automation tasks, use the Vercel AI SDK's `Agent` class (imported as `Experimental_Agent` in these examples). Agents can autonomously plan and execute a series of actions to accomplish a goal: + + +```typescript Agent Example +import { openai } from '@ai-sdk/openai'; +import { playwrightExecuteTool } from '@onkernel/ai-sdk'; +import { Kernel } from '@onkernel/sdk'; +import { Experimental_Agent as Agent, stepCountIs } from 'ai'; + +const kernel = new Kernel({ + apiKey: process.env.KERNEL_API_KEY +}); + +const browser = await kernel.browsers.create({}); + +// Initialize the AI agent with GPT-5.1 +const agent = new Agent({ + model: openai('gpt-5.1'), + tools: { + playwright_execute: playwrightExecuteTool({ + client: kernel, + sessionId: browser.session_id, + }), + }, + stopWhen: stepCountIs(20), // Maximum 20 steps + system: `You are a browser automation expert.
 You help users execute tasks in their browser using Playwright.`, +}); + +// Execute the agent with the user's task +const { text, steps, totalUsage } = await agent.generate({ + prompt: 'Go to news.ycombinator.com, find the top 3 posts, and summarize them', +}); + +console.log('Agent response:', text); +console.log('Steps taken:', steps.length); +console.log('Token usage (all steps):', totalUsage); + +await kernel.browsers.deleteByID(browser.session_id); +``` + +```typescript Next.js API Route +import { openai } from '@ai-sdk/openai'; +import { playwrightExecuteTool } from '@onkernel/ai-sdk'; +import { Kernel } from '@onkernel/sdk'; +import { Experimental_Agent as Agent, stepCountIs } from 'ai'; + +export const maxDuration = 300; // 5 minutes for long-running tasks + +export async function POST(req: Request) { + try { + const { sessionId, task } = await req.json(); + + if (!sessionId || !task) { + return Response.json( + { error: 'Missing sessionId or task' }, + { status: 400 } + ); + } + + const kernel = new Kernel({ + apiKey: process.env.KERNEL_API_KEY + }); + + // Initialize the AI agent + const agent = new Agent({ + model: openai('gpt-5.1'), + tools: { + playwright_execute: playwrightExecuteTool({ + client: kernel, + sessionId: sessionId, + }), + }, + stopWhen: stepCountIs(20), + system: `You are a browser automation expert.
 You help users execute tasks in their browser using Playwright.`, + }); + + // Execute the agent with the user's task + const { text, steps, totalUsage } = await agent.generate({ + prompt: task, + }); + + // Collect the code the model sent to the tool in every step. + // Tool results carry `input` (the tool arguments) and `output` (the tool's return value); + // the success/result/error fields are assumed to exist on that output. + const executedCodes = steps + .flatMap((step) => step.toolResults) + .map((toolResult) => { + const input = toolResult.input as { code?: string }; + const output = toolResult.output as { success?: boolean; result?: unknown; error?: string }; + return { + code: input?.code ?? '', + success: output?.success, + result: output?.result, + error: output?.error, + }; + }) + .filter((item) => item.code); + + return Response.json({ + success: true, + response: text, + executedCodes, + stepCount: steps.length, + usage: totalUsage, + }); + } catch (error: any) { + console.error('Agent execution error:', error); + return Response.json( + { + success: false, + error: error.message || 'Failed to execute agent', + }, + { status: 500 } + ); + } +} +``` + + +## Tool Parameters + +The `playwrightExecuteTool` function accepts the following parameters: + +```typescript +function playwrightExecuteTool(options: { + client: Kernel; // Kernel SDK client instance + sessionId: string; // Existing browser session ID +}): Tool; +``` + +### Tool Input Schema + +The generated tool accepts the following input from the LLM: + +```typescript +{ + code: string; // Required: JavaScript/TypeScript code to execute + timeout_sec?: number; // Optional: Execution timeout in seconds (default: 60) +} +``` + +Under the hood, the tool calls `client.browsers.playwright.execute(sessionId, { code, timeout_sec })`.
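To make the input contract concrete, here is a minimal, self-contained sketch of the arguments the tool forwards, assuming the documented 60-second default applies whenever the model omits `timeout_sec`. `PlaywrightExecuteInput` and `toExecuteArgs` are hypothetical names for illustration, not exports of `@onkernel/ai-sdk`, and whether the default is filled in by the package or by the Kernel API is not stated here. The sample `code` string also assumes that Kernel's Playwright execution exposes a `page` object and returns the value of a `return` statement; treat that as an assumption to verify.

```typescript
// Hypothetical type mirroring the documented tool input schema.
interface PlaywrightExecuteInput {
  code: string; // Playwright code written by the model
  timeout_sec?: number; // optional; documented default is 60 seconds
}

// Documented default execution timeout, in seconds.
const DEFAULT_TIMEOUT_SEC = 60;

// Hypothetical helper: validates the model's input and builds the
// (sessionId, { code, timeout_sec }) pair that the tool is documented to
// pass to client.browsers.playwright.execute. No network call happens here.
function toExecuteArgs(
  sessionId: string,
  input: PlaywrightExecuteInput,
): [string, { code: string; timeout_sec: number }] {
  if (input.code.trim() === '') {
    throw new Error('code must not be empty');
  }
  const timeout_sec = input.timeout_sec ?? DEFAULT_TIMEOUT_SEC;
  if (!Number.isFinite(timeout_sec) || timeout_sec <= 0) {
    throw new Error('timeout_sec must be a positive number');
  }
  return [sessionId, { code: input.code, timeout_sec }];
}

// Example of what a model might send (assumes `page` is in scope on Kernel's side).
const modelInput: PlaywrightExecuteInput = {
  code: "await page.goto('https://example.com'); return await page.title();",
};

const [id, args] = toExecuteArgs('session-123', modelInput);
console.log(id, args.timeout_sec); // session-123 60
```

Because the model only ever supplies `code` and, optionally, `timeout_sec`, the session ID stays fixed by your application when you create the tool, so the model cannot redirect execution to a different browser.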
+ +## Examples + +### Web Scraping + +```typescript +const result = await generateText({ + model: openai('gpt-5.1'), + prompt: 'Go to producthunt.com and extract the top 5 product names and descriptions', + tools: { + playwright_execute: playwrightTool, + }, + stopWhen: stepCountIs(10), +}); +``` + +### Form Automation + +```typescript +const agent = new Agent({ + model: openai('gpt-5.1'), + tools: { + playwright_execute: playwrightTool, + }, + stopWhen: stepCountIs(10), + system: 'You are a form filling assistant.', +}); + +const result = await agent.generate({ + prompt: 'Navigate to example.com/contact, fill out the contact form with name "John Doe" and email "john@example.com", and submit it', +}); +``` + +### Data Extraction + +```typescript +const result = await generateText({ + model: openai('gpt-5.1'), + prompt: 'Visit github.com/onkernel/kernel-nextjs-template, extract the README content, and count how many code examples are shown', + tools: { + playwright_execute: playwrightTool, + }, + stopWhen: stepCountIs(10), +}); +``` + +## Best Practices + +### 1. Session Management + +Always clean up browser sessions after use to avoid unnecessary costs: + +```typescript +try { + const result = await generateText({ + // ... configuration + }); + // Process results +} finally { + await client.browsers.deleteByID(sessionId); +} +``` + +### 2. Error Handling + +Implement robust error handling for production applications: + +```typescript +try { + const agent = new Agent({ + model: openai('gpt-5.1'), + tools: { + playwright_execute: playwrightTool, + }, + stopWhen: stepCountIs(20), + }); + + const result = await agent.generate({ + prompt: userTask, + }); + + return { success: true, data: result.text }; +} catch (error) { + console.error('Agent execution failed:', error); + // `error` is `unknown` in a TypeScript catch clause, so narrow it before reading .message + return { success: false, error: error instanceof Error ? error.message : String(error) }; +} finally { + await client.browsers.deleteByID(sessionId); +} +``` + +### 3.
Enable Stealth Mode + +For websites with bot detection, enable stealth mode: + +```typescript +const browser = await client.browsers.create({ + stealth: true, // Evade bot detection +}); +``` + +### 4. Live View for Debugging + +Live view is enabled by default for non-headless browsers. Access the live view URL from the browser object: + +```typescript +const browser = await client.browsers.create({}); + +console.log('Watch your browser:', browser.live_view_url); +``` + +## Troubleshooting + +### Tool Not Being Called + +If the LLM isn't using the tool, make your prompt more explicit: + +```typescript +const result = await generateText({ + model: openai('gpt-5.1'), + prompt: 'Use the Playwright tool to navigate to example.com and extract the page title', + tools: { + playwright_execute: playwrightTool, + }, + stopWhen: stepCountIs(5), +}); +``` + +### Timeout Errors + +Each tool call has a default execution timeout of 60 seconds. If executions time out, the LLM can request a longer one by setting the optional `timeout_sec` field alongside `code` in its `playwright_execute` call; mentioning `timeout_sec` in your system prompt or task prompt makes it more likely to do so.
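One way to act on this is to tell the model about `timeout_sec` up front. The sketch below is illustrative only: `withTimeoutGuidance` is a hypothetical helper that appends a timeout hint to whatever system prompt you already pass to `generateText()` or `Agent`, and the 120-second value in the sample tool input is an arbitrary example, not a documented limit. It only builds strings and objects, so it runs without Kernel or the AI SDK installed.

```typescript
// Documented default execution timeout for playwright_execute, in seconds.
const DEFAULT_TIMEOUT_SEC = 60;

// Hypothetical helper: appends guidance about timeout_sec to an existing system prompt.
function withTimeoutGuidance(
  systemPrompt: string,
  defaultSec: number = DEFAULT_TIMEOUT_SEC,
): string {
  return [
    systemPrompt,
    `Each playwright_execute call times out after ${defaultSec} seconds by default.`,
    'If a step may take longer (slow page loads, long waits), set timeout_sec in the tool call to a larger value.',
  ].join('\n');
}

const system = withTimeoutGuidance('You are a browser automation expert.');

// Illustrative tool input a model might send for a slow page (120 is an arbitrary example value).
// The `page` variable is assumed to be provided by Kernel's execution environment.
const slowPageCall = {
  code: "await page.goto('https://example.com', { waitUntil: 'networkidle' }); return await page.title();",
  timeout_sec: 120,
};

console.log(system);
console.log(slowPageCall.timeout_sec > DEFAULT_TIMEOUT_SEC); // true
```

Pass the resulting string as the `system` option of `generateText()` or the `Agent` constructor. Keeping the hint in the system prompt, rather than in each task prompt, means every step of a multi-step run sees it.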
+ +## Additional Resources + + + + Official Vercel AI SDK documentation + + + Complete Kernel SDK API reference + + + Learn more about Kernel's Playwright execution + + + View source code and examples + + + +## Related + +- [Vercel Marketplace Integration](/integrations/vercel/marketplace) +- [Browser Creation](/browsers/create-a-browser) +- [Stealth Mode](/browsers/bot-detection/stealth) +- [Live View](/browsers/live-view) diff --git a/integrations/vercel.mdx b/integrations/vercel/marketplace.mdx similarity index 100% rename from integrations/vercel.mdx rename to integrations/vercel/marketplace.mdx diff --git a/integrations/vercel/overview.mdx b/integrations/vercel/overview.mdx new file mode 100644 index 0000000..6dab6d5 --- /dev/null +++ b/integrations/vercel/overview.mdx @@ -0,0 +1,94 @@ +--- +title: "Vercel Integration Overview" +description: "Integrate Kernel with Vercel for seamless browser automation in your web applications" +--- + +## Vercel + Kernel + +Kernel partners with Vercel to provide seamless browser automation capabilities for your Vercel applications. Our integration offers two powerful ways to add browser automation to your projects: + +### Vercel Marketplace Integration + +The [Vercel Marketplace integration](/integrations/vercel/marketplace) allows you to install and configure Kernel directly from the Vercel dashboard. This integration: + +- Automatically provisions a Kernel Organization and API key +- Syncs your `KERNEL_API_KEY` to your Vercel project's environment variables +- Manages billing directly through Vercel +- Syncs team members and roles between Vercel and Kernel + +[Learn more about the Marketplace integration →](/integrations/vercel/marketplace) + +### AI SDK Tool for Browser Automation + +The `@onkernel/ai-sdk` package provides a Vercel AI SDK-compatible tool that enables AI agents to execute Playwright code on Kernel remote browsers. 
This tool integrates with: + +- **Vercel AI SDK's `generateText()`** - For direct LLM-powered browser automation +- **Vercel AI SDK's `Agent` class** (exported as `Experimental_Agent`) - For building autonomous browser automation agents + +With this tool, you can build AI-powered applications that browse the web, extract data, interact with websites, and perform complex automation tasks, all driven by natural language instructions. + +[Learn more about the AI SDK tool →](/integrations/vercel/ai-sdk) + +## Getting Started + +1. **Install the Kernel integration** from the [Vercel Marketplace](https://vercel.com/marketplace/kernel) +2. **Connect Kernel to your Vercel project** to sync your API key +3. **Install the AI SDK tool** and the model provider used below in your application: + ```bash + npm install @onkernel/ai-sdk zod ai @onkernel/sdk @ai-sdk/openai + ``` +4. **Start building** browser automation agents + +## Use Cases + +- **Web Scraping & Data Extraction** - Extract data from websites using natural language +- **Automated Testing** - Build AI-powered test automation +- **Web Monitoring** - Monitor websites for changes or specific content +- **Form Automation** - Automate form submissions and data entry +- **Content Generation** - Generate screenshots, PDFs, or capture web content + +## Example + +Here's a quick example using the AI SDK tool with Vercel's `generateText()`: + +```typescript +import { openai } from '@ai-sdk/openai'; +import { playwrightExecuteTool } from '@onkernel/ai-sdk'; +import { Kernel } from '@onkernel/sdk'; +import { generateText, stepCountIs } from 'ai'; + +const kernel = new Kernel({ apiKey: process.env.KERNEL_API_KEY }); +const browser = await kernel.browsers.create(); + +const result = await generateText({ + model: openai('gpt-5.1'), + prompt: 'Open example.com and summarize the main content', + tools: { + playwright_execute: playwrightExecuteTool({ + client: kernel, + sessionId: browser.session_id, + }), + }, + stopWhen: stepCountIs(10), // allow tool calls followed by a final text answer +}); + +console.log(result.text); + +// Clean up the browser session +await kernel.browsers.deleteByID(browser.session_id); +``` + +## Next Steps + + + + Install Kernel from the Vercel Marketplace + + 
Build AI agents with browser automation capabilities + +