
@VinciGit00

Background

ScrapeGraph AI is an AI-powered web scraping service that provides multiple extraction methods, including smart scraping, search scraping, Markdown conversion, and multi-page crawling. This integration adds comprehensive documentation and examples for the ai-sdk-scrapegraphai-tools package, enabling developers to use the AI SDK to build intelligent agents that gather, analyze, and act on web data automatically.

This addresses the need for:

  • Comprehensive documentation for the ScrapeGraph AI tools package
  • Practical examples demonstrating real-world use cases
  • Integration guidance for developers using the AI SDK
  • Best practices for web scraping agents

Following the discussion in PR #10189, the ScrapeGraph team published the tools as an independent npm package (ai-sdk-scrapegraphai-tools), and this PR adds the corresponding documentation and examples to the AI SDK repository.

Summary

This PR adds comprehensive documentation and examples for the ScrapeGraph AI tools integration:

Documentation Added

  1. Provider Documentation (content/providers/03-community-providers/80-scrapegraph.mdx)

    • Complete API reference for all 14 tools
    • Installation and setup instructions
    • Usage examples for each tool
    • Best practices for rate limiting, error handling, and cost management
    • Use cases and patterns for combining multiple tools
  2. Cookbook Recipe (content/cookbook/05-node/57-web-scraping-scrapegraph-agent.mdx)

    • Building intelligent web scraping agents
    • 7 practical agent implementations:
      • Basic Web Scraping Agent
      • Product Research Agent
      • Competitive Analysis Agent
      • Documentation Crawler Agent
      • News Aggregation Agent
      • E-commerce Price Monitoring Agent
      • Advanced Multi-Step Agentic Scraping
    • Best practices and cost estimation
    • Error handling patterns
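The documentation lists error handling and rate limiting among its best practices. One common pattern for transient failures (timeouts, rate-limit responses) is retrying with exponential backoff. The sketch below is a generic illustration under that assumption, not the pattern the provider docs prescribe; `withRetry` and its options are hypothetical names and are not part of ai-sdk-scrapegraphai-tools.

```typescript
// Hypothetical helper: retry an async operation (e.g. a scraping tool call)
// with exponential backoff. Not part of ai-sdk-scrapegraphai-tools.

interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the 2nd attempt; doubles on each retry
  maxDelayMs: number; // upper bound on any single delay
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  { maxAttempts, baseDelayMs, maxDelayMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break; // out of attempts: give up
      // Backoff schedule: base, 2*base, 4*base, ... capped at maxDelayMs.
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      await sleep(delay);
    }
  }
  // Surface the last error so callers can handle it.
  throw lastError;
}
```

As written, every error is retried. In practice you would likely add a predicate so that only transient errors are retried, while invalid requests or authentication failures fail immediately.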

Examples Added

Created 7 working examples in examples/ai-core/src/generate-text/:

  • scrapegraph-smart-scraper.ts - AI-powered data extraction
  • scrapegraph-search-scraper.ts - Multi-source web search
  • scrapegraph-markdownify.ts - HTML to Markdown conversion
  • scrapegraph-multiple-tools.ts - Using multiple tools together
  • scrapegraph-product-research.ts - Product comparison agent
  • scrapegraph-crawl-docs.ts - Multi-page documentation crawler
  • scrapegraph-credits-check.ts - API health and credits monitoring
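The credits-check example monitors the remaining credit balance, and the PR notes that individual operations cost between 1 and 10 credits. A hypothetical way to use that balance is to estimate the credit cost of a planned batch of tool calls before running it. In the sketch below, the per-tool costs are supplied by the caller (look up the real values in the provider documentation), and the remaining balance is simply passed in as a number; how it is fetched is not assumed.

```typescript
// Hypothetical pre-flight check: does a planned batch of tool calls fit in
// the remaining credit balance? Costs are caller-supplied, not real prices.

type CostTable = Record<string, number>; // tool name -> credits per call

interface PlannedCall {
  tool: string;
  count: number; // how many times this tool will be called
}

function estimateCredits(plan: PlannedCall[], costs: CostTable): number {
  return plan.reduce((total, { tool, count }) => {
    const perCall = costs[tool];
    if (perCall === undefined) {
      // Fail loudly rather than silently under-estimating the cost.
      throw new Error(`No cost configured for tool "${tool}"`);
    }
    return total + perCall * count;
  }, 0);
}

function canAfford(
  plan: PlannedCall[],
  costs: CostTable,
  remainingCredits: number,
  reserve = 0, // credits to keep back, e.g. for later health checks
): boolean {
  return estimateCredits(plan, costs) + reserve <= remainingCredits;
}
```

For example, with placeholder costs of 10 credits for a scrape and 2 for a Markdown conversion, three scrapes plus five conversions are estimated at 40 credits.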

Additional Files

  • examples/scrapegraph-examples/README.md - Comprehensive guide for running examples
  • Updated examples/ai-core/package.json to include the ai-sdk-scrapegraphai-tools dependency

Manual Verification

Documentation Verification

  1. ✅ All MDX files follow the existing documentation structure and conventions
  2. ✅ Code examples use proper syntax highlighting and follow AI SDK patterns
  3. ✅ Links reference correct paths and external resources
  4. ✅ No linter errors in any documentation files

Example Verification

  1. ✅ All 7 examples follow the existing pattern in examples/ai-core/src/generate-text/
  2. ✅ Examples use the run() helper, consistent with the other examples
  3. ✅ TypeScript compilation successful for all examples
  4. ✅ No linter errors in example code
  5. ✅ Package dependency correctly added to package.json

Testing Instructions

To manually test the examples (requires API keys):

```bash
# Set environment variables
export SGAI_APIKEY=your_scrapegraph_api_key
export OPENAI_API_KEY=your_openai_api_key

# Install dependencies (from repo root)
pnpm install

# Run any example
pnpm tsx examples/ai-core/src/generate-text/scrapegraph-smart-scraper.ts
```
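Because every example needs both API keys, a small preflight check can turn a confusing mid-run failure into a clear message. The helper below is a hypothetical sketch; only the variable names (SGAI_APIKEY, OPENAI_API_KEY) come from the testing instructions. It takes the environment as a parameter so it can be tested without touching `process.env`.

```typescript
// Hypothetical preflight check: list required environment variables that are
// unset or blank. Variable names match the testing instructions.

const REQUIRED_ENV = ["SGAI_APIKEY", "OPENAI_API_KEY"] as const;

function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV,
): string[] {
  // Treat unset and whitespace-only values as missing.
  return required.filter((name) => !env[name]?.trim());
}

// Usage at the top of a Node.js example script:
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(", ")}`);
//   }
```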

Checklist

  • Tests have been added / updated (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • I have reviewed this pull request (self-review)

Note: A changeset may not be needed, since this PR only adds documentation and examples and does not change any package. Please advise if one is required.

Future Work

  • Add integration tests with actual ScrapeGraph AI API once test credentials are available
  • Create additional cookbook recipes for specific use cases:
    • E-commerce price monitoring agent with notifications
    • News aggregation and summarization pipeline
    • Documentation crawler for RAG systems with vector storage
    • Competitive analysis dashboard
  • Add video tutorials or interactive examples
  • Consider adding rate limiting utilities as helper functions
  • Add telemetry/observability integration examples
  • Create Next.js-specific examples with streaming UI
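The future-work list suggests rate-limiting utilities as helper functions. As a hypothetical starting point (no specific ScrapeGraph AI rate limits are assumed), the sketch below wraps any async function so that successive calls start at least `minIntervalMs` apart, which is enough to pace a loop of scraping calls made by an agent.

```typescript
// Hypothetical rate-limiting helper: successive calls to the wrapped function
// start at least `minIntervalMs` apart. This is a simple spacing limiter,
// not a token bucket, and it does not cap how many calls run concurrently.

function rateLimit<Args extends unknown[], R>(
  fn: (...args: Args) => Promise<R>,
  minIntervalMs: number,
): (...args: Args) => Promise<R> {
  // Earliest time at which the next call may start.
  let nextStart = 0;
  return async (...args: Args) => {
    const now = Date.now();
    const start = Math.max(now, nextStart);
    // Reserve this slot before waiting, so concurrent callers queue behind it.
    nextStart = start + minIntervalMs;
    const wait = start - now;
    if (wait > 0) {
      await new Promise<void>((resolve) => setTimeout(resolve, wait));
    }
    return fn(...args);
  };
}
```

Reserving the slot synchronously, before any `await`, is the key design choice: if three calls are issued at once, they are scheduled at 0, 1, and 2 intervals instead of all computing the same start time.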

Related Issues

This PR provides documentation for the ai-sdk-scrapegraphai-tools package following the discussion in PR #10189, where the maintainers suggested publishing the tools as an independent package and adding documentation/examples to the main repository.

The tools package is now published at: https://www.npmjs.com/package/ai-sdk-scrapegraphai-tools


Additional Context:

  • All 14 ScrapeGraph AI tools are documented with usage examples
  • Cost information is provided for each tool (1-10 credits per operation)
  • Best practices include rate limiting, error handling, caching, and structured output patterns
  • Examples cover beginner to advanced use cases
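Caching is listed among the best practices. Since each operation costs credits, reusing recent results for identical requests avoids paying twice. The sketch below is a hypothetical in-memory TTL cache. It assumes a scrape call can be keyed by URL and prompt and that results stay valid for `ttlMs`. It wraps any async function and does not call the real package.

```typescript
// Hypothetical in-memory TTL cache for scrape results keyed by (url, prompt).
// `now` is injectable so expiry can be tested without real waiting.

function withTtlCache<R>(
  fetcher: (url: string, prompt: string) => Promise<R>,
  ttlMs: number,
  now: () => number = Date.now,
): (url: string, prompt: string) => Promise<R> {
  const cache = new Map<string, { value: Promise<R>; expiresAt: number }>();
  return (url, prompt) => {
    const key = JSON.stringify([url, prompt]);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) {
      return hit.value; // fresh entry: no new API call, no credits spent
    }
    // Cache the promise itself so concurrent identical requests share one call.
    const value = fetcher(url, prompt);
    cache.set(key, { value, expiresAt: now() + ttlMs });
    // Evict failed requests so the next call retries instead of reusing the error.
    value.catch(() => {
      if (cache.get(key)?.value === value) cache.delete(key);
    });
    return value;
  };
}
```

An in-process `Map` is lost on restart and grows without bound. A long-running agent would likely swap in a bounded or external store, but the keying and expiry logic stay the same.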


Labels

  • ai/provider
  • documentation
  • provider/community
