Check site scrapeability in 2-5 seconds • Save 2+ hours of wasted coding
Real Browser • Screenshots • Console Errors • Anti-Bot Detection
💡 First time? Run "Scrape-LE: Setup Browser" from the Command Palette to install Chromium (~130MB one-time setup)
**Before:** Writing scraper code, deploying, then discovering Cloudflare blocked you (2 hours wasted)

```python
# 2 hours of coding
scraper = MyScraper("https://example.com")
scraper.run()  # Error: Cloudflare challenge detected!
```

**After:** Check first, code later (2 seconds to validate)

```text
✅ Site reachable
⚠️ Cloudflare detected
⚠️ Rate limit: 100 requests/hour
✅ robots.txt allows crawling
📸 Screenshot saved
```

**Time Saved:** 2 hours of wasted coding → 2 seconds of validation ⚡
- 2-5 seconds to validate - vs. 30+ minutes of trial and error
- Zero Config - Install Chromium → Press `Cmd+Alt+S` → Get full report
- Battle-Tested - 207 unit tests, 87% coverage, zero critical vulnerabilities
- Security-Hardened - 65 tests prevent command injection and shell metacharacter exploits
Perfect for validating scraper targets before writing code.
If Scrape-LE saves you time, a quick rating helps other developers discover it:
⭐ Open VSX • VS Code Marketplace
- Real browser - Uses Playwright (Chromium) for accurate rendering
- Full-page screenshots - Visual confirmation of page state
- Anti-bot detection - Cloudflare, reCAPTCHA, hCaptcha, DataDome, Perimeter81
- Auth detection - Login forms, OAuth, SSO, API keys
- Rate limit detection - X-RateLimit headers, Retry-After, HTTP 429
- robots.txt parsing - Check crawling permissions
- Console errors - Catch JavaScript errors
- 13 languages - English, Chinese, German, Spanish, French, Indonesian, Italian, Japanese, Korean, Portuguese, Russian, Ukrainian, Vietnamese
- String-LE - Extract user-visible strings for i18n and validation • VS Code Marketplace
- Numbers-LE - Extract and analyze numeric data with statistics • VS Code Marketplace
- EnvSync-LE - Keep .env files in sync with visual diffs • VS Code Marketplace
- Paths-LE - Extract file paths from imports and dependencies • VS Code Marketplace
- URLs-LE - Audit API endpoints and external resources • VS Code Marketplace
- Colors-LE - Extract and analyze colors from stylesheets • VS Code Marketplace
- Dates-LE - Extract temporal data from logs and APIs • VS Code Marketplace
- Pre-Scraper Validation - Check if sites are reachable before writing scraper code (a manual equivalent is sketched below)
- Anti-Bot Detection - Identify Cloudflare, reCAPTCHA, hCaptcha before deployment
- Rate Limit Discovery - Find rate limits before hitting them in production
- robots.txt Compliance - Verify crawling is allowed by site policies (e.g. `Disallow: /admin/, /api/internal/`, `Crawl-delay: 10 seconds`, `Sitemap: https://example.com/sitemap.xml`)
- Auth Wall Detection - Check if login or paywalls block access
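For a sense of what the extension automates here, a minimal manual pre-check with Playwright might look like the sketch below. This is illustrative only, not Scrape-LE's implementation: the URL is a placeholder and the anti-bot marker strings are assumptions, while the rate-limit header names follow the ones listed above.

```python
# Minimal manual pre-scrape check, roughly what Scrape-LE automates (illustrative sketch).
# Requires: pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

def pre_check(url: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        response = page.goto(url, timeout=30_000)

        # Reachability: did we get an HTTP response, and with what status?
        print(f"Status: {response.status if response else 'no response'}")

        # Rate-limit hints: Retry-After / X-RateLimit-* headers (keys are lowercase)
        headers = response.headers if response else {}
        for name in ("retry-after", "x-ratelimit-limit", "x-ratelimit-remaining"):
            if name in headers:
                print(f"{name}: {headers[name]}")

        # Very rough anti-bot hint (assumed Cloudflare challenge markers, not real detection rules)
        html = page.content().lower()
        if "cf-challenge" in html or "just a moment" in html:
            print("Possible Cloudflare challenge detected")

        # Visual confirmation of the rendered page
        page.screenshot(path="precheck.png", full_page=True)
        browser.close()

if __name__ == "__main__":
    pre_check("https://example.com")
```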
## 🚀 Quick Start
1. **Install from Open VSX or VS Code Marketplace**
- Open VSX: [Install here](https://open-vsx.org/extension/OffensiveEdge/scrape-le)
- VS Code Marketplace: [Install here](https://marketplace.visualstudio.com/items?itemName=nolindnaidoo.scrape-le)
2. Open Command Palette (`Cmd/Ctrl + Shift + P`).
3. Run **"Scrape-LE: Check URL"** or press `Cmd+Alt+S` / `Ctrl+Alt+S`.
4. Enter URL and view detailed results.
**Need test URLs?** Check out [`sample/README.md`](sample/README.md) for 10 categorized test cases including static sites, SPAs, APIs, protected sites, and more.
### First-Time Setup
On first use, Scrape-LE automatically detects if Chromium is installed and prompts you to install it. This is a one-time setup (~130MB download).
**Automatic Installation:**
1. Run any check command
2. Click "Install Chromium" when prompted
3. Wait for installation to complete
**Manual Setup:**
```bash
bunx playwright install chromium
```

Or run from Command Palette: **"Scrape-LE: Setup Browser"**

## Configuration

- `scrape-le.browser.timeout` - Navigation timeout (5s - 120s)
- `scrape-le.browser.viewport.width` - Viewport width (320px - 3840px)
- `scrape-le.browser.viewport.height` - Viewport height (240px - 2160px)
- `scrape-le.screenshot.enabled` - Enable screenshot capture
- `scrape-le.screenshot.path` - Screenshot save location
- `scrape-le.checkConsoleErrors` - Capture console errors
- `scrape-le.notificationsLevel` - Control notification verbosity
- `scrape-le.statusBar.enabled` - Show status bar entry
- `scrape-le.detections.antiBot` - Detect anti-bot systems (Cloudflare, reCAPTCHA, hCaptcha, DataDome, Perimeter81)
- `scrape-le.detections.rateLimit` - Detect rate limiting headers
- `scrape-le.detections.robotsTxt` - Check robots.txt policies
- `scrape-le.detections.authentication` - Detect authentication walls
**Production Scraper Validation**

```json
{
  "scrape-le.browser.timeout": 30000,
  "scrape-le.screenshot.enabled": true,
  "scrape-le.detections.antiBot": true,
  "scrape-le.detections.rateLimit": true,
  "scrape-le.detections.robotsTxt": true,
  "scrape-le.notificationsLevel": "important"
}
```

**Quick Reachability Check**

```json
{
  "scrape-le.browser.timeout": 10000,
  "scrape-le.screenshot.enabled": false,
  "scrape-le.detections.antiBot": false,
  "scrape-le.detections.rateLimit": false,
  "scrape-le.detections.robotsTxt": false,
  "scrape-le.notificationsLevel": "silent"
}
```

**Development Mode**

```json
{
  "scrape-le.browser.timeout": 60000,
  "scrape-le.screenshot.enabled": true,
  "scrape-le.checkConsoleErrors": true,
  "scrape-le.detections.antiBot": true,
  "scrape-le.detections.authentication": true,
  "scrape-le.notificationsLevel": "all"
}
```

**Notes & Limitations:**

- Browser launch requires ~130MB Chromium installation (one-time)
- Timeout ranges from 5s to 120s; adjust based on target site complexity
- Screenshots are saved to `.vscode/scrape-le/` by default
- Large pages may take longer to capture full screenshots
- Anti-bot detection uses heuristics; some systems may not be detected
- robots.txt fetch has a 5-second timeout
- Authentication detection checks HTTP status, forms, and keywords (see the sketch after this list)
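Anti-bot and authentication detection are heuristic by nature. As a rough illustration of what an authentication-wall heuristic can look like, here is a small sketch; the status codes, regex, and keyword list are assumptions chosen for the example, not Scrape-LE's actual rules:

```python
# Illustrative auth-wall heuristic (assumed rules, not Scrape-LE's implementation)
import re

AUTH_KEYWORDS = ("sign in", "log in", "subscribe to continue", "create an account")

def looks_auth_gated(status: int, html: str) -> bool:
    # HTTP status: 401/403 often indicate an authentication or access wall
    if status in (401, 403):
        return True
    lowered = html.lower()
    # Login form: a password input is a strong signal
    if re.search(r'<input[^>]+type=["\']password["\']', lowered):
        return True
    # Keywords: common login / paywall phrases
    return any(keyword in lowered for keyword in AUTH_KEYWORDS)

print(looks_auth_gated(200, '<form><input type="password"></form>'))  # True
```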
Scrape-LE performance varies by target website and network; see `docs/PERFORMANCE.md` for detailed benchmarks.
| Scenario | Page Size | Duration | Memory | Status |
|---|---|---|---|---|
| Simple HTML | < 100 KB | < 2s | < 20 MB | ✅ |
| Complex | 500 KB - 1 MB | 3-5s | 30-50 MB | ✅ |
| Heavy JS (SPA) | 1-3 MB | 5-10s | 50-100 MB | |
| Image-heavy | 2-5 MB | 5-15s | 60-120 MB | |
Browser: Launch 1-2s; screenshot 200-800ms (PNG) / 150-600ms (JPEG)
Detection: Anti-bot 85-90% accuracy (< 100ms), rate limits 80-85% (< 50ms)
Full metrics: `docs/PERFORMANCE.md` • Performance is network-dependent
- Timeout Configuration: Adjust based on target site complexity
- Screenshot Impact: Adds 1-2s to overall check time
- Detection Suite: Adds 500ms-2s for all checks combined
VS Code 1.70.0+ • Platforms: Windows, macOS, Linux
Memory: 1GB recommended • Storage: 150MB (includes Chromium)
100% local processing. URLs only sent to sites you specify. No analytics or tracking.
13 languages: English, German, Spanish, French, Indonesian, Italian, Japanese, Korean, Portuguese (Brazil), Russian, Ukrainian, Vietnamese, Chinese (Simplified)
"Executable doesn't exist" error?
Run "Scrape-LE: Setup Browser" from Command Palette to install Chromium
Check times out?
Increase timeout: scrape-le.browser.timeout (default 30s) or check network connection
Need help?
Check Issues or enable verbose logging: scrape-le.notificationsLevel: "all"
Need to install Chromium?
No, Scrape-LE handles it automatically on first use (~130MB download)
Works with localhost?
Yes, supports localhost, local IPs, and any accessible URL
Works with React/Vue/Angular?
Yes, uses real browser so SPAs render properly
Will sites detect this?
Uses headless Chromium which some sites detect. Use responsibly and check robots.txt
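If you want to double-check robots.txt yourself outside the editor, Python's standard library is enough. A small sketch follows; the URLs and user agent string are placeholders:

```python
# Check robots.txt permissions with the Python standard library (placeholder URL and user agent)
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("https://example.com/robots.txt")
parser.read()  # fetches and parses robots.txt

user_agent = "my-scraper"
url = "https://example.com/some/page"
print(parser.can_fetch(user_agent, url))  # True if crawling this URL is allowed
print(parser.crawl_delay(user_agent))     # Crawl-delay value if specified, else None
```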
207 unit tests • 87% function coverage, 91% line coverage
Powered by Vitest • Run with `bun test --coverage`
- 65 security tests for command injection & URL validation
- 46 detection logic tests for anti-bot, auth, rate limits, robots.txt
- Comprehensive coverage of browser automation, screenshot capture, and error handling
Copyright © 2025 @OffensiveEdge. All rights reserved.