Complete reference for every feature in Scrapeman. For installation and project overview, see the README.
- Getting Started
- Request Building
- Environment Variables
- Scoped Variables and Auth Inheritance
- Auth Schemes
- Collections and File Format
- Local History
- Response Viewer
- Code Export
- Load Runner
- WebSocket
- Collection Runner
- Scripting
- Import and Export
- Proxy and Scrape.do Mode
- Scraping-first features
- Cookie Jar
- In-App Git Integration
- Collection Search
- Keyboard Shortcuts
- UX Details
macOS (Homebrew):
```
brew tap scrape-do/scrapeman && brew install --cask scrapeman
```

Direct download:
Pre-built installers for every tagged release:
- macOS — `.dmg` (Apple Silicon + Intel)
- Windows — `.exe` NSIS installer (x64)
- Linux — `.AppImage` and `.deb` (x64)
Download from github.com/scrape-do/scrapeman/releases.
- Open Scrapeman. A new empty tab is ready with the cursor in the URL bar.
- Paste a URL: `https://httpbin.org/get`
- Method defaults to `GET`. Press `Cmd+Enter` (Mac) or `Ctrl+Enter` (Win/Linux) to send.
- The response appears in the right panel: status, headers, body (auto-detected as JSON and rendered in the Pretty view).
- Press `Cmd+S`. If the request has never been saved, a dialog asks for a name and folder.
- The file is written to your workspace folder as a `.sman` file.
- The request appears in the sidebar collection tree.
The URL bar supports {{var}} syntax with live highlighting. Variables resolve from the active environment, collection variables, and built-in dynamics.
An autocomplete popover appears when you type {{ showing all available variables with their current resolved values.
- Params — key-value table with two-way URL sync. Editing a param updates the URL query string and vice versa.
- Headers — key-value table. Auto-headers (Content-Type, Accept-Encoding, User-Agent) are shown with a toggle to disable or override each one. A bulk-edit toggle (pencil icon, top-right of the table) switches the view to a textarea where each line is `Key: Value`. Prefix a line with `//` to disable it. Switching back to the table is lossless — disabled state and `{{var}}` placeholders are preserved.
- Body — modes: none, raw (JSON, XML, HTML, text, JavaScript), form-urlencoded, multipart form-data, binary file, GraphQL (planned). When mode is JSON, a Beautify button appears in the type bar to format the body with 2-space indent. The shortcut Shift+Cmd+F (macOS) / Shift+Ctrl+F (Windows/Linux) triggers beautify while the body editor is focused. Bodies containing `{{variable}}` placeholders are not formatted (use environments to resolve them first).
- Auth — see Auth Schemes below.
- Settings — per-request proxy, timeout, redirect, TLS, HTTP version, and Scrape.do native mode. See Proxy and Scrape.do Mode.
- Code — see Code Export below.
In both Headers and Params tables:
- Shift+Enter — insert a new empty row below the current one, focus moves to the new Key cell.
- Tab from the last row's Key cell — if the Key cell is non-empty, a new empty row is appended automatically. No need to click "Add row."
Scrapeman automatically sets three headers based on the request:
- Content-Type — derived from the body mode (application/json for JSON body, application/x-www-form-urlencoded for form, etc.)
- Accept-Encoding — `gzip, br, deflate` so the response is auto-decompressed by undici.
- User-Agent — `Scrapeman/<version>`.
Each auto-header can be disabled or overridden per-request from the Headers tab. If you manually set the same header, your value wins.
Environments are stored as .env.yaml files under .scrapeman/environments/ in your workspace.
```yaml
# .scrapeman/environments/dev.env.yaml
name: Development
variables:
  - key: baseUrl
    value: https://api.example.com
    secret: false
  - key: token
    value: sk-live-abc123
    secret: true
```

`{{var}}` syntax works across: URL, params, headers, body, auth fields, proxy fields, and Scrape.do fields.
Scope precedence (highest wins):
- Folder chain (deepest folder overrides shallower)
- Active environment
- Collection variables
- Global variables
- Built-in dynamics
See Scoped Variables and Auth Inheritance for details on each scope.
These re-resolve on every send:
| Variable | Output |
|---|---|
| `{{random}}` | Random 8-char alphanumeric string |
| `{{uuid}}` | UUID v4 |
| `{{timestamp}}` | Unix timestamp in milliseconds |
| `{{timestampSec}}` | Unix timestamp in seconds |
| `{{isoDate}}` | ISO 8601 date string |
| `{{randomInt}}` | Random integer 0-9999 |
Variables with secret: true are masked in the UI (shown as ••••••). They resolve normally at send time. History entries preserve the template ({{token}}) on disk, never the resolved secret value.
Variables resolve in this order (rightmost wins):
Global → Collection → Environment → Folder chain (root → leaf)
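As a mental model, the merge behaves like successive object spreads where later scopes overwrite earlier keys. A minimal sketch (names illustrative, not Scrapeman's internals):

```ts
type VarBag = Record<string, string>;

// Later arguments win, matching the "rightmost wins" order above.
function resolveScopes(
  globals: VarBag,
  collection: VarBag,
  environment: VarBag,
  folderChain: VarBag[] // ordered root → leaf
): VarBag {
  return Object.assign({}, globals, collection, environment, ...folderChain);
}
```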
Stored at .scrapeman/globals.yaml. Available to every request in the workspace, across all environments. Edit via the workspace settings gear icon in the sidebar.
```yaml
# .scrapeman/globals.yaml
scrapeman: "2.0"
variables:
  - key: COMPANY_ID
    value: acme
    enabled: true
    secret: false
```

Stored at `.scrapeman/collection.yaml`. Collection variables apply to all requests and have lower precedence than environment variables. The file also holds the collection-level default auth. Edit via the workspace settings gear icon → "Collection settings…".
```yaml
# .scrapeman/collection.yaml
scrapeman: "2.0"
variables:
  - key: API_VERSION
    value: v2
    enabled: true
    secret: false
auth:
  type: bearer
  token: "{{BEARER_TOKEN}}"
```

Each folder can have a `_folder.yaml` file with variables and an optional auth block. Right-click a folder in the sidebar and select "Folder settings…" to open the two-tab dialog (Variables / Auth).
```yaml
# api/users/_folder.yaml
scrapeman: "2.0"
variables:
  - key: RESOURCE
    value: users
    enabled: true
    secret: false
auth:
  type: apiKey
  key: X-Api-Key
  value: "{{API_KEY}}"
  in: header
```

Folder variables from ancestor folders are merged first (root to leaf), so a deeper folder's variable wins over a shallower one's.
When a request's Auth tab is set to "None" (or the Inherit option), Scrapeman walks the folder chain from the request's folder up to the workspace root. The first ancestor with an auth block in its _folder.yaml is used. If no folder defines auth, the collection default auth is applied.
The Auth tab shows an "Inherited from /path" label when inheritance is active. Set an explicit auth type on the request to override it.
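In code form, the lookup is a short walk up the tree. A hedged sketch, with the types invented for illustration:

```ts
interface AuthConfig { type: string; [field: string]: unknown }
interface Folder { auth?: AuthConfig; parent?: Folder }

// First ancestor with an auth block wins; otherwise the collection default applies.
function resolveAuth(folder: Folder | undefined, collectionAuth?: AuthConfig): AuthConfig | undefined {
  for (let f = folder; f !== undefined; f = f.parent) {
    if (f.auth) return f.auth;
  }
  return collectionAuth;
}
```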
Six auth types are built in. Select from the Auth tab in the request builder.
No auth headers are sent. This is the default.
Provide username and password. Scrapeman encodes them as Authorization: Basic <base64(user:pass)> at send time.
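The encoding is the standard RFC 7617 scheme. For reference (illustrative values):

```ts
const user = "alice";
const pass = "s3cret";
// base64 of "user:pass" becomes the Authorization header value.
const value = "Basic " + Buffer.from(`${user}:${pass}`).toString("base64");
// → Authorization: Basic YWxpY2U6czNjcmV0
```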
Provide a token string. Sent as Authorization: Bearer <token>. The token can be a {{var}} reference.
Provide key, value, and placement (header or query). If placement is header, the key-value pair is injected as a request header. If query, it is appended to the URL as a query parameter.
Three flows are supported. Select from the Flow dropdown in the Auth tab.
Client credentials
Configure: Token URL, Client ID, Client Secret, Scope (optional), Audience (optional).
Scrapeman fetches the token automatically before sending the request. Tokens are cached until expiry and then refreshed. Concurrent requests share one in-flight token fetch.
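A minimal sketch of that caching behavior, where `fetchToken` is a hypothetical stand-in for the actual token POST:

```ts
declare function fetchToken(): Promise<{ access_token: string; expires_in: number }>;

let cached: { token: string; expiresAt: number } | null = null;
let inFlight: Promise<string> | null = null;

async function getAccessToken(): Promise<string> {
  // Serve from cache until expiry.
  if (cached && Date.now() < cached.expiresAt) return cached.token;
  // Concurrent callers piggyback on the same in-flight fetch.
  inFlight ??= fetchToken().then(({ access_token, expires_in }) => {
    cached = { token: access_token, expiresAt: Date.now() + expires_in * 1000 };
    inFlight = null;
    return access_token;
  });
  return inFlight;
}
```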
Authorization code / Authorization code + PKCE
Configure: Token URL, Auth URL, Client ID, Client Secret (optional for PKCE flows), Scope, Audience (optional).
Click "Get token" to start the flow. Scrapeman opens your default browser to the auth URL, spins up a local loopback server on a random port, and waits for the redirect callback. Once the code arrives, Scrapeman exchanges it for a token and caches the result. The token is then applied automatically to all subsequent requests.
Authorization code + PKCE generates a code_verifier / code_challenge pair using S256. No client secret is required for pure PKCE flows.
The redirect URI is always http://127.0.0.1:<port>/callback (ephemeral port). Register this pattern in your authorization server if it requires an exact match.
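For reference, an S256 pair per RFC 7636 can be produced like this in Node (a sketch, not necessarily Scrapeman's exact code):

```ts
import { createHash, randomBytes } from "node:crypto";

// code_verifier: high-entropy URL-safe string; code_challenge: its SHA-256, base64url-encoded.
const codeVerifier = randomBytes(32).toString("base64url");
const codeChallenge = createHash("sha256").update(codeVerifier).digest("base64url");
// Send code_challenge + code_challenge_method=S256 on the auth request,
// then code_verifier on the token exchange.
```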
OIDC discovery
Set the Discovery URL field to a .well-known/openid-configuration endpoint and click "Load". Scrapeman reads the document and auto-fills Token URL, Auth URL, and the supported scopes list.
Token placement
By default the access token is sent as Authorization: Bearer <token>. The accessTokenPlacement field in the .sman file accepts three variants:
```yaml
auth:
  type: oauth2

  # Default — sends as Authorization: Bearer <token>
  accessTokenPlacement:
    in: header
    name: Authorization
    prefix: Bearer

  # Custom header name / prefix
  accessTokenPlacement:
    in: header
    name: X-Auth-Token
    prefix: ""

  # Query parameter
  accessTokenPlacement:
    in: query
    name: access_token

  # Form body (POST/PUT/PATCH with formUrlEncoded body only)
  accessTokenPlacement:
    in: body
    name: access_token
```

Token inspector
When the token response contains an access_token or id_token that is a JWT, Scrapeman shows a collapsible Token Inspector panel below the token management buttons. The inspector decodes the header and payload and shows a live countdown to the exp claim. No signature is verified — the panel is for display only.
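Decoding a JWT for display needs no cryptography. A sketch of the kind of decode the inspector performs (display-only, illustrative):

```ts
// Split header.payload.signature and base64url-decode the first two parts.
function decodeJwt(token: string) {
  const [header, payload] = token
    .split(".")
    .slice(0, 2)
    .map((part) => JSON.parse(Buffer.from(part, "base64url").toString("utf8")));
  // exp is seconds since epoch; the countdown is just exp*1000 minus now.
  const msUntilExpiry = payload.exp ? payload.exp * 1000 - Date.now() : null;
  return { header, payload, msUntilExpiry };
}
```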
Configure:
- Access Key ID and Secret Access Key
- Session Token (optional, for temporary credentials)
- Region (e.g., `us-east-1`)
- Service (e.g., `s3`, `execute-api`)
Scrapeman signs the request using the aws4 library. The signature covers method, URL, headers, and body.
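For orientation, signing with the aws4 library looks roughly like this (values illustrative; Scrapeman's exact wiring may differ):

```ts
import aws4 from "aws4";

const signed = aws4.sign(
  {
    host: "s3.us-east-1.amazonaws.com",
    path: "/my-bucket/key.txt",
    method: "GET",
    service: "s3",
    region: "us-east-1",
  },
  {
    accessKeyId: "AKIA...",    // Access Key ID
    secretAccessKey: "...",    // Secret Access Key
    sessionToken: undefined,   // set for temporary credentials
  }
);
// signed.headers now includes Authorization and X-Amz-Date.
```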
Every request is one .sman file (YAML content, custom extension):
```yaml
# products/list.sman
scrapeman: "2.0"
meta:
  name: List products
method: GET
url: "https://api.example.com/products?page={{page}}"
headers:
  Accept: application/json
auth:
  type: bearer
  token: "{{token}}"
```

Key order is stable (deterministic serializer), so git diffs are clean and human-readable.
Files saved by earlier versions used .req.yaml with scrapeman: "1.0". Scrapeman still reads those files transparently; when you save one, it is rewritten as .sman next to the old file and the .req.yaml is removed. If both extensions happen to exist with the same stem, the .sman wins and the .req.yaml is hidden from the sidebar.
Payloads 4KB or larger are automatically promoted to a sidecar file under files/<slug>.body.<ext>. The .sman file references the sidecar by path. This keeps the main file small and diffs focused on metadata changes.
Scrapeman writes only inside the workspace folder you choose. History, cookies, and state live in the app data directory, never the workspace. The workspace is safe to commit to git.
You can keep more than one workspace open in the same window and switch between them from the sidebar header. Click the workspace name (just above the Files / Git tab strip) to drop down a list of every workspace you have opened this session.
The dropdown lets you:
- Switch to another open workspace (the current workspace's open tabs, active environment, and sidebar view get snapshotted in memory and restored when you switch back).
- Open another workspace folder via the standard picker.
- Close the current workspace, or close any other open workspace from its row (× on hover).
When more than one workspace is open, the title bar shows the active workspace name in bold along with an "(N open)" counter.
Persistence: the list of open workspaces and the last-active path persist to localStorage, so the same set reopens on next launch. Per-workspace UI snapshots (open tabs, active env, sidebar view) live in memory only — they are not saved across restarts. File-backed tabs reopen because they live on disk; unsaved draft tabs do not survive a quit.
Out of scope for now: side-by-side workspace columns, dragging requests between workspaces, workspace-specific keyboard shortcuts. The undo-close-tab stack (⌘⇧T) is shared across workspaces in this release.
Right-click a request in the sidebar and select "Stop syncing to git" to exclude it from version control. Backed by .git/info/exclude (never pushed to remote). Shortcut: Cmd+Shift+H on the active tab. A crossed-eye icon marks unsynced requests.
Every sent request is captured to a per-workspace JSONL file under the app data directory (never the workspace folder).
Template-preserving: {{token}} stays as {{token}} on disk. Secrets are never baked into history.
Compressed: body preview fields are gzipped on disk when 256 bytes or larger (5-10x smaller).
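The size gate means tiny previews skip compression overhead. A sketch of the idea (threshold from the docs; helper names invented):

```ts
import { gzipSync, gunzipSync } from "node:zlib";

function encodePreview(preview: string): { gz: boolean; data: Buffer } {
  const raw = Buffer.from(preview, "utf8");
  // Only previews of 256 bytes or more are worth gzipping.
  return raw.length >= 256 ? { gz: true, data: gzipSync(raw) } : { gz: false, data: raw };
}

const decodePreview = (e: { gz: boolean; data: Buffer }) =>
  (e.gz ? gunzipSync(e.data) : e.data).toString("utf8");
```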
The sidebar History panel shows recent requests with:
- Method badge (GET/POST/PUT/etc.)
- Status pill (200 green, 4xx red, etc.)
- Relative time ("2 min ago")
Click any entry to restore it into a new tab. Duplicate restores are detected and skipped.
The history panel has a search bar that filters by request name and URL.
Scrapeman auto-detects the response content type: JSON, HTML, XML, JavaScript, CSS, image, PDF, text, or binary.
| Content type | Available modes |
|---|---|
| SSE (text/event-stream) | Events (default), Raw |
| JSON | Raw, Pretty (CodeMirror syntax-highlighted, formatted), Tree (collapsible with JSONPath copy) |
| HTML | Raw, Pretty (CodeMirror syntax-highlighted), Preview (sandboxed iframe) |
| XML | Raw, Pretty (CodeMirror syntax-highlighted, indented) |
| JavaScript | Raw, Pretty (CodeMirror syntax-highlighted) |
| CSS | Raw, Pretty (CodeMirror syntax-highlighted) |
| Image | Raw (hex), Preview (rendered) |
| PDF | Raw, Preview (Chromium PDF viewer) |
| Text/binary | Raw |
Pretty mode uses CodeMirror with the one-dark theme in dark mode and a neutral light theme otherwise. Syntax coloring follows the language grammar: key/value colors for JSON, tag/attribute for XML, keyword/string for JavaScript, and selector/property for CSS.
Lazy parsing: the default view is Raw. JSON.parse and tree rendering only happen when you switch to Pretty or Tree view.
When the response is text/event-stream (or ExecutedResponse.sseEvents is populated), the viewer defaults to Events mode:
- Each event is shown in its own block with `id`, `event`, `retry` fields in the header row and the `data` body below.
- If `data` is valid JSON, it renders as a collapsible JSON tree (same component used in the Tree view). Otherwise it displays as monospace text.
- Auto-scroll: the list follows new events as they arrive. Click "Scroll: off" to pause and browse earlier events; click again to resume.
- Export JSON: saves the full `sseEvents` array as a `.json` file via the system save dialog.
- Raw and Pretty modes remain accessible for inspecting the raw stream text.
Detection fires on either Content-Type: text/event-stream in the response headers, or when the executor sets sseEvents on the response (which it does for all SSE responses regardless of the header).
Search within the response body with highlight, previous/next navigation. The search persists across sends and auto-re-runs when a new response arrives.
- Debounced input: the match scan runs 150 ms after you stop typing so keystrokes feel instant even on large bodies.
- Virtualized rendering: Raw views render only the visible lines — 5 MB bodies scroll and search without jank.
- Enter / Shift+Enter: jump to next / previous match; the viewport scrolls to the active match automatically.
- Large body warning: switching to Pretty mode on a body over 500 KB shows a banner suggesting Raw for best performance. Applies to all syntax-highlighted kinds (JSON, HTML, XML, JavaScript, CSS).
Every response shows: HTTP status, TTFB (time to first byte), download time, body size, and protocol (HTTP/1.1 or h2).
The Dev Tools tab appears next to Body and Headers after every response. It shows:
Timing waterfall — horizontal bar chart with one segment per measured phase: DNS lookup, TCP connect, TLS handshake, time to first byte, and download. Each segment is proportional to its share of total time and labelled with its exact millisecond value.
Request metadata
| Field | Description |
|---|---|
| URL | The final URL after variable resolution, Scrape.do composition, and auth |
| HTTP version | Protocol used for the response (http/1.1 or h2) |
| Remote address | IP and port of the server that answered |
| Compression | Wire size from Content-Length vs decoded size, e.g. 12.4 KB → 48.1 KB (3.9× decoded). Shown only when Content-Length is present and differs from decoded size. |
Sent headers — all headers actually sent on the wire after auto-header merge and auth injection.
Redirect chain — each 3xx hop before the final response, shown as a vertical list:
301 https://example.com/old → /new
302 https://example.com/new → /final
TLS certificate — subject CN, issuer CN, validity dates, and SHA-256 fingerprint. Days-remaining warning appears when fewer than 30 days remain. Shown only for HTTPS responses. Displays "TLS info unavailable" when the certificate could not be read (e.g. resumed TLS session).
Script console — output from pre/post-request scripts (issue #20, not yet shipped). Shows "No script output" when no script was attached or the script produced no output.
Generate code from the current request in four languages:
| Language | Library |
|---|---|
| curl | curl CLI |
| JavaScript | fetch API |
| Python | requests |
| Go | net/http |
Each generator respects method, URL, params, headers, body, and Basic/Bearer auth.
Variable toggle: switch between "inline resolved values" and "keep {{var}} templates" in the generated code.
Copy the generated code to clipboard with one click. The Code tab in the request builder shows a read-only preview.
Stress-test any request with bounded concurrency.
- Total requests — how many times to send
- Concurrency — how many in-flight at once
- Per-iteration delay (optional)
Each iteration re-resolves {{random}}, {{timestamp}}, and other dynamics, so every request is unique.
Config is per-tab. Each request tab has its own load test configuration. Switching tabs does not reset another tab's settings.
While running, you see:
- Sent / remaining / requests per second
- Latency: p50, p95, p99
- Success rate
- Status histogram (200, 201, 400, 500, ...)
- Error kind breakdown (timeout, connection refused, etc.)
Hover any metric to see a description of what it measures.
Load test state (config, progress, event log) is stored per tab in the application state. A test started in Tab A continues running when you switch to Tab B — progress is preserved and visible when you return to Tab A.
Set expected status codes (e.g., 200, 201) and an optional body-contains substring. Requests that fail validation are flagged in the console log.
- Stop mid-run with partial results preserved.
- Console log with color-coded rows: green for success, yellow for validation fail, red for network error.
The "WebSocket" tab on any request tab opens a bidirectional WebSocket client. It does not replace the HTTP request builder — both live in the same tab.
Enter a ws:// or wss:// URL and click Connect (or press Enter). The status dot in the top bar changes:
- Gray — closed
- Yellow — connecting or closing
- Green — open
- Red — closed after error
Click Disconnect to close the connection with code 1000.
Type in the send area at the bottom. Press Send or ⌘↵ to send. The message appears in the timeline with a ↑ direction indicator.
Each message row shows:
- ↓ — inbound message
- ↑ — outbound message
- ● — ping sent (application-level keep-alive)
- ○ — pong received (with round-trip latency in ms)
- — — status event (connected, disconnected, error)
JSON payloads have an expand toggle that renders the collapsible tree viewer inline.
Auto-scroll keeps the timeline pinned to the bottom as new messages arrive. Scrolling up manually pauses it; clicking the Auto-scroll button resumes it.
Click Export to download the full timeline as a JSON file.
By default the client sends an application-level ping message every 30 seconds. Servers that echo it back are used to measure round-trip latency. The pong row shows the latency in milliseconds.
This is not a WebSocket protocol-level ping frame — it is a text message with a sentinel value, because the undici WebSocket implementation does not expose raw ping frame APIs to userland.
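Conceptually, the keep-alive is a timed text message plus a timestamp diff. A sketch with an invented sentinel value:

```ts
const PING_SENTINEL = "__scrapeman_ping__"; // hypothetical sentinel
let pingSentAt = 0;

function startKeepAlive(ws: WebSocket) {
  const timer = setInterval(() => {
    pingSentAt = Date.now();
    ws.send(PING_SENTINEL); // ordinary text frame, not a protocol ping
  }, 30_000);

  ws.addEventListener("message", (ev) => {
    if (ev.data === PING_SENTINEL) {
      // Servers that echo the sentinel yield a round-trip latency sample.
      console.log(`pong ${Date.now() - pingSentAt} ms`);
    }
  });
  ws.addEventListener("close", () => clearInterval(timer));
}
```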
Switching to another tab does not close the socket. The connection stays open in the background. Switching back resumes the live timeline from where you left off.
The WebSocket client routes through the same proxy configuration used for HTTP requests. Set a proxy URL in the connection options, and all WebSocket handshake and frames will go through it. This includes Scrape.do proxy endpoints for scraping targets that require it.
Run all requests in a folder together — useful for smoke-testing a workflow, seeding test data, or running a multi-step scraping sequence.
Right-click any folder in the sidebar and choose Run folder…. The runner panel opens pre-filled with that folder's request list.
Sequential — requests fire one at a time in the order they appear in the folder tree. Each request waits for its response before the next starts. Use this for workflows where order matters (e.g., login → fetch → clean up).
Parallel — requests in each iteration fire simultaneously, up to the configured Concurrency limit. Iterations are still sequential (one iteration completes before the next starts).
| Option | Default | Notes |
|---|---|---|
| Mode | Sequential | Sequential or Parallel |
| Concurrency | 5 | Parallel mode only; max simultaneous in-flight requests |
| Delay (ms) | 0 | Wait after each request completes, before the next starts |
| Iterations | 1 | How many full-collection passes to run |
| CSV file | — | Replaces the iterations counter; see data-driven below |
Upload a CSV file to run the collection once per data row. The header row defines variable names; each subsequent row becomes one iteration's variable bag, merged on top of the active environment.
Example:
```
user_id,token
42,abc123
99,xyz789
```

With this CSV, the collection runs twice. In iteration 1, `{{user_id}}` resolves to `42` and `{{token}}` to `abc123`. In iteration 2 they resolve to `99` and `xyz789`.
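The merge itself is a shallow override. A sketch (function name invented):

```ts
// The row's values win over the active environment on key collisions.
function iterationVars(env: Record<string, string>, csvRow: Record<string, string>) {
  return { ...env, ...csvRow };
}
// Iteration 1 above: { user_id: "42", token: "abc123" } layered over the environment.
```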
Each completed request shows:
- Pass (✓) or fail (✗) indicator
- Iteration number, request name, method, HTTP status, duration
- Click any row to expand it: URL, response headers, body preview, and error detail
A live progress bar tracks completed / total across all iterations.
Click Stop at any time. In-flight requests are cancelled via AbortSignal; partial results are preserved in the results list and are still exportable.
After a run completes, three export buttons appear:
- JSON — full `RunnerResult` object including all per-request results, timings, and metadata
- CSV — one row per request: iteration, name, method, URL, status, duration, ok, error
- HTML — self-contained styled report; dark theme, summary cards, full results table
The native file-save dialog prompts for a destination on each export.
If a request in the folder has `scrapeDo.enabled: true`, the runner honours it, just as the single-request executor does. The runner never forces Scrape.do on for the whole run; per-request settings are respected.
Each request has two optional JavaScript scripts: one that runs before the request is sent and one that runs after the response is received. Open the Scripts tab in the request builder to write them.
The script runs after variable resolution and auth wiring, immediately before the HTTP send. Use the req object to mutate the request on the fly.
```js
// Add a computed header.
req.setHeader("X-Timestamp", String(bru.timestamp()));

// Read a request-scoped variable set by a previous step.
const token = bru.getVar("authToken");
req.setHeader("Authorization", "Bearer " + token);
```

The script runs after the full response body has been received. Use the res object to inspect the result and write assertions or variables.
```js
// Assert the status code.
test("status is 200", () => {
  expect(res.getStatus()).toBe(200);
});

// Store a value from the response body for the next request.
const body = res.getBody(); // auto-parsed JSON when content-type is application/json
await bru.setEnvVar("userId", body.id);
```

req (pre-request only)
| Method | Description |
|---|---|
| `req.url` | Current request URL (read) |
| `req.method` | HTTP method (read) |
| `req.getHeader(key)` | Get a request header value (case-insensitive) |
| `req.setHeader(key, value)` | Set or overwrite a request header |
| `req.getBody()` | Get the raw body string (text/json/xml modes) |
| `req.setBody(value)` | Replace the request body |
res (post-response only)
| Method | Description |
|---|---|
| `res.getStatus()` | HTTP status code |
| `res.getHeader(key)` | Get a response header (case-insensitive) |
| `res.getHeaders()` | All response headers as a key→value object |
| `res.getBody()` | Response body — auto-parsed JSON when content-type includes "json" |
bru (both scripts)
| Method | Description |
|---|---|
| `bru.getVar(name)` | Get a request-scoped variable (cleared after the request) |
| `bru.setVar(name, value)` | Set a request-scoped variable |
| `bru.getEnvVar(name)` | Get a variable from the active environment |
| `bru.setEnvVar(name, value)` | Write a variable to the active environment file |
| `bru.getCollectionVar(name)` | Get a collection-level variable |
| `bru.setCollectionVar(name, value)` | Set a collection-level variable |
| `bru.getGlobalVar(name)` | Get a global variable |
| `bru.setGlobalVar(name, value)` | Set a global variable |
| `bru.sendRequest({ method, url, headers?, body? })` | Fire a sub-request; returns `{ status, headers, body }` |
| `bru.random()` | Random alphanumeric string |
| `bru.timestamp()` | Current Unix timestamp in ms |
| `bru.isoDate()` | Current date as ISO 8601 string |
| `bru.randomInt(min?, max?)` | Random integer in range |
console
console.log, .info, .warn, .error — all output appears in the Scripts tab of the response panel after the request completes.
test and expect
test("name", () => {
expect(value).toBe(expected);
expect(value).toEqual(expected); // deep equality
expect(value).toBeTruthy();
expect(value).toBeFalsy();
expect(str).toContain(substr);
});Failed assertions render in the Scripts response tab with a red marker and the failure message. The request itself is not aborted when an assertion fails.
JavaScript runs in a Node vm context with a scoped API. require, process, and import are not available. The timeout is 5000 ms per script; an infinite loop is killed after that.
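A minimal sketch of that sandboxing approach using Node's vm module (the real wiring may differ):

```ts
import vm from "node:vm";

function runUserScript(code: string, api: Record<string, unknown>) {
  // The context exposes only the scoped API: no require, process, or import.
  const context = vm.createContext({ ...api, console });
  // Synchronous runaway code (e.g. an infinite loop) is killed at the timeout.
  return vm.runInContext(code, context, { timeout: 5000 });
}
```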
Scripts are stored in the .sman file under the scripts: key as YAML literal blocks, so multi-line code produces clean git diffs.
Planned:
- `bru.runRequest("Other request name")` — chaining by request name.
- TypeScript inside scripts.
- Visualizer (`vis.set(...)`) API.
Scrapeman reads collections from these formats:
OpenAPI 3.0.x / 3.1.x and Swagger 2.0 (importOpenApiSpec)
- Accepts JSON or YAML string — format is auto-detected from content
- Each `paths[*][method]` operation becomes one request. `operationId` is used as the request name, then `summary`, then `METHOD /path`
- Operations grouped by `tags[0]` into folders; untagged operations go to the workspace root
- Server URL written to `base_url` environment variable; request URLs are `{{base_url}}/path`
- Parameters: `in: query` to params, `in: header` to headers, `in: path` substituted as `{{paramName}}` in the URL
- Request body: prefers `application/json`, falls back through `application/xml`, `text/plain`, `application/x-www-form-urlencoded`, `multipart/form-data`. Uses `example`/`examples` when present; otherwise generates a minimal example from the schema (max depth 5)
- `$ref` resolution: local refs (`#/components/schemas/Foo`, `#/definitions/Foo`) are resolved. Remote URL refs are skipped with a warning
- Auth schemes (`http:bearer`, `http:basic`, `apiKey`, `oauth2`) mapped to request auth; secrets written as `{{VAR_NAME}}` placeholders in a generated environment (with empty values to fill in)
- UI: "Import OpenAPI / Swagger" in the command palette or workspace menu. Enter a URL to fetch, or paste JSON/YAML. Preview shows endpoint count, tag list, and auth types before import
Postman Collection v2.1 (importPostmanCollection)
- Reads the standard Postman JSON export format
- Preserves folder hierarchy, auth (basic/bearer/apikey/oauth2/awsSigV4), headers, body modes (raw/json/xml/urlencoded/formdata/binary/graphql), and variables
- Unsupported features (scripts, unknown auth types) generate warnings
Bruno .bru folders (importBrunoFolder)
- Reads a directory of `.bru` files (Bruno's INI-like format)
- Parses method blocks, headers, auth (bearer/basic), body (json/xml/text/form-urlencoded/multipart), and query/path params
- Folder hierarchy matches the directory structure
Insomnia v4 JSON (importInsomniaExport)
- Reads Insomnia v4 export files (`_type: "export"`, `__export_format: 4`)
- Walks resources by type: request, request_group, environment, cookie_jar
- Maps `_id`/`parentId` to folder tree, maps all 5 auth types
- Cookie jars and workspaces generate warnings
HAR 1.2 (importHar)
- Reads Chrome DevTools HAR exports
- Each `log.entries[].request` becomes one request
- Handles JSON, XML, form, HTML, and text body types
- Skips HTTP/2 pseudo-headers
curl (already shipped before M9)
- Paste a curl command or import from file
- Parses -X, -H, -d, --data-*, -u, --cookie, -F, --proxy
HAR 1.2 (exportHar)
- Converts history entries to HAR format
- Maps request, response, timings, and query parameters
- Round-trip tested: import then export then re-import matches
Postman v2.1 exporter — planned (T093)
.sman bundle — planned (T097/T098): ZIP-based portable bundle containing .sman files, environments, and body sidecars. See planning/issues/sman-bundle-format.md.
Configure per-request in the Settings tab:
- Protocol: HTTP or HTTPS
- Host and Port
- Auth: username + password (basic auth to the proxy)
The proxy is applied via undici's ProxyAgent.
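For reference, undici's ProxyAgent is used like this (values illustrative):

```ts
import { ProxyAgent, request } from "undici";

// Credentials can be embedded in the proxy URL for basic auth to the proxy.
const dispatcher = new ProxyAgent("http://user:pass@proxy.example.com:8080");

const res = await request("https://httpbin.org/ip", { dispatcher });
console.log(await res.body.json()); // origin IP should be the proxy's
```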
Flip the Scrape.do toggle in the Settings tab to route the request through Scrape.do's infrastructure:
- Residential rotation — automatic IP rotation from the residential pool
- JS rendering — headless browser renders the target page
- Geo targeting — route through a specific country
- Ban retry — automatic retry on detection/block responses
The main process rewrites the URL to api.scrape.do and injects the configured parameters. Your Scrape.do token is stored as a secret environment variable.
The Settings tab has a "User-Agent" section with a preset picker. Select a preset to change the User-Agent header sent with every request.
| Preset key | Label |
|---|---|
| `scrapeman` | Scrapeman <version> (default) |
| `chrome-macos` | Chrome 124 macOS |
| `chrome-windows` | Chrome 124 Windows |
| `firefox-macos` | Firefox 125 macOS |
| `firefox-windows` | Firefox 125 Windows |
| `safari-macos` | Safari 17 macOS |
| `safari-ios` | Safari 17 iOS |
| `googlebot` | Googlebot 2.1 |
| `curl` | curl 8.7 |
The selected UA string is shown as a preview below the picker. If you manually set a User-Agent header in the Headers tab, that value overrides the preset.
After every request, Scrapeman checks the response for anti-bot signals and displays a banner above the response body when one is found. The banner is dismissable and resets on the next send.
Detected signals:
| Signal | Trigger |
|---|---|
| Cloudflare | cf-ray header present, or HTTP 403 with Cloudflare browser-check body |
| Rate limited | HTTP 429 or Retry-After header |
| CAPTCHA | Body contains hcaptcha, recaptcha, captcha-container, or turnstile |
| Bot block | HTTP 403 with body matching access denied, bot detected, automated access, or automated request |
When a Retry-After header is present, the number of seconds to wait is shown in the banner.
Signals are checked in order: Cloudflare first, then rate limit, then CAPTCHA, then bot block. Only one signal is shown per response.
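That precedence amounts to a first-match-wins chain. A simplified sketch (the real predicates are more involved):

```ts
type Signal = "cloudflare" | "rate-limited" | "captcha" | "bot-block";

function detectSignal(status: number, headers: Record<string, string>, body: string): Signal | null {
  if (headers["cf-ray"]) return "cloudflare";
  if (status === 429 || headers["retry-after"]) return "rate-limited";
  if (/hcaptcha|recaptcha|captcha-container|turnstile/i.test(body)) return "captcha";
  if (status === 403 && /access denied|bot detected|automated (access|request)/i.test(body)) {
    return "bot-block";
  }
  return null; // at most one signal per response
}
```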
Per-request rate limiting controls the delay the Collection Runner and Load Runner insert between requests. It has no effect on single-send.
Configure in the Settings tab under "Rate limit":
- Fixed delay — wait this many milliseconds after each request.
- Jitter min / max — add a random extra delay between min and max ms on top of the fixed delay.
Run-level delay (from the Load Runner config) takes precedence over the per-request rate limit: when the run-level delay is greater than 0, the per-request rate limit is not added on top.
Example: fixed delay 500 ms, jitter 0-200 ms → each request waits 500-700 ms before the next.
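In other words, the wait is the fixed delay plus a uniform jitter sample. A sketch:

```ts
// effectiveDelayMs(500, 0, 200) → a value in [500, 700).
function effectiveDelayMs(fixed: number, jitterMin: number, jitterMax: number): number {
  return fixed + jitterMin + Math.random() * (jitterMax - jitterMin);
}
```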
Supply multiple proxy URLs and let Scrapeman rotate through them automatically.
In the Settings tab, toggle "Rotate through multiple proxies" and add proxy URLs one per line. Choose a strategy:
- Round-robin — cycles through the list in order; the position is shared across all concurrent slots in a run.
- Random — picks a random proxy for each request.
When a rotate list is non-empty, the single "URL" field is ignored. If the list is empty, the single URL is used as before.
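Both strategies reduce to a one-line picker. A sketch, with the shared round-robin cursor made explicit:

```ts
let cursor = 0; // shared across concurrent slots so round-robin stays in sequence

function pickProxy(list: string[], strategy: "round-robin" | "random"): string {
  return strategy === "random"
    ? list[Math.floor(Math.random() * list.length)]
    : list[cursor++ % list.length];
}
```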
Scrapeman uses tough-cookie (RFC 6265 compliant) for cookie management.
- Cookies survive app restarts. The jar is written to disk synchronously on every change.
- Cookies are scoped per workspace and per active environment.
- Set-Cookie headers from responses are automatically captured.
- Cookie headers are automatically injected on matching requests.
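For orientation, capture and injection with tough-cookie look like this (URLs illustrative):

```ts
import { CookieJar } from "tough-cookie";

const jar = new CookieJar();

// Capture: store each Set-Cookie header against the URL that sent it.
await jar.setCookie("session=abc123; Path=/; HttpOnly", "https://api.example.com/login");

// Inject: ask the jar for the Cookie header matching the next request's URL.
const cookieHeader = await jar.getCookieString("https://api.example.com/products");
// → "session=abc123"
```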
Open the Cookies panel from the sidebar (or the keyboard shortcut) to inspect, edit, and manage the jar for the current workspace and environment.
Filter: Type in the domain filter at the top to narrow the list. Clearing it restores all domains.
Add a cookie manually: Click + Add to open an inline form. Fields: name, value, domain, path (default /), expires (ISO date or blank for session), httpOnly, Secure, SameSite. Save inserts the cookie into the jar immediately.
Edit a cookie: Click any cookie row to open the same form pre-filled. Saving replaces the existing entry (delete + re-insert under the hood). The existing delete button (×) is still available on hover.
httpOnly masking: Cookies with httpOnly: true show •••••••• for the value by default. Click the eye icon to reveal the real value.
Export JSON: Exports the currently visible cookies (respecting the domain filter) as a pretty-printed JSON array and triggers a browser download (cookies.json). The shape matches CookieEntry from the Scrapeman type definitions.
Export Netscape: Exports cookies in Netscape cookies.txt format (tab-separated: domain, flag, path, secure, expires, name, value). Compatible with Playwright, Selenium, and curl (--cookie cookies.txt).
Import: Click Import and paste either:
- A `document.cookie` string: `name1=val1; name2=val2` — you must also enter the domain these cookies belong to.
- A Netscape cookies.txt body — domain is read from each line; the domain field is ignored.
The format is detected automatically (presence of tabs signals Netscape format). Each parsed cookie is inserted into the jar immediately.
A VS Code-style git panel built on simple-git:
- Status bar: current branch name shown at the bottom of the window.
- Source Control panel: staged/unstaged file list in the sidebar.
- Stage/unstage: individual files or all at once.
- Commit: write a message and commit from the UI.
- Push/Pull: uses OS credential store (no SSH key management UI). Pull defaults to fast-forward; if branches have diverged a dialog prompts you to choose Rebase or Merge commit.
- Diff viewer: click a changed file to see a line-by-line diff (green/red, VS Code style).
- Per-request sync toggle: `Cmd+Shift+H` to exclude a request from git tracking.
The sidebar has a search input at the top of the collection tree.
- Real-time filter: type and the tree filters instantly (case-insensitive, substring match on request name and URL).
- Method prefix: type `GET /users` to filter by method and name/URL together.
- Folder behavior: folders with zero matching descendants auto-hide. Matched folders auto-expand.
- Shortcuts: `Cmd+Shift+F` focuses the search from anywhere. `Cmd+F` focuses it when the sidebar has focus. `Escape` clears the filter and returns focus to the tree.
- Empty state: "No requests match" message with a clear-filter button.
All shortcuts use Cmd on macOS and Ctrl on Windows/Linux.
| Shortcut | Action |
|---|---|
| `Cmd+N` | New tab (auto-focuses URL bar) |
| `Cmd+T` | New tab |
| `Cmd+W` | Close active tab (with dirty guard) |
| `Cmd+Shift+W` | Close all tabs (with dirty guard) |
| `Cmd+Enter` | Send request |
| `Cmd+S` | Save request |
| `Cmd+Shift+F` | Focus collection search |
| `Cmd+Shift+H` | Toggle git sync on active request |
| Shortcut | Action |
|---|---|
| Middle-click on tab | Close tab (with dirty guard) |
| `Cmd+1` through `Cmd+9` | Switch to tab N |
Open with Cmd+K. Type to filter any listed command.
| Command | Action |
|---|---|
| Add URL parameter | Switch to Params tab, focus the first empty Key cell (adds a row if all rows are filled) |
| Shortcut | Action |
|---|---|
| `Shift+Enter` | Insert new row below, focus Key cell |
| `Tab` (from last row Key) | Auto-append new row |
| `Tab` into empty table | Create first row and focus its Key cell |
When closing a tab with unsaved changes (any close method: Cmd+W, middle-click, close button, context menu, Cmd+Shift+W):
- A confirmation dialog shows the tab name and asks to save, discard, or cancel.
- "Don't ask again for this session" checkbox: when checked, subsequent dirty closes discard immediately. Resets on app restart.
- Built on undici 7 (Node.js official HTTP client).
- HTTP/1.1 and HTTP/2: toggle via `allowH2` in per-request settings. Uses ALPN negotiation.
- All HTTP methods: GET, POST, PUT, PATCH, DELETE, HEAD, OPTIONS, and custom verbs (PROPFIND, QUERY, etc.).
- Timeouts: connect, read, and total timeouts with AbortSignal cancellation.
- Response body cap: 200 MB max.
- Auto-decompression: gzip, brotli, and deflate responses decode automatically via Accept-Encoding.
- Method badge on each tab (GET green, POST blue, PUT yellow, DELETE red).
- Dirty indicator (dot) on unsaved tabs.
- Middle-click to close.
- Resizable + orientable split: toggle between horizontal and vertical layout, persisted in localStorage.
- Inter (variable) — UI text
- Geist Mono — code blocks, URL bar, response body, terminal-style panels
- Dark mode with CSS custom properties.
- System preference fallback.
- Design tokens: `--bg-white-0` through `--bg-sub-300`, `--text-strong-950` through `--text-disabled-300`, `--primary-base` (#FF6C37, Scrape.do orange), `--success-base`, `--error-base`.
- Electron 33 + Vite (via electron-vite) + React 18 + TypeScript 5
- Tailwind CSS + Radix UI primitives (ContextMenu, Dialog, DropdownMenu, Tooltip)
- Zustand for renderer state
- undici 7 for HTTP (ProxyAgent, allowH2, AbortSignal)
- tough-cookie 5 for RFC 6265 cookie jar
- yaml for file format parse (custom deterministic serializer)
- aws4 for Signature v4 signing
- chokidar 4 for workspace file watching
- pnpm workspaces monorepo
Apache 2.0. See LICENSE. The name "Scrapeman" and the Scrape.do logo are trademarks of Scrape.do and are not covered by the license.