Telemetry & Error Reporting (Library, API Server, Notebooks) #1409

@aravindkarnam

Add opt-in telemetry across environments: Python library, Docker/API server, and interactive notebooks. Goal: capture exceptions/crashes to improve stability. Telemetry module must be provider-agnostic (Sentry as first backend, easily swappable).

1. Python Library & CLI (pip install)

  1. Trigger: on first exception/crash.
  2. Prompt user (CLI):
We noticed an error. Help improve Crawl4AI by sending crash reports?
[1] Yes (send this error only)  
[2] Yes, always (send all errors)  
[3] No  
(Optional: enter email so we can follow up, press Enter to skip)
  3. Persistence: choice + email stored in ~/.crawl4ai/config.json.
  4. Control: CLI command
    crwl telemetry enable|disable [--email [email protected]]
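
The prompt and persistence steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the planned implementation: the names `load_config`, `save_config`, and `apply_choice` are hypothetical, and how "send this error only" should be persisted is not specified in this issue, so the sketch simply leaves telemetry disabled for later errors in that case. Only the `~/.crawl4ai/config.json` location and the three choices come from the text above.

```python
import json
import os
from pathlib import Path
from typing import Optional, Tuple

# Location named in this issue; everything else below is illustrative.
DEFAULT_CONFIG_PATH = Path(os.path.expanduser("~/.crawl4ai/config.json"))


def load_config(config_path: Path = DEFAULT_CONFIG_PATH) -> dict:
    """Return the stored config, or an empty dict if it is missing or unreadable."""
    try:
        return json.loads(config_path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_config(config: dict, config_path: Path = DEFAULT_CONFIG_PATH) -> None:
    """Write the config as JSON, creating the parent directory if needed."""
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")


def apply_choice(choice: str, email: Optional[str], config: dict) -> Tuple[bool, dict]:
    """Map a prompt answer ("1", "2", "3") to (send_this_error, updated_config)."""
    telemetry = dict(config.get("telemetry", {}))
    if choice == "2":      # Yes, always
        telemetry["enabled"] = True
        send_now = True
    elif choice == "1":    # Yes, this error only (assumed: stays off afterwards)
        telemetry["enabled"] = False
        send_now = True
    else:                  # No, or any unrecognized answer
        telemetry["enabled"] = False
        send_now = False
    # The email is stored only if it was provided and the user agreed to send.
    if email and send_now:
        telemetry["email"] = email
    return send_now, {**config, "telemetry": telemetry}
```

Keeping `apply_choice` free of I/O means the same mapping could back both the CLI prompt and the notebook widget.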

2. Docker / API Server

  1. Default: telemetry enabled.
  2. Control: CRAWL4AI_TELEMETRY=0 disables.
  3. Behavior: exceptions auto-sent, no interactive prompt.
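
As a rough sketch of the server-side switch described above: the helper name `server_telemetry_enabled` is hypothetical, and accepting spellings other than `0` (such as `false` or `off`) is an assumption that goes beyond what this issue specifies.

```python
import os
from typing import Mapping, Optional


def server_telemetry_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True unless CRAWL4AI_TELEMETRY is set to a 'disabled' value."""
    env = os.environ if environ is None else environ
    # An unset variable means "enabled", matching the Docker/API default above.
    value = env.get("CRAWL4AI_TELEMETRY", "1").strip().lower()
    # "0" is the value named in this issue; the other spellings are an assumption.
    return value not in {"0", "false", "no", "off"}
```

Passing the environment as a parameter keeps the check easy to test without mutating `os.environ`.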

3. Jupyter / Google Colab / Notebook Environments

  1. First Exception:
  • Try showing inline rich-text prompt via IPython.display + widgets:
🚨 Crawl4AI error detected.
Help us improve by sending a crash report?
[Yes] [Yes, always] [No]
(Optional email: ________)
  2. Persist choice in ~/.crawl4ai/config.json.

  3. If widgets are unavailable (e.g. Colab strips the widget UI):

  • Fallback passive print:
Crawl4AI error detected. Telemetry is OFF.
To enable: crawl4ai.telemetry.enable(email="[email protected]", always=True)
  4. Control: same API:
import crawl4ai
crawl4ai.telemetry.enable(email="[email protected]", always=True)
crawl4ai.telemetry.disable()
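
One possible way to pick between the widget prompt, the passive message, and the CLI prompt is sketched below. The function names are hypothetical, and checking for an `IPKernelApp` section in the IPython config is a common best-effort idiom for detecting a notebook kernel rather than a guaranteed test; whether widgets actually render (as opposed to merely being importable) would still need a runtime fallback.

```python
import importlib.util


def in_notebook_kernel() -> bool:
    """Best-effort check for a Jupyter/Colab kernel; False if IPython is absent."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    # Kernel-backed sessions usually carry an IPKernelApp config section;
    # a plain terminal IPython session or regular Python process does not.
    return shell is not None and "IPKernelApp" in shell.config


def widgets_available() -> bool:
    """True if the ipywidgets package can be found in this environment."""
    return importlib.util.find_spec("ipywidgets") is not None


def choose_prompt_mode(notebook: bool, widgets: bool) -> str:
    """Return "cli", "widget", or "passive" for the given environment flags."""
    if not notebook:
        return "cli"
    return "widget" if widgets else "passive"


# Example wiring: decide once, at the first exception.
mode = choose_prompt_mode(in_notebook_kernel(), widgets_available())
```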

4. Implementation Guidelines

Telemetry should not be tightly coupled to Sentry. Create a pluggable provider interface:

# telemetry/base.py
from typing import Optional


class TelemetryProvider:
    """Backend-agnostic interface; concrete providers override both send methods."""

    def __init__(self, **kwargs):
        pass

    def send_exception(self, exc: Exception, context: Optional[dict] = None):
        """Send an exception with optional context (email, environment)."""
        raise NotImplementedError

    def send_event(self, event_name: str, payload: Optional[dict] = None):
        """Send a generic telemetry event (non-crash)."""
        raise NotImplementedError

# telemetry/sentry_provider.py
from typing import Optional

import sentry_sdk

from telemetry.base import TelemetryProvider


class SentryProvider(TelemetryProvider):
    def __init__(self, dsn: str, **kwargs):
        super().__init__(**kwargs)
        sentry_sdk.init(dsn=dsn)

    def send_exception(self, exc: Exception, context: Optional[dict] = None):
        # Temporary scope, so the email is attached to this event only.
        with sentry_sdk.push_scope() as scope:
            if context and "email" in context:
                scope.set_user({"email": context["email"]})
            sentry_sdk.capture_exception(exc)

    def send_event(self, event_name: str, payload: Optional[dict] = None):
        sentry_sdk.capture_message(f"{event_name}: {payload}")
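
To illustrate what "easily swappable" might look like in practice, here is a small sketch of a second backend plus a name-based registry. `InMemoryProvider`, `register_provider`, and `create_provider` are hypothetical names introduced for this example, and the base class is repeated only so the snippet runs on its own.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TelemetryProvider:
    """Same interface as telemetry/base.py, repeated so this sketch runs alone."""

    def send_exception(self, exc: Exception, context: Optional[dict] = None) -> None:
        raise NotImplementedError

    def send_event(self, event_name: str, payload: Optional[dict] = None) -> None:
        raise NotImplementedError


class InMemoryProvider(TelemetryProvider):
    """Hypothetical backend that records calls instead of sending them."""

    def __init__(self, **kwargs):
        self.exceptions: List[Tuple[Exception, dict]] = []
        self.events: List[Tuple[str, dict]] = []

    def send_exception(self, exc, context=None):
        self.exceptions.append((exc, dict(context or {})))

    def send_event(self, event_name, payload=None):
        self.events.append((event_name, dict(payload or {})))


# Maps a provider name to a factory; callers never import a backend directly.
_PROVIDERS: Dict[str, Callable[..., TelemetryProvider]] = {}


def register_provider(name: str, factory: Callable[..., TelemetryProvider]) -> None:
    _PROVIDERS[name] = factory


def create_provider(name: str, **kwargs) -> TelemetryProvider:
    try:
        factory = _PROVIDERS[name]
    except KeyError:
        raise ValueError(f"unknown telemetry provider: {name!r}") from None
    return factory(**kwargs)


register_provider("memory", InMemoryProvider)
```

With this shape, a Sentry backend would be one more `register_provider("sentry", SentryProvider)` call, and an in-memory backend like the one above could stand in during tests so nothing leaves the machine.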

5. Sentry DSN Handling

The Sentry DSN is owned by the Crawl4AI maintainers. The DSN should not be stored in user config.

Strategy:

  • Hardcode the DSN inside the library (safe: a DSN only permits submitting events and is meant to be embedded in client code).
  • Allow maintainers to override it with the environment variable CRAWL4AI_SENTRY_DSN (e.g. in Docker, CI/CD).
  • User config (~/.crawl4ai/config.json) only stores:
{
  "telemetry": {
    "enabled": true,
    "email": "[email protected]"
  }
}

6. Common Rules

  • Only exceptions and crashes are reported.
  • No URLs, request data, or PII are collected, other than the optional email.
  • Email is optional and stored only if explicitly provided.
  • Implementation lives in telemetry.py.
  • README must document usage for CLI, Docker, and Notebooks.
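
One way to approach the "no URLs or PII" rule is to redact exception messages before they reach any provider. The sketch below is an assumption about how that could be done, not part of the spec: `scrub` and `scrubbed_message` are hypothetical names, and regex-based redaction is best-effort only, so it should not be treated as a guarantee against leakage.

```python
import re

# Anything shaped like scheme://... up to whitespace, a quote, or an angle bracket.
_URL_RE = re.compile(r"\b[a-zA-Z][a-zA-Z0-9+.-]*://[^\s'\"<>]+")
# A simple email shape; intentionally loose rather than RFC-complete.
_EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def scrub(text: str) -> str:
    """Replace URLs and email addresses with fixed placeholders."""
    # URLs go first so that credentials embedded in a URL are removed with it.
    text = _URL_RE.sub("[url]", text)
    return _EMAIL_RE.sub("[email]", text)


def scrubbed_message(exc: BaseException) -> str:
    """Return "ExceptionType: message" with URLs and emails redacted."""
    return f"{type(exc).__name__}: {scrub(str(exc))}"
```

For example, `scrubbed_message(TimeoutError("fetching https://example.com/a?b=1"))` yields `TimeoutError: fetching [url]`.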

Acceptance Criteria:

  1. Library: first crash shows prompt, choice respected thereafter.
  2. Docker/API: telemetry enabled by default, disable via env var.
  3. Notebook: inline prompt if possible, fallback passive message + API.
  4. Choices + email saved and respected across runs.
  5. No PII leakage.
  6. Telemetry decoupled from Sentry via provider abstraction.
  7. DSN baked into library, override possible via env var, never stored in user config.

Labels

⚙️ In-progress: issues and feature requests that are in progress
✨ Enhancement: improvement on an existing feature
