Labels: ⚙️ In-progress (issues and feature requests that are in progress); ✨ Enhancement (improvement on an existing feature)
Description
Add telemetry across three environments: the Python library, the Docker/API server, and interactive notebooks. Telemetry is opt-in for the library and notebooks, and enabled by default (opt-out) for Docker. Goal: capture exceptions and crashes to improve stability. The telemetry module must be provider-agnostic (Sentry as the first backend, easily swappable).
1. Python Library & CLI (pip install)
- Trigger: on the first exception/crash.
- Prompt user (CLI):
    We noticed an error. Help improve Crawl4AI by sending crash reports?
    [1] Yes (send this error only)
    [2] Yes, always (send all errors)
    [3] No
    (Optional: enter email so we can follow up, press Enter to skip)
- Persistence: choice + email stored in ~/.crawl4ai/config.json.
- Control: CLI command
    crwl telemetry enable|disable [--email you@example.com]
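The persistence step could be sketched as below. This is a minimal illustration, not the project's implementation: the helper names `load_telemetry_config` and `save_telemetry_choice` are hypothetical, and the file layout assumes the `{"telemetry": {"enabled": ..., "email": ...}}` shape described in section 5.

```python
import json
from pathlib import Path
from typing import Optional

# Location named in this issue; everything else in this block is an assumption.
CONFIG_PATH = Path.home() / ".crawl4ai" / "config.json"


def _read_config(path: Path) -> dict:
    """Read the whole config file, returning {} if it is missing or unreadable."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def load_telemetry_config(path: Path = CONFIG_PATH) -> dict:
    """Return the stored "telemetry" section, or {} if no choice was saved yet."""
    section = _read_config(path).get("telemetry")
    return section if isinstance(section, dict) else {}


def save_telemetry_choice(enabled: bool, email: Optional[str] = None,
                          path: Path = CONFIG_PATH) -> None:
    """Persist the user's choice, keeping unrelated keys in the file intact."""
    data = _read_config(path)
    section = {"enabled": enabled}
    if email:  # the email is stored only when explicitly provided
        section["email"] = email
    data["telemetry"] = section
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
```

An empty result from `load_telemetry_config()` would mean "no choice yet", which is the case where the first-crash prompt should be shown.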
2. Docker / API Server
- Default: telemetry enabled.
- Control: CRAWL4AI_TELEMETRY=0 disables.
- Behavior: exceptions auto-sent, no interactive prompt.
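A possible shape for the Docker-side check, as a sketch only. The function name `docker_telemetry_enabled` is hypothetical, and treating values such as "false" or "off" as disabling is an assumption; the issue itself only specifies `CRAWL4AI_TELEMETRY=0`.

```python
import os
from typing import Mapping, Optional

# Assumption: besides "0", a few common falsy spellings also disable telemetry.
_DISABLING_VALUES = {"0", "false", "no", "off"}


def docker_telemetry_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Telemetry is on by default and off only when the variable disables it."""
    env = os.environ if environ is None else environ
    raw = env.get("CRAWL4AI_TELEMETRY")
    if raw is None:
        return True  # default for Docker / API server: enabled
    return raw.strip().lower() not in _DISABLING_VALUES
```

Passing the environment as an argument keeps the check easy to test without mutating `os.environ`.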
3. Jupyter / Google Colab / Notebook Environments
- First exception:
    - Try showing an inline rich-text prompt via IPython.display + widgets:
        🚨 Crawl4AI error detected.
        Help us improve by sending a crash report?
        [Yes] [Yes, always] [No]
        (Optional email: ________)
    - Persist the choice in ~/.crawl4ai/config.json.
    - If widgets are unavailable (e.g. Colab stripped UI):
        - Fall back to a passive print:
            Crawl4AI error detected. Telemetry is OFF.
            To enable: crawl4ai.telemetry.enable(email="you@example.com", always=True)
- Control: Python API (the same calls shown in the fallback message):
    import crawl4ai
    crawl4ai.telemetry.enable(email="you@example.com", always=True)
    crawl4ai.telemetry.disable()
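Choosing between the widget prompt and the passive fallback could look roughly like this. It is a best-effort sketch: the function names are hypothetical, the shell class-name check is a common heuristic and not an official API contract, and the Colab module check is an assumption about how Colab names its shell.

```python
import importlib.util


def in_notebook() -> bool:
    """Best-effort check for a Jupyter or Colab kernel (heuristic, may miss cases)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    if shell is None:  # plain Python process, no IPython shell running
        return False
    cls = type(shell)
    # Jupyter kernels use ZMQInteractiveShell; Colab is assumed to use a shell
    # class defined under the google.colab package.
    return cls.__name__ == "ZMQInteractiveShell" or cls.__module__.startswith("google.colab")


def widgets_available() -> bool:
    """True if ipywidgets can be located, without importing it."""
    return importlib.util.find_spec("ipywidgets") is not None


def choose_prompt_mode(notebook: bool, widgets: bool) -> str:
    """Map the environment to one of: "cli", "widget", "passive"."""
    if not notebook:
        return "cli"
    return "widget" if widgets else "passive"
```

A caller would combine them as `choose_prompt_mode(in_notebook(), widgets_available())`. Keeping the decision function pure means the three outcomes can be tested without a running kernel.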
4. Implementation Guidelines
Telemetry should not be tightly coupled to Sentry. Create a pluggable provider interface:
# telemetry/base.py
from typing import Optional


class TelemetryProvider:
    """Interface that every telemetry backend implements."""

    def __init__(self, **kwargs):
        pass

    def send_exception(self, exc: Exception, context: Optional[dict] = None):
        """Send an exception with optional context (email, environment)."""
        raise NotImplementedError

    def send_event(self, event_name: str, payload: Optional[dict] = None):
        """Send a generic telemetry event (non-crash)."""
        raise NotImplementedError
# telemetry/sentry_provider.py
from typing import Optional

import sentry_sdk

from telemetry.base import TelemetryProvider


class SentryProvider(TelemetryProvider):
    def __init__(self, dsn: str, **kwargs):
        super().__init__(**kwargs)
        sentry_sdk.init(dsn=dsn)

    def send_exception(self, exc: Exception, context: Optional[dict] = None):
        # Attach the optional email to this report only, via a temporary scope.
        # push_scope() is the 1.x API; sentry-sdk 2.x deprecates it in favor of new_scope().
        with sentry_sdk.push_scope() as scope:
            if context and "email" in context:
                scope.set_user({"email": context["email"]})
            sentry_sdk.capture_exception(exc)

    def send_event(self, event_name: str, payload: Optional[dict] = None):
        sentry_sdk.capture_message(f"{event_name}: {payload}")
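To illustrate what "easily swappable" could mean in practice, here is a sketch of a small provider registry with a no-op backend and an in-memory backend. All names in it (`NullProvider`, `InMemoryProvider`, `register_provider`, `create_provider`) are hypothetical and not part of the proposal; the base class is repeated in reduced form only so the block runs on its own.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TelemetryProvider:
    """Reduced copy of the interface above so this sketch is self-contained."""

    def send_exception(self, exc: Exception, context: Optional[dict] = None) -> None:
        raise NotImplementedError

    def send_event(self, event_name: str, payload: Optional[dict] = None) -> None:
        raise NotImplementedError


class NullProvider(TelemetryProvider):
    """Drops everything; a natural choice when telemetry is disabled."""

    def send_exception(self, exc, context=None) -> None:
        pass

    def send_event(self, event_name, payload=None) -> None:
        pass


class InMemoryProvider(TelemetryProvider):
    """Records calls in a list instead of sending them (handy in tests)."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, object, Optional[dict]]] = []

    def send_exception(self, exc, context=None) -> None:
        self.records.append(("exception", exc, context))

    def send_event(self, event_name, payload=None) -> None:
        self.records.append(("event", event_name, payload))


# Hypothetical registry: backend name -> factory that returns a provider.
_PROVIDERS: Dict[str, Callable[..., TelemetryProvider]] = {
    "null": NullProvider,
    "memory": InMemoryProvider,
}


def register_provider(name: str, factory: Callable[..., TelemetryProvider]) -> None:
    """Make a backend available under the given name."""
    _PROVIDERS[name] = factory


def create_provider(name: str, **kwargs) -> TelemetryProvider:
    """Build the named backend; unknown names fall back to NullProvider."""
    factory = _PROVIDERS.get(name)
    if factory is None:
        return NullProvider()
    return factory(**kwargs)
```

Under this design a Sentry backend would be one more `register_provider("sentry", SentryProvider)` call, and the rest of the library would only ever see the `TelemetryProvider` interface.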
5. Sentry DSN Handling
The Sentry DSN is owned by the Crawl4AI maintainers. The DSN should not be stored in user config.
Strategy:
- Hardcode the DSN inside the library (safe: the public part of a DSN is meant to be embedded).
- Allow an override via the environment variable CRAWL4AI_SENTRY_DSN for maintainers (e.g. in Docker, CI/CD).
- User config (~/.crawl4ai/config.json) only stores:
    {
      "telemetry": {
        "enabled": true,
        "email": "you@example.com"
      }
    }
6. Common Rules
- Only exceptions and crashes are reported.
- No URLs, request data, or PII are collected, apart from the optional email below.
- Email is optional and stored only if explicitly provided.
- Implementation lives in telemetry.py.
- README must document usage for CLI, Docker, and Notebooks.
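One way the "no URLs, no request data" rule could be enforced is a scrubbing step applied to each event before it leaves the process. The sketch below is an assumption about how that might be done, not a stated requirement: `scrub_event` is a hypothetical name, and the event is treated as a plain nested dict. Its `(event, hint)` signature matches what `sentry_sdk.init(..., before_send=...)` expects, so it could be wired in there for the Sentry backend.

```python
import re
from typing import Any, Optional

# Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
_URL_RE = re.compile(r"https?://[^\s'\"<>]+", re.IGNORECASE)


def _redact(value: Any) -> Any:
    """Recursively replace URLs inside strings, dicts, and lists."""
    if isinstance(value, str):
        return _URL_RE.sub("[redacted-url]", value)
    if isinstance(value, dict):
        return {key: _redact(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [_redact(item) for item in value]
    return value


def scrub_event(event: dict, hint: Optional[dict] = None) -> dict:
    """Drop the "request" section and redact URLs everywhere else."""
    without_request = {key: item for key, item in event.items() if key != "request"}
    return _redact(without_request)
```

URL redaction by regular expression is deliberately conservative and will not catch every form of sensitive data, so it complements, rather than replaces, simply not attaching such data in the first place.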
Acceptance Criteria:
- Library: first crash shows prompt, choice respected thereafter.
- Docker/API: telemetry enabled by default, disable via env var.
- Notebook: inline prompt if possible, fallback passive message + API.
- Choices + email saved and respected across runs.
- No PII leakage.
- Telemetry decoupled from Sentry via provider abstraction.
- DSN baked into library, override possible via env var, never stored in user config.