
Add AI agent detection to user-agent string#1327

Open
simonfaltum wants to merge 9 commits into main from simonfaltum/agent-detection


simonfaltum (Member) commented Mar 12, 2026

Summary

  • Detect known AI coding agents via environment variables and append agent/<name> to the User-Agent HTTP header
  • Canonical agent list (synced with Go SDK): Antigravity, Claude Code, Cline, Codex, Copilot CLI, Cursor, Gemini CLI, OpenCode, OpenClaw
  • Ambiguity guard: reports an agent only when exactly one is detected; returns an empty string on zero or multiple matches

The COPILOT_CLI env var was confirmed by direct testing in Copilot CLI.
The OPENCLAW_SHELL env var uses context-qualified values (exec, acp, acp-client); any non-empty value triggers detection.
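A minimal sketch of the detection rule described above, assuming a plain mapping from environment variable to reported agent name. It includes only the variables named in this PR (CURSOR_AGENT, CLAUDECODE, COPILOT_CLI, OPENCLAW_SHELL). Apart from "cursor", which the PR's tests confirm, the reported names are hypothetical, and the real table in databricks.sdk.useragent may differ. The sketch applies the "any non-empty value" rule to every variable, which is also an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical subset of the canonical list: env var -> reported agent name.
# Only "cursor" is confirmed by the PR's tests; the other names are assumptions.
KNOWN_AGENTS = {
    "CURSOR_AGENT": "cursor",
    "CLAUDECODE": "claude-code",
    "COPILOT_CLI": "copilot-cli",
    "OPENCLAW_SHELL": "openclaw",
}


def detect_agent(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the single detected agent name, or "" on zero or multiple matches."""
    if env is None:
        env = os.environ
    # Any non-empty value counts (e.g. OPENCLAW_SHELL=exec, acp, acp-client);
    # a variable that is set to "" does not.
    matches = [name for var, name in KNOWN_AGENTS.items() if env.get(var)]
    # Ambiguity guard: report only when exactly one agent is present.
    return matches[0] if len(matches) == 1 else ""
```

Passing the environment in explicitly keeps the function pure and easy to test. Falling back to os.environ preserves the zero-argument call style.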

Part of cross-SDK agent tracking effort. Go SDK implementation: databricks/databricks-sdk-go#1537.

Test plan

  • 14 new agent detection tests (each agent individually, no agent, multiple agents, empty value, UA string inclusion/exclusion)
  • Existing CI/CD and user-agent tests still pass
  • Cache resets updated in test_config.py and test_core.py to prevent cross-test contamination

Detect known AI coding agents (Claude Code, Cursor, Cline, Codex,
Gemini CLI, OpenCode, Antigravity) via environment variables and
append agent/<name> to the User-Agent header. Uses the same detection
logic as the Go SDK: report when exactly one agent is detected,
return empty on zero or multiple matches (ambiguity guard).

Co-authored-by: Isaac
Verifies that agent_provider() caches its result on first call and
returns the cached value on subsequent calls, even if the environment
changes.

Co-authored-by: Isaac
Member Author

simonfaltum left a comment

Review (automated, 2 agents)

Verdict: Approved

0 Critical | 0 Major | 1 Gap | 2 Nit | 2 Suggestion

See inline comments for details.

del os.environ["CURSOR_AGENT"]
os.environ["CLAUDECODE"] = "1"

assert useragent.agent_provider() == "cursor"
Member Author

[Gap (Nit)] No test for to_string() with multiple agents

There are tests for to_string() with one agent and with zero agents, and a unit test for agent_provider() returning "" with multiple agents. But no end-to-end test confirms that to_string() omits agent/ when multiple agents are set. The coverage gap is marginal, since the "if agent:" check is trivially false for "".

Suggestion: Add test_user_agent_string_multiple_agents that sets two agent vars and asserts "agent/" not in ua. Low priority.
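A sketch of the suggested test, written against a simplified, hypothetical to_string(env) that appends agent/<name> only when the provider returns a non-empty string. The base token "databricks-sdk-py/0.0.0", the env parameter and the agent-name table are illustrative only. They are not the SDK's real signature or values.

```python
from typing import Dict, List, Mapping

# Hypothetical subset of the agent table; only "cursor" is confirmed by the PR.
AGENT_VARS: Dict[str, str] = {"CURSOR_AGENT": "cursor", "CLAUDECODE": "claude-code"}


def agent_provider(env: Mapping[str, str]) -> str:
    found = [name for var, name in AGENT_VARS.items() if env.get(var)]
    return found[0] if len(found) == 1 else ""


def to_string(env: Mapping[str, str]) -> str:
    parts: List[str] = ["databricks-sdk-py/0.0.0"]  # hypothetical base token
    agent = agent_provider(env)
    if agent:  # "" (zero or multiple agents) is falsy, so nothing is appended
        parts.append(f"agent/{agent}")
    return " ".join(parts)


def test_user_agent_string_multiple_agents() -> None:
    ua = to_string({"CURSOR_AGENT": "1", "CLAUDECODE": "1"})
    assert "agent/" not in ua
```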

Comment on lines 68 to +83
  original_env = os.environ.copy()
  os.environ.clear()

- # Clear cached CICD provider.
+ # Clear cached CICD and agent providers.
  from databricks.sdk import useragent

  useragent._cicd_provider = None
+ useragent._agent_provider = None

  yield

- # Restore env vars.
- os.environ = original_env
+ # Restore env vars and reset caches.
+ os.environ.clear()
+ os.environ.update(original_env)
+ useragent._cicd_provider = None
+ useragent._agent_provider = None
Member Author

[Nit] Fixture name clear_cicd is misleading

This fixture now resets both _cicd_provider and _agent_provider, and its comments say "Clear cached CICD and agent providers." The name no longer reflects its scope.

Suggestion: Rename to clear_ua_caches or clean_useragent_env.

Comment on lines +141 to 148
notebook_content = io.BytesIO(b"""
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
me = w.current_user.me()
print(me.user_name)"""
)
print(me.user_name)""")

from databricks.sdk.service.workspace import Language

Member Author

[Nit] Unrelated formatting changes bundled in the PR

This file and similar changes in test_config.py and test_core.py include cosmetic reformatting of multi-line string arguments unrelated to agent detection. This makes the diff noisier.

Suggestion: Consider splitting formatting changes into a separate commit, or note them in the PR description.

Comment on lines +79 to +83
# Restore env vars and reset caches.
os.environ.clear()
os.environ.update(original_env)
useragent._cicd_provider = None
useragent._agent_provider = None
Member Author

[Suggestion] Fixture teardown os.environ restore is a correctness improvement worth noting

The old os.environ = original_env replaced the os._Environ mapping object with a plain dict (the result of .copy()). Later writes to os.environ therefore no longer went through os.putenv and were invisible to subprocesses that inherit the environment. The new .clear() + .update() approach modifies the original mapping in place. This is a genuine fix.

Suggestion: Worth calling out in the PR description as an intentional fix so reviewers don't mistake it for a no-op refactor.
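A small standalone check of the point above. os.environ.copy() returns a plain dict, while restoring in place keeps the original os._Environ object, whose item assignment calls os.putenv. The helper names restore_environ and child_sees are hypothetical and exist only for this illustration.

```python
import os
import subprocess
import sys


def restore_environ(snapshot: dict) -> None:
    # clear() and update() go through os._Environ, which calls unsetenv/putenv,
    # so the real process environment stays in sync with the mapping.
    os.environ.clear()
    os.environ.update(snapshot)


def child_sees(var: str) -> str:
    # Start a child Python process that inherits the process environment
    # (no env= argument) and report the value it sees for `var`.
    out = subprocess.run(
        [sys.executable, "-c", f"import os; print(os.environ.get({var!r}, ''))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return out.stdout.strip()
```

After restore_environ(snapshot), os.environ is still the same object, so a variable set afterwards reaches child processes. With the old os.environ = snapshot pattern, the same write would only update a dict.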

simonfaltum (Member Author) commented:

Saw this build/fmt error:

Run git diff --exit-code
diff --git a/tests/integration/test_auth.py b/tests/integration/test_auth.py
index be30922..ba5c42e 100644
--- a/tests/integration/test_auth.py
+++ b/tests/integration/test_auth.py
@@ -138,11 +138,13 @@ def _test_runtime_auth_from_jobs_inner(w, env_or_skip, random, dbr_versions, lib
 
     my_name = w.current_user.me().user_name
     notebook_path = f"/Users/{my_name}/notebook-native-auth"
-    notebook_content = io.BytesIO(b"""
+    notebook_content = io.BytesIO(
+        b"""
 from databricks.sdk import WorkspaceClient
 w = WorkspaceClient()
 me = w.current_user.me()
-print(me.user_name)""")
+print(me.user_name)"""
+    )
 
     from databricks.sdk.service.workspace import Language
 
diff --git a/tests/test_config.py b/tests/test_config.py
index c95e3ce..7a2286d 100644
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -137,7 +137,8 @@ def write_large_dummy_executable(path: pathlib.Path):
 
     # Generate a long random string to inflate the file size.
     random_string = "".join(random.choice(string.ascii_letters) for i in range(1024 * 1024))
-    cli.write_text("""#!/bin/sh
+    cli.write_text(
+        """#!/bin/sh
 cat <<EOF
 {
 "access_token": "...",
@@ -146,7 +147,9 @@ cat <<EOF
 }
 EOF
 exit 0
-""" + random_string)
+"""
+        + random_string
+    )
     cli.chmod(0o755)
     assert cli.stat().st_size >= (1024 * 1024)
     return cli
diff --git a/tests/test_core.py b/tests/test_core.py
index 604e175..2e64a15 100644
--- a/tests/test_core.py
+++ b/tests/test_core.py
@@ -85,7 +85,8 @@ def write_large_dummy_executable(path: pathlib.Path):
 
     # Generate a long random string to inflate the file size.
     random_string = "".join(random.choice(string.ascii_letters) for i in range(1024 * 1024))
-    cli.write_text("""#!/bin/sh
+    cli.write_text(
+        """#!/bin/sh
 cat <<EOF
 {
 "access_token": "token",
@@ -94,7 +95,9 @@ cat <<EOF
 }
 EOF
 exit 0
-""" + random_string)
+"""
+        + random_string
+    )
     cli.chmod(0o755)
     assert cli.stat().st_size >= (1024 * 1024)
     return cli
Error: Process completed with exit code 1.

Rename clear_cicd fixture to clean_useragent_env to reflect its
expanded scope. Add test_user_agent_string_multiple_agents for
end-to-end coverage. Revert unrelated formatting changes in
test_auth.py, test_config.py, test_core.py to fix CI fmt check.

Co-authored-by: Isaac
simonfaltum marked this pull request as ready for review on March 12, 2026 13:56
We have data from direct testing in Copilot CLI confirming it sets
COPILOT_CLI=1 in its environment. Add it to the canonical agent list.

Co-authored-by: Isaac
Tests in test_config.py and test_core.py only reset _cicd_provider,
not the new _agent_provider cache or the _extra list. This caused
order-dependent failures when these tests ran after test_user_agent.py.

Switch from direct assignment to monkeypatch.setattr so all three
module globals (_extra, _cicd_provider, _agent_provider) are properly
reset and restored on teardown.
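pytest's monkeypatch.setattr records each attribute's previous value and puts it back at teardown. The sketch below shows the same save, reset and restore pattern using the standard library's unittest.mock.patch.object, which is swapped in so the example runs without pytest. It targets a hypothetical stand-in module, fake_useragent, that carries the three globals named in this commit. The stand-in values are invented.

```python
import types
from contextlib import ExitStack
from unittest import mock

# Hypothetical stand-in for databricks.sdk.useragent with the three globals.
fake_useragent = types.ModuleType("fake_useragent")
fake_useragent._extra = [("leftover", "1")]
fake_useragent._cicd_provider = "github"
fake_useragent._agent_provider = "cursor"


def reset_useragent_globals(module: types.ModuleType) -> ExitStack:
    """Reset all three caches; closing the returned stack restores the old values."""
    stack = ExitStack()
    stack.enter_context(mock.patch.object(module, "_extra", []))
    stack.enter_context(mock.patch.object(module, "_cicd_provider", None))
    stack.enter_context(mock.patch.object(module, "_agent_provider", None))
    return stack
```

In a pytest fixture, each patch.object line would presumably become monkeypatch.setattr(useragent, "<name>", <value>), and teardown restores the values automatically.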
github-actions commented:

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1327
  • Commit SHA: c49bad8474c03245995ec1bfc29cdafbf9323a2a

Checks will be approved automatically on success.

simonfaltum deployed to test-trigger-is on March 15, 2026 16:26 with GitHub Actions (Active)
1 participant