Add AI agent detection to user-agent string#1327
Conversation
Detect known AI coding agents (Claude Code, Cursor, Cline, Codex, Gemini CLI, OpenCode, Antigravity) via environment variables and append agent/<name> to the User-Agent header. Uses the same detection logic as the Go SDK: report when exactly one agent is detected, return empty on zero or multiple matches (ambiguity guard). Co-authored-by: Isaac
Verifies that agent_provider() caches its result on first call and returns the cached value on subsequent calls, even if the environment changes. Co-authored-by: Isaac
simonfaltum
left a comment
There was a problem hiding this comment.
Review (automated, 2 agents)
Verdict: Approved
0 Critical | 0 Major | 1 Gap | 2 Nit | 2 Suggestion
See inline comments for details.
| del os.environ["CURSOR_AGENT"] | ||
| os.environ["CLAUDECODE"] = "1" | ||
|
|
||
| assert useragent.agent_provider() == "cursor" |
There was a problem hiding this comment.
[Gap (Nit)] No test for to_string() with multiple agents
There are tests for to_string() with one agent and zero agents, and a unit test for agent_provider() returning "" with multiple agents. But there's no end-to-end test confirming to_string() excludes agent/ when multiple agents are set. Coverage gap is marginal since if agent: on "" is trivially falsy.
Suggestion: Add test_user_agent_string_multiple_agents that sets two agent vars and asserts "agent/" not in ua. Low priority.
| original_env = os.environ.copy() | ||
| os.environ.clear() | ||
|
|
||
| # Clear cached CICD provider. | ||
| # Clear cached CICD and agent providers. | ||
| from databricks.sdk import useragent | ||
|
|
||
| useragent._cicd_provider = None | ||
| useragent._agent_provider = None | ||
|
|
||
| yield | ||
|
|
||
| # Restore env vars. | ||
| os.environ = original_env | ||
| # Restore env vars and reset caches. | ||
| os.environ.clear() | ||
| os.environ.update(original_env) | ||
| useragent._cicd_provider = None | ||
| useragent._agent_provider = None |
There was a problem hiding this comment.
[Nit] Fixture name clear_cicd is misleading
This fixture now resets both _cicd_provider and _agent_provider, and its comments say "Clear cached CICD and agent providers." The name no longer reflects its scope.
Suggestion: Rename to clear_ua_caches or clean_useragent_env.
| notebook_content = io.BytesIO(b""" | ||
| from databricks.sdk import WorkspaceClient | ||
| w = WorkspaceClient() | ||
| me = w.current_user.me() | ||
| print(me.user_name)""" | ||
| ) | ||
| print(me.user_name)""") | ||
|
|
||
| from databricks.sdk.service.workspace import Language | ||
|
|
There was a problem hiding this comment.
[Nit] Unrelated formatting changes bundled in the PR
This file and similar changes in test_config.py and test_core.py include cosmetic reformatting of multi-line string arguments unrelated to agent detection. Makes the diff noisier.
Suggestion: Consider splitting formatting changes into a separate commit, or note them in the PR description.
| # Restore env vars and reset caches. | ||
| os.environ.clear() | ||
| os.environ.update(original_env) | ||
| useragent._cicd_provider = None | ||
| useragent._agent_provider = None |
There was a problem hiding this comment.
[Suggestion] Fixture teardown os.environ restore is a correctness improvement worth noting
The old os.environ = original_env replaced the os._Environ mapping object with a plain dict (from .copy()), subtly breaking os.putenv sync. The new .clear() + .update() approach correctly modifies the mapping in place. This is a genuine fix.
Suggestion: Worth calling out in the PR description as an intentional fix so reviewers don't mistake it for a no-op refactor.
|
Saw this build / fmt error |
Rename clear_cicd fixture to clean_useragent_env to reflect its expanded scope. Add test_user_agent_string_multiple_agents for end-to-end coverage. Revert unrelated formatting changes in test_auth.py, test_config.py, test_core.py to fix CI fmt check. Co-authored-by: Isaac
We have data from direct testing in Copilot CLI confirming it sets COPILOT_CLI=1 in its environment. Add it to the canonical agent list. Co-authored-by: Isaac
Tests in test_config.py and test_core.py only reset _cicd_provider, not the new _agent_provider cache or the _extra list. This caused order-dependent failures when these tests ran after test_user_agent.py. Switch from direct assignment to monkeypatch.setattr so all three module globals (_extra, _cicd_provider, _agent_provider) are properly reset and restored on teardown.
…ection # Conflicts: # tests/test_config.py
Co-authored-by: Isaac
|
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
Summary
agent/<name>to the User-Agent HTTP headerThe
COPILOT_CLIenv var was confirmed by direct testing in Copilot CLI.The
OPENCLAW_SHELLenv var uses context-qualified values (exec,acp,acp-client); any non-empty value triggers detection.Part of cross-SDK agent tracking effort. Go SDK implementation: databricks/databricks-sdk-go#1537.
Test plan