I have verified that this discussion would not be more appropriate as an issue in a specific repository
I have searched existing discussions to avoid duplicates
Discussion Topic
I built an MCP server that maps Chrome's Accessibility Tree to a virtual filesystem. Instead of screenshots and pixel coordinates, agents use ls, cd, grep, click, and type to navigate pages — the same way you'd work in a terminal.
Why the Accessibility Tree?
Most browser automation feeds agents raw HTML or screenshots. The model burns through tool calls just figuring out what's on the page. Chrome's AX tree already solves this — it's a structured, role-annotated representation of the DOM that strips out layout noise and keeps semantics. DOMShell flattens it aggressively and maps it to a filesystem metaphor so agents can scope their work the way you'd cd into a directory.
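To make the flattening idea concrete, here is a minimal sketch, not DOMShell's actual code. The `AXNode` shape, the `WRAPPER_ROLES` list, and the `slug` and `toPaths` helpers are all assumptions made for illustration: wrapper nodes with no semantic role are dropped, their children are hoisted into the parent, and every remaining node becomes a path.

```typescript
// Hypothetical, simplified AX node shape; real Chrome AX nodes carry many more fields.
interface AXNode {
  role: string;
  name?: string;
  children?: AXNode[];
}

// Roles treated as layout noise in this sketch (an assumption, not DOMShell's actual list).
const WRAPPER_ROLES = ["generic", "none", "presentation"];

// Turn an accessible name into a filename-safe segment.
function slug(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "_").replace(/^_+|_+$/g, "");
}

// Collect one path per meaningful node. Wrapper nodes are skipped and their
// children hoisted into the parent directory. Name collisions are not handled.
function toPaths(node: AXNode, dir: string = "/"): string[] {
  const children = node.children ?? [];
  const paths: string[] = [];
  let childDir = dir;
  if (WRAPPER_ROLES.indexOf(node.role) === -1) {
    // Prefer the accessible name; fall back to the role when there is none.
    const segment = slug(node.name ?? "") || node.role;
    if (children.length > 0) {
      childDir = dir + segment + "/";
      paths.push(childDir);
    } else {
      paths.push(dir + segment);
    }
  }
  for (const child of children) {
    for (const p of toPaths(child, childDir)) paths.push(p);
  }
  return paths;
}

// Made-up page: two landmarks, one of them wrapped in an extra generic container.
const page: AXNode = {
  role: "generic",
  children: [
    { role: "navigation", children: [{ role: "link", name: "Home" }] },
    {
      role: "main",
      children: [
        { role: "generic", children: [{ role: "heading", name: "References" }] },
      ],
    },
  ],
};

console.log(toPaths(page).join("\n"));
```

The two `generic` wrappers vanish from the output, so the heading ends up directly under `/main/` rather than one level deeper.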
In controlled testing (Claude, 4 web tasks, 8 trials), this cut average API calls per task from 8.6 to 4.3 compared to screenshot-based browsing. Full experiment data →
What It Looks Like
Agent: "Find the references on this Wikipedia article and extract the links"
1. ls → page sections (navigation/, main/, complementary/)
2. cd main/article → scope to article content
3. grep -r references → find the references section
4. cd references → enter it
5. find --type link --meta → all links with URLs in one call
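The five steps above can be mimicked with a toy in-memory shell. This is only a sketch of the scoping idea: `VNode`, `ToyShell`, the helper functions, and the example.org URLs are hypothetical, and the real tools talk to a live browser rather than a hard-coded tree.

```typescript
// Hypothetical in-memory node: a name, an AX role, an optional URL, and children.
interface VNode {
  name: string;
  role: string;
  url?: string;
  children: VNode[];
}

function dir(name: string, role: string, children: VNode[]): VNode {
  return { name, role, children };
}

function link(name: string, url: string): VNode {
  return { name, role: "link", url, children: [] };
}

// A toy shell with a current working node, loosely mirroring ls / cd / grep -r / find.
class ToyShell {
  private cwd: VNode;

  constructor(root: VNode) {
    this.cwd = root;
  }

  // ls: names of direct children; containers get a trailing slash.
  ls(): string[] {
    return this.cwd.children.map((c) => (c.children.length > 0 ? c.name + "/" : c.name));
  }

  // cd: descend through a slash-separated relative path.
  cd(path: string): void {
    let node = this.cwd;
    for (const part of path.split("/")) {
      if (part === "") continue;
      const next = node.children.filter((c) => c.name === part)[0];
      if (!next) throw new Error("no such directory: " + part);
      node = next;
    }
    this.cwd = node;
  }

  // grep -r: relative paths of descendants whose name contains the pattern.
  grep(pattern: string): string[] {
    return this.walk(this.cwd, "")
      .filter((e) => e.node.name.indexOf(pattern) !== -1)
      .map((e) => e.path);
  }

  // find --type <role> --meta: matching descendants together with their URLs.
  find(role: string): { path: string; url: string | undefined }[] {
    return this.walk(this.cwd, "")
      .filter((e) => e.node.role === role)
      .map((e) => ({ path: e.path, url: e.node.url }));
  }

  // Depth-first listing of everything below a node, with relative paths.
  private walk(node: VNode, prefix: string): { path: string; node: VNode }[] {
    const out: { path: string; node: VNode }[] = [];
    for (const child of node.children) {
      const path = prefix + child.name;
      out.push({ path: path, node: child });
      for (const e of this.walk(child, path + "/")) out.push(e);
    }
    return out;
  }
}

// Made-up page tree shaped like the Wikipedia example.
const root = dir("", "RootWebArea", [
  dir("navigation", "navigation", [link("home", "https://example.org/")]),
  dir("main", "main", [
    dir("article", "article", [
      { name: "intro", role: "paragraph", children: [] },
      dir("references", "list", [
        link("cite_1", "https://example.org/a"),
        link("cite_2", "https://example.org/b"),
      ]),
    ]),
  ]),
  dir("complementary", "complementary", [link("related", "https://example.org/related")]),
]);

const shell = new ToyShell(root);
console.log(shell.ls());               // step 1: page sections
shell.cd("main/article");              // step 2: scope to the article
console.log(shell.grep("references")); // step 3: locate the references section
shell.cd("references");                // step 4: enter it
console.log(shell.find("link"));       // step 5: links with URLs in one call
```

Because `find` only walks the current directory, the links under `navigation/` and `complementary/` never reach the model once the agent has scoped into `references`.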
38 tools across three security tiers (read-only by default, write requires --allow-write). Architecture and full tool list in the README.
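As a rough illustration of the read-only default, the sketch below gates tools on the `--allow-write` flag. The tier assignment per command and the `enabledTools` helper are assumptions, not DOMShell's implementation. Only the read-only default and the `--allow-write` flag come from the post; the third tier isn't described there, so it is left out.

```typescript
// Two of the three tiers; the third is not described in the post and is omitted.
type Tier = "read" | "write";

// Assumed tier assignment for the commands named in the post.
const TOOL_TIERS: { [tool: string]: Tier } = {
  ls: "read",
  cd: "read",
  grep: "read",
  find: "read",
  click: "write",
  type: "write",
};

// Return the tools a server would expose for the given command-line arguments.
// Read-tier tools are always available; write-tier tools need --allow-write.
function enabledTools(argv: string[]): string[] {
  const allowWrite = argv.indexOf("--allow-write") !== -1;
  return Object.keys(TOOL_TIERS).filter(
    (tool) => TOOL_TIERS[tool] === "read" || allowWrite
  );
}

console.log(enabledTools([]));                // read-only default
console.log(enabledTools(["--allow-write"])); // read and write tools
```

Filtering the tool list at startup, rather than rejecting calls later, means a read-only agent never sees `click` or `type` in the first place.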
What I'm Working On
Currently testing with smaller local models (Qwen3-4B, Llama3.2-3B) to see how the filesystem metaphor holds up with tighter context windows. Also exploring headless mode for CI pipelines. Would love feedback on the tool design or if anyone's tried a similar AX-tree approach.
GitHub: github.com/apireno/DOMShell
npm: npx @apireno/domshell
The Wikipedia walkthrough above takes five calls. No screenshots. No coordinate math.
Quick Start
{
  "mcpServers": {
    "domshell": {
      "command": "npx",
      "args": ["-y", "@apireno/domshell", "--allow-write"]
    }
  }
}