Build Apps For AI Agent. #5050
Closed · zhangweijp started this conversation in Ideas
Replies: 1 comment, 1 reply
Comment:
> This has nothing to do with Preact and is just AI junk. Don't create discussions like this, it'll just result in them being deleted & your account banned from the organization.
Hello everyone, I built an agent runtime based on Preact & LinkeDOM that connects AI Agents to Agent-Oriented Text-based UI Apps (I call them "Agent Apps"). These applications are installed within the LLM context, allowing the AI Agent to interact with them in real time and dynamically, through the tools those applications expose.
Rethinking UI for the Age of AI Agents
For decades, user interfaces have been designed exclusively for humans: graphical, interactive, and optimized for eyes and hands. But as Large Language Models (LLMs) emerge as a new class of users, we face a fundamental question: what should a user interface look like when its user is a model rather than a human?
This is the question that Agent-Oriented TUI (AOTUI) was built to answer.
What: A New Interface Paradigm
Agent-Oriented Text-based User Interface (AOTUI) is an interface paradigm where LLM agents are first-class citizens.
Instead of rendering pixels for human eyes, AOTUI renders semantic Markdown text for LLM context windows. Instead of mouse clicks, agents invoke typed function calls. Instead of visual cues (colors, layouts, avatars), data is identified by plain-text references.
In short: AOTUI is what a user interface looks like when you design it for a model, not a human.
Why: The Fundamental Mismatch
Every design decision in a traditional GUI exists to serve three specific human capabilities:
CSS exists because humans see. Mouse events exist because humans have hands. Animations exist because humans perceive change continuously.
LLMs have none of these capabilities.
The Core Insight: Humans and LLMs experience reality through fundamentally different modalities. This difference demands a fundamentally different interface paradigm.
How: From Constraints to Design
Let's walk through each constraint — and notice that most of them are actually good news.
No Vision → No Rendering Complexity Needed
Without the need to produce pixels for human eyes, we don't need a full rendering engine, pixel-perfect layouts, or CSS. A semantic text format is both sufficient and preferable. Instead of fighting this constraint, we embrace it: render Markdown, not pixels.
No Continuous Perception → Simpler State Model
An LLM doesn't watch a UI stream change over time. It reads a complete snapshot of the current state, reasons, and acts. This actually simplifies the state model considerably — no animations, no partial states, no transitions. Each interaction is a clean read → reason → act cycle. Also good news.
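The read → reason → act cycle can be sketched as a single step function. This is a minimal, dependency-free sketch; `step`, `decide`, and `execute` are hypothetical names, not part of the actual AOTUI runtime:

```javascript
// Hypothetical sketch of one read → reason → act cycle (all names invented).
const llm = {
  // reason: decide on one structured tool call from the snapshot alone
  decide: (snapshot) => ({ tool: "noop", arguments: {} }),
};
const runtime = {
  // act: execute the call, producing the next complete snapshot
  execute: (call) => `snapshot after ${call.tool}`,
};

function step(snapshot, llm, runtime) {
  const call = llm.decide(snapshot); // read + reason
  return runtime.execute(call);      // act, yielding a fresh snapshot
}

const nextSnapshot = step("initial snapshot", llm, runtime);
```

Note that there is no intermediate state: each cycle starts from a complete snapshot and ends with a complete snapshot.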
No Hands → The Real Problem
Here's where it gets harder.
No keyboard? Not actually a problem. A keyboard gives humans a way to input text. LLMs natively output text. They don't need a keyboard — they are the keyboard.
No mouse? This is the real problem. Without a mouse, the LLM cannot point, select, or trigger actions in any traditional UI. This is the capability gap that AOTUI was built to close.
To understand why, we need to look at what a mouse actually does.
What a Mouse Actually Does
Every mouse interaction is, at its core, one of two operations: selecting a piece of data, or triggering an action on it.
Let's trace a concrete example.
You want to message JY Chen on WeChat.
Under the hood, those interactions translated into something like this: the tap on the avatar was resolved to an internal user ID, and the Send button dispatched your text to a server address. You only ever saw a name and an avatar. You never touched a user ID or a server address. The visual interface captured the complexity and hid it from you, surfacing only what you needed to make a decision, and silently binding the rest.
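A rough sketch of that hidden binding, in code. Everything here is hypothetical except `jy_chen_id_392`, the illustrative user ID used later in this post; the other contacts, the server names, and the function names are invented:

```javascript
// Hypothetical sketch of what the GUI silently binds behind the avatar you tap.
const contacts = [
  { displayName: "Alice Wang", userId: "alice_id_101",   server: "wx-node-1.example" }, // invented
  { displayName: "Bob Li",     userId: "bob_id_202",     server: "wx-node-3.example" }, // invented
  { displayName: "JY Chen",    userId: "jy_chen_id_392", server: "wx-node-7.example" },
];

// "Selecting": your tap on the avatar resolves to the full contact record.
function selectContact(index) {
  return contacts[index];
}

// "Triggering": the Send button passes along fields you never saw.
function sendMessage(contact, text) {
  return { to: contact.userId, via: contact.server, body: text };
}

const payload = sendMessage(selectContact(2), "Hello!");
console.log(payload.to); // the user ID you never touched
```

You supplied only a tap and a string; the interface supplied the identifiers.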
An LLM has no such bridge. It can't see avatars. It can't click. AOTUI's job is to rebuild that bridge in text.
How AOTUI Rebuilds the Bridge
AOTUI solves all three parts of the problem — recognition, selection, and triggering — for agents operating without a mouse.
1. Recognition: Label Data in Text
Instead of rendering an avatar, AOTUI exposes data as labeled text within a structured View:
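For illustration, a minimal contacts View might be rendered like this (the other contact names are invented; only JY Chen appears in this post):

```markdown
## View: Contacts
- Alice Wang
- Bob Li
- JY Chen
```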
A View is a clearly bounded, self-contained unit of context — the text equivalent of a screen or panel. The LLM "recognizes" JY Chen by reading the label, exactly as a human recognizes him by seeing the avatar.
2. Selection: Typed References
The label alone isn't enough. The LLM also needs a way to reference the selected data when invoking an action. AOTUI embeds typed references directly alongside each label:
The format is `[Human-readable label](Type:reference)`. When the LLM wants to "select" JY Chen, it uses the reference `contacts[2]` as a parameter. At execution time, the runtime resolves this path against its index, retrieving the full underlying data object (`user_id`, `serverId`, `encryptKey`, and whatever else the application needs), and passes it to the function. The LLM never sees any of that, just like you never saw `jy_chen_id_392`. AOTUI shields the LLM from implementation details it has no reason to care about, while still giving it precise, unambiguous references to act on.
3. Triggering: Tools as Typed Function Calls
LLMs natively produce structured function calls — this is precisely what tool-calling is designed for. AOTUI maps every interactive element to a typed Tool the LLM can invoke:
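As a sketch, a tool for the messaging example might be declared as below. The schema shape is illustrative; only `send_message`, `recipient`, `message`, and the `contacts[2]` reference come from this post:

```javascript
// Hypothetical tool declaration exposed by an Agent App (schema shape illustrative).
const sendMessageTool = {
  name: "send_message",
  description: "Send a text message to a contact shown in the Contacts View.",
  parameters: {
    type: "object",
    properties: {
      recipient: { type: "string", description: "A typed reference, e.g. contacts[2]" },
      message: { type: "string" },
    },
    required: ["recipient", "message"],
  },
};

// The invocation the LLM emits: no pointing device, just a structured call.
const call = {
  tool: "send_message",
  arguments: { recipient: "contacts[2]", message: "Hello!" },
};
```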
No mouse needed. The LLM calls the function.
**Design Principle**: Tools trigger *state transitions*; they don't return large data payloads. Data always flows through View updates in the next Snapshot, not through tool call results.
Full Example: Messaging JY Chen
Let's replay the WeChat scenario, now fully in AOTUI.
Step 1 — Application sends a Snapshot
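A snapshot for this scenario might read as follows (contact names other than JY Chen are invented for illustration; the reference format and `send_message` signature follow the sections above):

```markdown
# Snapshot

## View: Contacts
- [Alice Wang](Contact:contacts[0])
- [Bob Li](Contact:contacts[1])
- [JY Chen](Contact:contacts[2])

## Tools
- send_message(recipient: Contact, message: string)
```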
Step 2 — LLM receives instruction: "Send 'Hello!' to JY Chen"
The LLM reads the snapshot, recognizes `contacts[2]` as JY Chen, and constructs the call:

```json
{ "tool": "send_message", "arguments": { "recipient": "contacts[2]", "message": "Hello!" } }
```

Step 3 — Application resolves and executes

`contacts[2]` → `{ id: "jy_chen_id_392", name: "JY Chen" }` → message sent.

Step 4 — Updated Snapshot arrives
The LLM now operates on fresh context. No rendered pixels. No mouse. No CSS. Just clean structured text and typed function calls — flowing through a read → reason → act cycle.
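The resolution in Step 3 can be sketched as a lookup against the runtime's index. Only `contacts[2]` and the JY Chen record come from this post; the other entries and `resolveReference` are hypothetical:

```javascript
// Hypothetical sketch of the runtime resolving a typed reference (Step 3).
const index = {
  contacts: [
    { id: "alice_id_101", name: "Alice Wang" }, // invented
    { id: "bob_id_202", name: "Bob Li" },       // invented
    { id: "jy_chen_id_392", name: "JY Chen" },
  ],
};

// Resolve a textual reference like "contacts[2]" against the index.
function resolveReference(ref) {
  const match = ref.match(/^(\w+)\[(\d+)\]$/);
  if (!match) throw new Error(`Unresolvable reference: ${ref}`);
  return index[match[1]][Number(match[2])];
}

const recipient = resolveReference("contacts[2]");
// recipient.id === "jy_chen_id_392": the full record the LLM never sees.
```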
Implementation Architecture
You might ask: "If we're building for text-only LLMs, why use HTML and JavaScript at all?"
Because the web ecosystem is mature. AOTUI uses HTML as an intermediate representation: developers write familiar JSX/Preact components, which render to HTML in a lightweight virtual DOM, which is then transformed into LLM-readable Markdown.
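A dependency-free sketch of the last stage of that pipeline. The tree shape and `htmlToMarkdown` are hypothetical stand-ins for Preact's render output and the framework's actual transformer:

```javascript
// Hypothetical HTML-tree → Markdown transform (stand-in for the real pipeline).
function htmlToMarkdown(node) {
  if (typeof node === "string") return node; // text node
  const children = node.children.map(htmlToMarkdown).join("");
  switch (node.tag) {
    case "h1": return `# ${children}\n`;
    case "ul": return node.children.map((li) => `- ${htmlToMarkdown(li)}\n`).join("");
    case "li": return children;
    case "a":  return `[${children}](${node.href})`; // typed reference links
    default:   return children;
  }
}

// What a rendered Contacts component might produce as a virtual-DOM tree:
const tree = {
  tag: "ul",
  children: [
    { tag: "li", children: [{ tag: "a", href: "Contact:contacts[2]", children: ["JY Chen"] }] },
  ],
};

console.log(htmlToMarkdown(tree)); // → - [JY Chen](Contact:contacts[2])
```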
This architecture lets developers use familiar tools while the framework handles the complexity of semantic text generation.
Summary
AOTUI renders application state as Markdown Views, labels data as `[Human-readable label](Type:reference)`, and exposes actions as typed Tools the LLM invokes directly.

View AgentOrientedTUI Repo