Build Apps For AI Agent. #5050
Closed · zhangweijp started this conversation in Ideas
Replies: 1 comment, 1 reply
Comment:
> This has nothing to do with Preact and is just AI junk. Don't create discussions like this, it'll just result in them being deleted & your account banned from the organization.
Hello everyone, I built an agent runtime based on Preact & LinkeDOM that connects AI Agents to Agent-Oriented Text-based UI Apps (I call them "Agent Apps"). These applications are installed within the LLM context, allowing the AI Agent to interact with them in real time and dynamically, through the tools those applications expose.
Rethinking UI for the Age of AI Agents
For decades, user interfaces have been designed exclusively for humans: graphical, interactive, and optimized for eyes and hands. But as Large Language Models (LLMs) emerge as a new class of users, we face a fundamental question: what should a user interface look like when its user is a model rather than a human?
This is the question that Agent-Oriented TUI (AOTUI) was built to answer.
What: A New Interface Paradigm
Agent-Oriented Text-based User Interface (AOTUI) is an interface paradigm where LLM agents are first-class citizens.
Instead of rendering pixels for human eyes, AOTUI renders semantic Markdown text for LLM context windows. Instead of mouse clicks, agents invoke typed function calls. Instead of visual cues (colors, layouts, avatars), data is identified by plain-text references.
In short: AOTUI is what a user interface looks like when you design it for a model, not a human.
Why: The Fundamental Mismatch
Every design decision in a traditional GUI exists to serve three specific human capabilities:
CSS exists because humans see. Mouse events exist because humans have hands. Animations exist because humans perceive change continuously.
LLMs have none of these capabilities.
The Core Insight: Humans and LLMs experience reality through fundamentally different modalities. This difference demands a fundamentally different interface paradigm.
How: From Constraints to Design
Let's walk through each constraint — and notice that most of them are actually good news.
No Vision → No Rendering Complexity Needed
Without the need to produce pixels for human eyes, we don't need a full rendering engine, pixel-perfect layouts, or CSS. A semantic text format is both sufficient and preferable. Instead of fighting this constraint, we embrace it: render Markdown, not pixels.
No Continuous Perception → Simpler State Model
An LLM doesn't watch a UI stream change over time. It reads a complete snapshot of the current state, reasons, and acts. This actually simplifies the state model considerably — no animations, no partial states, no transitions. Each interaction is a clean read → reason → act cycle. Also good news.
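The read → reason → act cycle can be sketched as a single step function. This is a minimal, dependency-free sketch; `step`, `decide`, and `execute` are hypothetical names, not part of the actual AOTUI runtime:

```javascript
// Hypothetical sketch of one read → reason → act cycle (all names invented).
const llm = {
  // reason: decide on one structured tool call from the snapshot alone
  decide: (snapshot) => ({ tool: "noop", arguments: {} }),
};
const runtime = {
  // act: execute the call, producing the next complete snapshot
  execute: (call) => `snapshot after ${call.tool}`,
};

function step(snapshot, llm, runtime) {
  const call = llm.decide(snapshot); // read + reason
  return runtime.execute(call);      // act, yielding a fresh snapshot
}

const nextSnapshot = step("initial snapshot", llm, runtime);
```

Note that there is no intermediate state: each cycle starts from a complete snapshot and ends with a complete snapshot.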
No Hands → The Real Problem
Here's where it gets harder.
No keyboard? Not actually a problem. A keyboard gives humans a way to input text. LLMs natively output text. They don't need a keyboard — they are the keyboard.
No mouse? This is the real problem. Without a mouse, the LLM cannot point, select, or trigger actions in any traditional UI. This is the capability gap that AOTUI was built to close.
To understand why, we need to look at what a mouse actually does.
What a Mouse Actually Does
Every mouse interaction is, at its core, one of two operations: selecting a piece of data, or triggering an action on it.
Let's trace a concrete example.
You want to message JY Chen on WeChat.
Under the hood, those interactions translated into something like this: the tap on the avatar was resolved to an internal user ID, and the Send button dispatched your text to a server address. You only ever saw a name and an avatar. You never touched a user ID or a server address. The visual interface captured the complexity and hid it from you, surfacing only what you needed to make a decision, and silently binding the rest.
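A rough sketch of that hidden binding, in code. Everything here is hypothetical except `jy_chen_id_392`, the illustrative user ID used later in this post; the other contacts, the server names, and the function names are invented:

```javascript
// Hypothetical sketch of what the GUI silently binds behind the avatar you tap.
const contacts = [
  { displayName: "Alice Wang", userId: "alice_id_101",   server: "wx-node-1.example" }, // invented
  { displayName: "Bob Li",     userId: "bob_id_202",     server: "wx-node-3.example" }, // invented
  { displayName: "JY Chen",    userId: "jy_chen_id_392", server: "wx-node-7.example" },
];

// "Selecting": your tap on the avatar resolves to the full contact record.
function selectContact(index) {
  return contacts[index];
}

// "Triggering": the Send button passes along fields you never saw.
function sendMessage(contact, text) {
  return { to: contact.userId, via: contact.server, body: text };
}

const payload = sendMessage(selectContact(2), "Hello!");
console.log(payload.to); // the user ID you never touched
```

You supplied only a tap and a string; the interface supplied the identifiers.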
An LLM has no such bridge. It can't see avatars. It can't click. AOTUI's job is to rebuild that bridge in text.
How AOTUI Rebuilds the Bridge
AOTUI solves all three parts of the problem — recognition, selection, and triggering — for agents operating without a mouse.
1. Recognition: Label Data in Text
Instead of rendering an avatar, AOTUI exposes data as labeled text within a structured View:
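For illustration, a minimal contacts View might be rendered like this (the other contact names are invented; only JY Chen appears in this post):

```markdown
## View: Contacts
- Alice Wang
- Bob Li
- JY Chen
```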
A View is a clearly bounded, self-contained unit of context — the text equivalent of a screen or panel. The LLM "recognizes" JY Chen by reading the label, exactly as a human recognizes him by seeing the avatar.
2. Selection: Typed References
The label alone isn't enough. The LLM also needs a way to reference the selected data when invoking an action. AOTUI embeds typed references directly alongside each label:
The format is `[Human-readable label](Type:reference)`. When the LLM wants to "select" JY Chen, it uses the reference `contacts[2]` as a parameter. At execution time, the runtime resolves this path against its index, retrieving the full underlying data object (`user_id`, `serverId`, `encryptKey`, and whatever else the application needs), and passes it to the function. The LLM never sees any of that, just like you never saw `jy_chen_id_392`. AOTUI shields the LLM from implementation details it has no reason to care about, while still giving it precise, unambiguous references to act on.
3. Triggering: Tools as Typed Function Calls
LLMs natively produce structured function calls — this is precisely what tool-calling is designed for. AOTUI maps every interactive element to a typed Tool the LLM can invoke:
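As a sketch, a tool for the messaging example might be declared as below. The schema shape is illustrative; only `send_message`, `recipient`, `message`, and the `contacts[2]` reference come from this post:

```javascript
// Hypothetical tool declaration exposed by an Agent App (schema shape illustrative).
const sendMessageTool = {
  name: "send_message",
  description: "Send a text message to a contact shown in the Contacts View.",
  parameters: {
    type: "object",
    properties: {
      recipient: { type: "string", description: "A typed reference, e.g. contacts[2]" },
      message: { type: "string" },
    },
    required: ["recipient", "message"],
  },
};

// The invocation the LLM emits: no pointing device, just a structured call.
const call = {
  tool: "send_message",
  arguments: { recipient: "contacts[2]", message: "Hello!" },
};
```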
No mouse needed. The LLM calls the function.
**Design Principle**: Tools trigger *state transitions*; they don't return large data payloads. Data always flows through View updates in the next Snapshot, not through tool call results.
Full Example: Messaging JY Chen
Let's replay the WeChat scenario, now fully in AOTUI.
Step 1 — Application sends a Snapshot
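A snapshot for this scenario might read as follows (contact names other than JY Chen are invented for illustration; the reference format and `send_message` signature follow the sections above):

```markdown
# Snapshot

## View: Contacts
- [Alice Wang](Contact:contacts[0])
- [Bob Li](Contact:contacts[1])
- [JY Chen](Contact:contacts[2])

## Tools
- send_message(recipient: Contact, message: string)
```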
Step 2 — LLM receives instruction: "Send 'Hello!' to JY Chen"
The LLM reads the snapshot, recognizes `contacts[2]` as JY Chen, and constructs the call:

```json
{ "tool": "send_message", "arguments": { "recipient": "contacts[2]", "message": "Hello!" } }
```

Step 3 — Application resolves and executes

`contacts[2]` → `{ id: "jy_chen_id_392", name: "JY Chen" }` → message sent.

Step 4 — Updated Snapshot arrives
The LLM now operates on fresh context. No rendered pixels. No mouse. No CSS. Just clean structured text and typed function calls — flowing through a read → reason → act cycle.
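The resolution in Step 3 can be sketched as a lookup against the runtime's index. Only `contacts[2]` and the JY Chen record come from this post; the other entries and `resolveReference` are hypothetical:

```javascript
// Hypothetical sketch of the runtime resolving a typed reference (Step 3).
const index = {
  contacts: [
    { id: "alice_id_101", name: "Alice Wang" }, // invented
    { id: "bob_id_202", name: "Bob Li" },       // invented
    { id: "jy_chen_id_392", name: "JY Chen" },
  ],
};

// Resolve a textual reference like "contacts[2]" against the index.
function resolveReference(ref) {
  const match = ref.match(/^(\w+)\[(\d+)\]$/);
  if (!match) throw new Error(`Unresolvable reference: ${ref}`);
  return index[match[1]][Number(match[2])];
}

const recipient = resolveReference("contacts[2]");
// recipient.id === "jy_chen_id_392": the full record the LLM never sees.
```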
Implementation Architecture
You might ask: "If we're building for text-only LLMs, why use HTML and JavaScript at all?"
Because the web ecosystem is mature. AOTUI uses HTML as an intermediate representation: developers write familiar JSX/Preact components, which render to HTML in a lightweight virtual DOM, which is then transformed into LLM-readable Markdown.
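A dependency-free sketch of the last stage of that pipeline. The tree shape and `htmlToMarkdown` are hypothetical stand-ins for Preact's render output and the framework's actual transformer:

```javascript
// Hypothetical HTML-tree → Markdown transform (stand-in for the real pipeline).
function htmlToMarkdown(node) {
  if (typeof node === "string") return node; // text node
  const children = node.children.map(htmlToMarkdown).join("");
  switch (node.tag) {
    case "h1": return `# ${children}\n`;
    case "ul": return node.children.map((li) => `- ${htmlToMarkdown(li)}\n`).join("");
    case "li": return children;
    case "a":  return `[${children}](${node.href})`; // typed reference links
    default:   return children;
  }
}

// What a rendered Contacts component might produce as a virtual-DOM tree:
const tree = {
  tag: "ul",
  children: [
    { tag: "li", children: [{ tag: "a", href: "Contact:contacts[2]", children: ["JY Chen"] }] },
  ],
};

console.log(htmlToMarkdown(tree)); // → - [JY Chen](Contact:contacts[2])
```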
This architecture lets developers use familiar tools while the framework handles the complexity of semantic text generation.
Summary
AOTUI renders application state as Markdown Views, labels data as `[Human-readable label](Type:reference)`, and exposes actions as typed Tools the LLM invokes directly.

View AgentOrientedTUI Repo