Status: stable reference for the current implemented v2 feature surface
This document is a practical map for retesting the current Fred v2 world.
It is not a deep architecture note. The goal is simpler:
- list the important v2 capabilities
- explain, in one line, how each capability fits the v2 runtime model
- point to the best current demo, profile, or agent to see it in action
Use it when you want to review the v2 runtime feature by feature instead of reading the codebase first.
The current v2 world has two real execution shapes:
-
ReActAgentDefinitionBest for assistants whose job is mainly:- understand the request
- use tools
- answer or produce an artifact
-
GraphAgentDefinitionBest for agents whose value comes from a controlled business journey:- route the request
- gather the right context
- pause for decisions
- execute the permitted action
- keep the next turn coherent
Current concrete v2 agents and demos:
Basic ReAct V2RAG Expert V2Tracking Graph Demo V2Artifact Report Demo V2
Current built-in ReAct profiles:
generic_assistantgeorgescustodiansentinellog_geniusgeo_demo
One runtime detail is important for interpreting the current system correctly:
- v2 still keeps a warm per-session agent cache
- v2 also writes durable checkpoints
So a hot session can benefit from in-memory reuse, while restart tolerance and pause/resume still rely on the durable checkpoint path.
For each feature below, the important questions are:
- What business problem does this capability solve?
- How is it expressed in v2?
- Which agent or profile is the best way to test it?
- What should you expect to see if it works correctly?
What it is:
- the simplest authoring model for a useful assistant
How it fits v2:
- the definition declares role, prompt, and declared Fred tool refs
- the shared
ReActRuntimeowns execution
Best thing to test:
Basic ReAct V2
What to try:
- ask a plain conversational question with no tools needed
What you should observe:
- no custom runtime code is needed in the agent definition
- the answer behaves like a normal assistant
- the info bubble should show token usage if the model provider returns it
What it is:
- business starting points for a generic ReAct agent
How it fits v2:
- profiles do not create a new runtime
- they preload a business identity: prompt, tags, MCP defaults, approval policy
Best thing to test:
- create
Basic ReAct V2from:custodiansentinelgeorgesgeo_demo
What to try:
- compare two different profiles created from the same base agent type
What you should observe:
- same runtime category
- different business behavior and default capabilities
What it is:
- the admin equips an agent with MCP servers without changing its code
How it fits v2:
- the definition stays pure
ToolProviderPortand the Fred MCP bridge resolve actual tools at runtime
Best thing to test:
Basic ReAct V2withcustodianorsentinel
What to try:
- attach MCP servers in settings
- start a fresh conversation
- ask the agent to use those tools explicitly
What you should observe:
- tool calls appear in the conversation
- the agent code did not need to hard-code server URLs
Important distinction:
- runtime-provided MCP tools are one way Fred can expose capabilities at runtime
- but when Fred already provides a stable business tool ref, agent authors should prefer that tool ref over a direct MCP dependency
Example:
- for standard corpus retrieval, prefer
knowledge.search - do not bind a new product agent directly to
mcp-knowledge-flow-mcp-textunless the needed capability is not yet exposed by Fred as a first-class tool
What it is:
- the user must approve sensitive actions before execution
How it fits v2:
- approval is a policy on the definition side
- the runtime pauses and resumes through the shared HITL contract
Best thing to test:
Basic ReAct V2withcustodian
What to try:
- ask for a mutating file/corpus action
What you should observe:
- a human approval card
- no tool execution before approval
- successful resume after approval
What it is:
- document-grounded answers with citations
How it fits v2:
- the agent declares a business tool,
knowledge.search - tool results carry structured
sources - the final message preserves them
Why this matters:
- the authoring contract is "this agent needs corpus retrieval"
- not "this agent talks to this exact MCP endpoint"
- this keeps retrieval transport-agnostic and lets Fred preserve chat-selected library scope, selected documents, and knowledge mode centrally
Best thing to test:
RAG Expert V2
What to try:
- ingest a small document with specific facts
- ask narrow factual questions
What you should observe:
- a grounded answer
- visible source metadata on the final answer
- different behavior if the selected knowledge scope changes
Why this is its own v2 definition and not just a profile:
- a profile is a business starting point for the generic ReAct assistant
RAG Expert V2also carries RAG-specific fields and explicit grounding guardrails- so it is still ReAct, but it is a stronger product definition than a simple preset
What it is:
- an agent can return a real map, not just text describing coordinates
How it fits v2:
GeoPartis a structured UI output- the runtime carries it as a first-class part
Best thing to test:
Basic ReAct V2withgeo_demo- or
Tracking Graph Demo V2
What to try:
Show Paris and Lyon on a map- or ask the postal graph to show the parcel route / pickup points
What you should observe:
- a real map rendered in the UI
- not just a blob of GeoJSON or prose
What it is:
- an agent generates a report or note, stores it through Fred, and returns a link
How it fits v2:
ArtifactPublisherPortis the platform capability- the UI-facing output is still a normal
LinkPart
Best thing to test:
Artifact Report Demo V2
What to try:
- ask for a downloadable summary, report, or brief
What you should observe:
- the agent publishes a file
- the chat returns a real download link
- the agent does not invent a URL itself
What it is:
- admins upload templates or supporting files, and the agent can fetch them later
How it fits v2:
ResourceReaderPortis the platform capability- graph nodes can fetch text or bytes directly
- ReAct agents can use the narrow built-in tool
resources.fetch_text
Best thing to test:
Artifact Report Demo V2
What to try:
- upload a text template in agent config storage, for example
report-template.md - ask the report demo to use that template for the generated deliverable
What you should observe:
- the template is fetched through Fred storage
- the agent does not hard-code any storage URL
- the published artifact reflects the fetched template or style guide
What it is:
- a real business flow with typed state, routing, tool calls, HITL, and rich output
How it fits v2:
- the author declares the graph
GraphRuntimeowns execution, pause/resume, and state continuity
Best thing to test:
Tracking Graph Demo V2
What to try:
- ask about your parcel
- choose the parcel when asked
- ask for diagnosis, map, reroute, or reschedule
What you should observe:
- structured workflow progression
- MCP calls to both business and IoT services
- human decisions at the right points
- a clean final business outcome
What it is:
- the graph remembers the selected business object across turns
How it fits v2:
- the runtime persists the last completed graph state
- the definition decides what to carry into the next turn
Best thing to test:
Tracking Graph Demo V2
What to try:
- ask if you have a parcel in progress
- choose one parcel
- ask a follow-up question about delivery or rerouting
What you should observe:
- the graph should reuse the parcel context
- it should not ask you to pick the parcel again unless the situation is ambiguous again
What it is:
- pause/resume should be explicit and transport-safe
How it fits v2:
awaiting_humancarries acheckpoint_id- the runtime validates resume against that checkpoint identity
- the durable checkpoint backend now runs over Fred's shared SQL engine
(
storage.postgres), with SQLite fallback still possible in local dev
Best thing to test:
- any v2 HITL flow:
custodianTracking Graph Demo V2
What to try:
- trigger a pause
- resume it
- then continue the conversation
What you should observe:
- successful resume
- no dependence on one executor object surviving in memory
What it is:
- a safe structural description of a v2 agent
How it fits v2:
inspectis the canonical introspection surface- it is separate from execution
Best thing to test:
- any v2 agent
What to try:
- call the backend inspect endpoint for the agent:
GET /agentic/v1/agents/{agent_id}/inspect
What you should observe:
execution_category- fields
- declared tool refs
- preview
- default MCP servers when relevant
Important note:
- for ReAct agents, inspection is usually text-first
- for graph agents, inspection can include Mermaid
What it is:
- shared tracing conventions emitted by v2 runtime, not by each agent author
How it fits v2:
- runtime emits consistent spans for major execution steps
- Langfuse metadata includes Fred business context (
agent_name,team_id,user_name,fred_session_id)
Best thing to test:
- one ReAct agent (
Basic ReAct V2) - one graph agent (
Tracking Graph Demo V2)
What to try:
- run one exchange with tool calls
- filter Langfuse by
agent_name+fred_session_id - check that model/tool/runtime spans are visible under the exchange trace
What you should observe:
- consistent runtime naming conventions:
- graph runtime:
v2.graph.*spans (v2.graph.node,v2.graph.model,v2.graph.tool, ...) - ReAct runtime:
v2.react.modelspan for model calls (operation=model_call)
- graph runtime:
- easier per-session filtering than raw opaque ids
- if only a top-level span is visible, classify it as instrumentation gap and fix runtime instrumentation
What it is:
- an admin-facing capability to summarize one conversation performance from Langfuse traces
How it fits v2:
- exposed as a first-class tool ref (
traces.summarize_conversation) - same runtime tool path as other platform capabilities
- intentionally implemented as a Fred built-in runtime tool, not as an MCP tool
- current backend path is Langfuse Public API over HTTP (
/api/public/traces,/api/public/traces/{trace_id}) usingLANGFUSE_HOST,LANGFUSE_PUBLIC_KEY, andLANGFUSE_SECRET_KEY
Best thing to test:
log_geniusprofile
What to try:
- ask for performance analysis of the current conversation
- compare the summary with Langfuse dashboard spans
What you should observe:
- bottleneck classification with explicit totals (
model_total_ms,tool_total_ms,await_human_total_ms) - explicit
instrumentation_gapdetection when breakdown is missing - concrete next actions instead of ambiguous “slow model” guesses
If you want a clean manual review of the v2 world, this order is efficient:
-
Basic ReAct V2- plain conversation
- token usage
-
ReAct profiles
georgescustodiansentinel
-
ReAct HITL
custodian
-
RAG and sources
RAG Expert V2
-
Geo output
geo_demo
-
Artifact publishing and template fetch
Artifact Report Demo V2
-
Graph workflow, graph HITL, graph continuity
Tracking Graph Demo V2
-
Inspect endpoint
- one ReAct agent
- one Graph agent
-
Observability and trace triage
- verify Langfuse metadata filtering (
agent_name,team_id,user_name,fred_session_id) - run
traces.summarize_conversationon a real exchange - confirm no misleading bottleneck when instrumentation is missing
- verify Langfuse metadata filtering (
If all of the tests above feel coherent, then the important v2 claim becomes credible:
- the runtime is no longer just “a nicer way to write agents”
- it is becoming a real capability platform
The features above are the ones that matter most for that claim:
- tools
- HITL
- sources
- maps
- artifact publication
- resource fetching
- graph workflows
- inspection
- observability
If those all feel natural in the same model, then the v2 direction is strong.
For a scan of what still remains in the legacy agents and whether it deserves promotion into a real runtime feature, see LEGACY_FEATURE_SCAN.md.