Final GSOC Report #56

Draft

mclemcrew wants to merge 64 commits into afhverjuekki:main from mclemcrew:final-gsoc-report

Overview

This PR introduces a multi-agent LLM pipeline that enables natural language synthesis of particle behaviors for Tölvera. The system transforms high-level descriptions like "red predators chase blue prey" into executable Taichi kernels, featuring automatic species detection, state generation, and debugging capabilities. The implementation includes both a terminal UI and a demo system, supporting multiple LLM providers (Gemini, Claude, OpenAI).

Architectural Overview

The system implements a 5-phase pipeline that processes natural language through specialized components:

1. Analysis Phase (BehaviorAnalyzer)

When a user provides a natural language description, the BehaviorAnalyzer performs initial analysis to identify species, detect required states, and decompose complex behaviors into implementable components. This phase determines the synthesis strategy and routes behaviors to appropriate generation pathways.
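The shape of this phase can be sketched in plain Python. This is an illustrative stand-in for the BehaviorAnalyzer, not its real schema: the class name, field names, and keyword patterns below are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass

@dataclass
class BehaviorAnalysis:
    """Illustrative result shape for the analysis phase (not the real schema)."""
    species: list
    behaviors: list
    needs_custom_state: bool

# A few keyword patterns standing in for the analyzer's pattern matching.
BEHAVIOR_PATTERNS = {
    "chase": re.compile(r"\b(chase|chases|pursue|hunt)\b"),
    "flee": re.compile(r"\b(flee|flees|avoid|escape)\b"),
    "flock": re.compile(r"\b(flock|swarm|school)\b"),
}
SPECIES_PATTERN = re.compile(r"\b(red|blue|green)\s+(\w+)\b")

def analyze(description: str) -> BehaviorAnalysis:
    text = description.lower()
    species = [f"{color} {name}" for color, name in SPECIES_PATTERN.findall(text)]
    behaviors = [name for name, pat in BEHAVIOR_PATTERNS.items() if pat.search(text)]
    # Memory-dependent behaviors (e.g. cellular automata) would flag custom state.
    return BehaviorAnalysis(species, behaviors, "automata" in text)
```

Running `analyze("Red predators chase blue prey")` yields two detected species and a single `chase` behavior, which then routes to the interaction-generation pathway.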

2. Orchestration Phase (BehaviorOrchestrator)

The central coordinator manages the entire synthesis workflow, delegating tasks to specialized components while maintaining state consistency. It coordinates state creation, component generation, and agent registration, ensuring all pieces work together.
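The coordination pattern can be summarized as a pipeline of pluggable phase callables. The class and method names here are hypothetical simplifications of the real BehaviorOrchestrator:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SynthesisResult:
    analysis: dict
    code: str
    sketch: str

class BehaviorOrchestrator:
    """Minimal coordinator: each phase is injected as a callable (hypothetical API)."""
    def __init__(self, analyzer: Callable, generator: Callable, renderer: Callable):
        self.analyzer = analyzer
        self.generator = generator
        self.renderer = renderer

    def synthesize(self, description: str) -> SynthesisResult:
        analysis = self.analyzer(description)          # Phase 1: analysis
        code = self.generator(description, analysis)   # Phase 3: LLM call in the real system
        sketch = self.renderer(code)                   # Phase 4: template rendering
        return SynthesisResult(analysis, code, sketch)

# Stub phases showing the data flow between components:
orc = BehaviorOrchestrator(
    analyzer=lambda d: {"behaviors": ["chase"]},
    generator=lambda d, a: "def chase(): ...",
    renderer=lambda c: f"import taichi as ti\n\n{c}",
)
result = orc.synthesize("red predators chase blue prey")
```

Keeping phases behind plain callables is what lets the orchestrator route simple and complex behaviors through different generation pathways while maintaining one workflow.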

3. Code Generation Phase (CodeGenerator)

The CodeGenerator synthesizes Taichi kernels and functions using structured LLM outputs. It uses dynamic context selection to inject relevant documentation and patterns (a simplified form of RAG), handling both simple single-particle behaviors and complex multi-component interactions. The generator produces type-safe code with error handling.

4. Template Rendering Phase (TemplateRenderer)

Using Jinja2 templates, the system assembles generated code into complete, executable sketches. Templates are organized by component type (kernels, sketches, states, initialization), allowing for consistent formatting and proper integration of all generated components.
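The assembly step looks roughly like the following. The real renderer uses Jinja2; stdlib `string.Template` stands in here so the sketch stays dependency-free, and the template body is an assumed simplification of the actual sketch templates:

```python
from string import Template

# Stand-in for a Jinja2 sketch template (illustrative layout, not the real file).
SKETCH_TEMPLATE = Template("""\
import taichi as ti
from tolvera import Tolvera

$kernels

def main():
    tv = Tolvera(particles=$n_particles)
    tv.run()
""")

def render_sketch(kernels: str, n_particles: int) -> str:
    """Slot generated kernel source into the sketch skeleton."""
    return SKETCH_TEMPLATE.substitute(kernels=kernels, n_particles=n_particles)

sketch = render_sketch("@ti.kernel\ndef chase():\n    pass", 1024)
```

Organizing templates by component type (kernels, sketches, states, initialization) means each generated piece has a fixed slot, which keeps formatting consistent across runs.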

5. Refinement Phase (SketchRefiner)

The final phase applies architectural patterns and corrections to ensure generated code follows best practices. This includes fixing common Taichi pitfalls (like return statements in conditionals), optimizing performance patterns, and ensuring code quality for the overall Tölvera sketch.
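The "return inside non-static if" pitfall is concrete enough to illustrate. Below is a hypothetical detection sketch (not the refiner's actual implementation) plus the single-exit rewrite it applies:

```python
import ast

def returns_inside_if(source: str) -> bool:
    """Detect the 'return inside non-static if' Taichi pitfall (detection sketch)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            if any(isinstance(n, ast.Return) for n in ast.walk(node)):
                return True
    return False

BAD = """
def classify(d):
    if d < 1.0:
        return 1   # crashes when compiled as a Taichi func
    return 0
"""

GOOD = """
def classify(d):
    result = 0
    if d < 1.0:
        result = 1  # single-exit rewrite the refiner applies
    return result
"""
```

The refinement phase rewrites code of the first shape into the second, so the generated kernel compiles instead of crashing at runtime.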

Key Components & Implementation Details

Core Orchestration System

  • behavior_orchestrator.py: Central coordinator managing the synthesis pipeline, delegating to specialized components, and maintaining workflow state. Implements intelligent routing based on behavior complexity and type detection.
  • code_generator.py: Generates Taichi functions from natural language using structured LLM outputs. Features state analysis, synthesis delegation, and automatic parameter detection with type inference.
  • behavior_analyzer.py: Decomposes complex descriptions into implementable components, identifies species configurations, and determines required states. Uses pattern matching to detect common artificial life behaviors.

Data Models & Registry

  • data_models.py: Pydantic models ensuring structured data exchange between LLM and application logic. Defines schemas for behavior synthesis requests/responses, agent functions, states, and species configurations.
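The models follow roughly this shape. The real code uses Pydantic (for validating LLM structured outputs); stdlib dataclasses stand in here, and the field names are illustrative rather than the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentFunction:
    """Illustrative stand-in for a synthesized expert function record."""
    name: str
    behavior_type: str          # e.g. "single", "interaction", "drawing"
    source: str                 # generated Taichi source text
    species: list[str] = field(default_factory=list)
    weight: float = 1.0

@dataclass
class SynthesisResponse:
    """Illustrative response envelope from the generation step."""
    functions: list[AgentFunction]
    required_states: dict[str, str]   # state name -> dtype, e.g. {"age": "ti.f32"}
```

Constraining the LLM to emit data matching a schema like this is what makes the exchange between model output and application logic reliable enough to drive kernel generation.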
  • behavior_registry.py: Maintains a registry of synthesized expert functions, tracking their types, weights, and species associations for kernel generation.

State & Species Management

  • state_manager.py: Manages custom particle and global states with automatic detection of required fields. Generates initialization code and tracks available states throughout the synthesis process.
  • species_manager.py: Analyzes descriptions for species mentions using NLP techniques, manages configurations and mappings, and generates species-aware initialization code with support for multiple patterns (random, grid, clustered).
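The initialization patterns mentioned above can be sketched as position generators. This is a minimal assumed version covering the random and grid patterns (clustered omitted for brevity); the function name and signature are hypothetical:

```python
import math
import random

def init_positions(n: int, pattern: str, extent: float = 1.0) -> list:
    """Generate per-species particle start positions (sketch of two patterns)."""
    if pattern == "random":
        return [(random.uniform(0, extent), random.uniform(0, extent))
                for _ in range(n)]
    if pattern == "grid":
        side = math.ceil(math.sqrt(n))  # smallest square grid holding n particles
        return [((i % side + 0.5) / side * extent, (i // side + 0.5) / side * extent)
                for i in range(n)]
    raise ValueError(f"unknown pattern: {pattern}")
```

In the real system the chosen pattern is emitted as species-aware initialization code inside the generated sketch rather than computed at analysis time.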

Context & Prompt System

  • context_selector.py: Implements LLM-powered context selection with a two-tier strategy: static "base context" of core APIs and dynamically selected task-specific supplementary context. This dramatically improves generation quality by providing relevant examples and patterns while reducing overall token consumption.
  • prompt_loader.py: Centralized prompt management system with variable substitution, integrating with the context selector for dynamic prompt building. Separates prompt content from application logic for improved maintainability.
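The two-tier context strategy composes like this. Prompts and context live in files in the real system; everything is inlined here, and the `BASE_CONTEXT` text and function name are placeholders:

```python
from string import Template

# Static tier: core API summary included in every prompt (placeholder text).
BASE_CONTEXT = "Core Tolvera API summary ..."

def build_prompt(template: str, task: str, supplements: list) -> str:
    """Assemble a prompt: base context + dynamically selected supplements + task."""
    context = "\n".join([BASE_CONTEXT, *supplements])
    return Template(template).safe_substitute(context=context, task=task)

prompt = build_prompt(
    "Context:\n$context\n\nTask: $task",
    task="red predators chase blue prey",
    supplements=["Pattern: pairwise species interaction kernel"],
)
```

Because only task-relevant supplements are appended, each prompt carries the examples the generator needs without paying the token cost of the full documentation set.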

Template System

  • templates/: Organized Jinja2 templates for code generation:
    • kernel/: Integration, drawing, and utility kernel templates
    • sketch/: Complete sketch file templates
    • state/: State initialization and temporal update templates
    • init/: Particle and system initialization templates
    • drawing/: Visual effect templates

User Interfaces

  • tolvera_textual_ui.py: Rich terminal UI featuring:
    • Interactive chat panel for iterative refinement
    • Real-time code preview with syntax highlighting
    • Diff viewer showing code changes
    • Model selection across providers
    • Integrated tutorial system
    • Sketch management (save/load/export)
  • tolvera_llm_demo.py: Demo system with 8 showcases:
    • Basic single-particle behaviors
    • Complex behavior decomposition
    • Visual effects and drawing behaviors
    • Multi-species interactions
    • Automatic species detection
    • State generation for complex behaviors
    • Artificial life pattern recognition
    • Custom behavior input

Debug & Analysis Tools

  • trace_html_report.py: Generates interactive HTML reports visualizing the entire synthesis pipeline, including hierarchical trace visualization, timing metrics, LLM call details, and generated code at each stage.
  • tracing.py: Comprehensive trace collection system capturing detailed execution flow, performance metrics, and LLM interactions for debugging and optimization.

Key Features

Automatic Species Detection

The system automatically identifies species from natural language descriptions, assigning colors, names, and relationships without explicit configuration.

Automatic State Generation

Complex behaviors requiring custom states (like cellular automata) trigger automatic state field creation with proper initialization code.
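The generated initialization code for such states might be produced by something like the following. The dtype table and function are illustrative assumptions, emitting field declarations as source text for the rendered sketch:

```python
# Map abstract state kinds to Taichi dtypes (illustrative subset).
TAICHI_DTYPES = {"float": "ti.f32", "int": "ti.i32"}

def generate_state_init(states: dict, n_particles: int) -> str:
    """Emit per-particle state field declarations as Taichi source text (sketch)."""
    lines = []
    for name, kind in states.items():
        dtype = TAICHI_DTYPES[kind]
        lines.append(f"{name} = ti.field({dtype}, shape={n_particles})")
    return "\n".join(lines)

code = generate_state_init({"age": "float", "alive": "int"}, 1024)
```

A cellular-automaton behavior, for example, would trigger an `alive`-style integer field plus the update kernel that reads and writes it each frame.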

Multi-Type Behavior Support

  • Single-particle behaviors (gravity, random walk)
  • Particle-particle interactions (chase, flock, repel)
  • Drawing behaviors (trails, glows, connections)
  • Complex decomposed behaviors with multiple components

Multi-Provider LLM Support

Unified interface supporting Gemini, Claude, and OpenAI models with provider-specific optimizations and structured output handling.
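The unified interface reduces to a registry of provider completion callables. The class below is a hypothetical minimal version; real provider SDK clients would be registered behind the same signature, and no actual API calls are shown:

```python
from typing import Callable

class LLMClient:
    """Dispatch one prompt interface across registered providers (sketch)."""
    def __init__(self):
        self._providers = {}

    def register(self, name: str, complete: Callable) -> None:
        self._providers[name] = complete

    def complete(self, provider: str, prompt: str) -> str:
        if provider not in self._providers:
            raise ValueError(f"unknown provider: {provider}")
        return self._providers[provider](prompt)

client = LLMClient()
# Stand-in for a real Gemini API call; same shape for Claude and OpenAI.
client.register("gemini", lambda p: f"[gemini] {p}")
```

Provider-specific concerns (structured-output handling, retries) live inside each registered callable, so the rest of the pipeline stays provider-agnostic.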

Error Correction Pipeline

Error detection and correction system handling common Taichi pitfalls, including the critical "return inside non-static if" issue that causes immediate crashes.

Debugging

  • Real-time console output with colored tracing
  • JSON export for detailed analysis
  • Interactive HTML reports with full synthesis pipeline visualization
  • Mermaid diagrams for workflow visualization

Testing and Usage

Quick Test

  # Run non-interactive demo (automatically selects first option)
  echo '1' | poetry run python examples/tolvera_llm_demo.py

Interactive Terminal UI

  # Launch the full-featured TUI for the complete experience
  poetry run python src/tolvera/llm/ui_scripts/tolvera_textual_ui.py

Examining the Synthesis Process

Generated sketches include comprehensive debug output:

  • examples/generated_sketches/ - Python sketch files
  • examples/generated_sketches/traces/ - Debug traces
    • .json - Raw trace data
    • .html - Interactive reports showing the complete "chain of thought"
    • .md - Mermaid workflow diagrams

The HTML trace reports are particularly valuable for reviewers or maintainers, providing visibility into:

  • Selected context and documentation
  • Generated prompts at each stage
  • LLM responses and structured outputs
  • Error corrections applied
  • Performance metrics

Future Work & Considerations

Immediate Opportunities

  1. Expanded Context Library: Adding more artificial life patterns, physics simulations, and visual effect templates to improve generation quality for specialized domains.
  2. Performance Optimization: Implementing prompt caching and context reuse strategies to reduce latency in interactive sessions.
  3. Kernel Dictionary: Maintaining a dictionary of reusable kernels/functions so previously synthesized components can be reused rather than regenerated.

Longer-term Enhancements

  • Multi-modal Input: Supporting visual references and sketches as additional input modalities
  • Fine-tuned Models: Training specialized models on Taichi code patterns for improved generation accuracy
