Final GSOC Report #56

Draft

mclemcrew wants to merge 64 commits into afhverjuekki:main from mclemcrew:final-gsoc-report

Overview

This PR introduces a multi-agent LLM pipeline that enables natural language synthesis of particle behaviors for Tölvera. The system transforms high-level descriptions like "red predators chase blue prey" into executable Taichi kernels, featuring automatic species detection, state generation, and debugging capabilities. The implementation includes both a terminal UI and a demo system, supporting multiple LLM providers (Gemini, Claude, OpenAI).

Architectural Overview

The system implements a 5-phase pipeline that processes natural language through specialized components:

1. Analysis Phase (BehaviorAnalyzer)

When a user provides a natural language description, the BehaviorAnalyzer performs initial analysis to identify species, detect required states, and decompose complex behaviors into implementable components. This phase determines the synthesis strategy and routes behaviors to appropriate generation pathways.
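The shape of this phase can be sketched in plain Python. This is an illustrative stand-in for the BehaviorAnalyzer, not its real schema: the class name, field names, and keyword patterns below are assumptions chosen for the example.

```python
import re
from dataclasses import dataclass

@dataclass
class BehaviorAnalysis:
    """Illustrative result shape for the analysis phase (not the real schema)."""
    species: list
    behaviors: list
    needs_custom_state: bool

# A few keyword patterns standing in for the analyzer's pattern matching.
BEHAVIOR_PATTERNS = {
    "chase": re.compile(r"\b(chase|chases|pursue|hunt)\b"),
    "flee": re.compile(r"\b(flee|flees|avoid|escape)\b"),
    "flock": re.compile(r"\b(flock|swarm|school)\b"),
}
SPECIES_PATTERN = re.compile(r"\b(red|blue|green)\s+(\w+)\b")

def analyze(description: str) -> BehaviorAnalysis:
    text = description.lower()
    species = [f"{color} {name}" for color, name in SPECIES_PATTERN.findall(text)]
    behaviors = [name for name, pat in BEHAVIOR_PATTERNS.items() if pat.search(text)]
    # Memory-dependent behaviors (e.g. cellular automata) would flag custom state.
    return BehaviorAnalysis(species, behaviors, "automata" in text)
```

Running `analyze("Red predators chase blue prey")` yields two detected species and a single `chase` behavior, which then routes to the interaction-generation pathway.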

2. Orchestration Phase (BehaviorOrchestrator)

The central coordinator manages the entire synthesis workflow, delegating tasks to specialized components while maintaining state consistency. It coordinates state creation, component generation, and agent registration, ensuring all pieces work together.
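The coordination pattern can be summarized as a pipeline of pluggable phase callables. The class and method names here are hypothetical simplifications of the real BehaviorOrchestrator:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SynthesisResult:
    analysis: dict
    code: str
    sketch: str

class BehaviorOrchestrator:
    """Minimal coordinator: each phase is injected as a callable (hypothetical API)."""
    def __init__(self, analyzer: Callable, generator: Callable, renderer: Callable):
        self.analyzer = analyzer
        self.generator = generator
        self.renderer = renderer

    def synthesize(self, description: str) -> SynthesisResult:
        analysis = self.analyzer(description)          # Phase 1: analysis
        code = self.generator(description, analysis)   # Phase 3: LLM call in the real system
        sketch = self.renderer(code)                   # Phase 4: template rendering
        return SynthesisResult(analysis, code, sketch)

# Stub phases showing the data flow between components:
orc = BehaviorOrchestrator(
    analyzer=lambda d: {"behaviors": ["chase"]},
    generator=lambda d, a: "def chase(): ...",
    renderer=lambda c: f"import taichi as ti\n\n{c}",
)
result = orc.synthesize("red predators chase blue prey")
```

Keeping phases behind plain callables is what lets the orchestrator route simple and complex behaviors through different generation pathways while maintaining one workflow.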

3. Code Generation Phase (CodeGenerator)

The CodeGenerator synthesizes Taichi kernels and functions using structured LLM outputs. It uses dynamic context selection to inject relevant documentation and patterns (a simplified form of RAG), handling both simple single-particle behaviors and complex multi-component interactions. The generator produces type-safe code with error handling.

4. Template Rendering Phase (TemplateRenderer)

Using Jinja2 templates, the system assembles generated code into complete, executable sketches. Templates are organized by component type (kernels, sketches, states, initialization), allowing for consistent formatting and proper integration of all generated components.
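The assembly step looks roughly like the following. The real renderer uses Jinja2; stdlib `string.Template` stands in here so the sketch stays dependency-free, and the template body is an assumed simplification of the actual sketch templates:

```python
from string import Template

# Stand-in for a Jinja2 sketch template (illustrative layout, not the real file).
SKETCH_TEMPLATE = Template("""\
import taichi as ti
from tolvera import Tolvera

$kernels

def main():
    tv = Tolvera(particles=$n_particles)
    tv.run()
""")

def render_sketch(kernels: str, n_particles: int) -> str:
    """Slot generated kernel source into the sketch skeleton."""
    return SKETCH_TEMPLATE.substitute(kernels=kernels, n_particles=n_particles)

sketch = render_sketch("@ti.kernel\ndef chase():\n    pass", 1024)
```

Organizing templates by component type (kernels, sketches, states, initialization) means each generated piece has a fixed slot, which keeps formatting consistent across runs.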

5. Refinement Phase (SketchRefiner)

The final phase applies architectural patterns and corrections to ensure generated code follows best practices. This includes fixing common Taichi pitfalls (like return statements in conditionals), optimizing performance patterns, and ensuring code quality for the overall Tölvera sketch.
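The "return inside non-static if" pitfall is concrete enough to illustrate. Below is a hypothetical detection sketch (not the refiner's actual implementation) plus the single-exit rewrite it applies:

```python
import ast

def returns_inside_if(source: str) -> bool:
    """Detect the 'return inside non-static if' Taichi pitfall (detection sketch)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            if any(isinstance(n, ast.Return) for n in ast.walk(node)):
                return True
    return False

BAD = """
def classify(d):
    if d < 1.0:
        return 1   # crashes when compiled as a Taichi func
    return 0
"""

GOOD = """
def classify(d):
    result = 0
    if d < 1.0:
        result = 1  # single-exit rewrite the refiner applies
    return result
"""
```

The refinement phase rewrites code of the first shape into the second, so the generated kernel compiles instead of crashing at runtime.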

Key Components & Implementation Details

Core Orchestration System

  • behavior_orchestrator.py: Central coordinator managing the synthesis pipeline, delegating to specialized components, and maintaining workflow state. Implements intelligent routing based on behavior complexity and type detection.
  • code_generator.py: Generates Taichi functions from natural language using structured LLM outputs. Features state analysis, synthesis delegation, and automatic parameter detection with type inference.
  • behavior_analyzer.py: Decomposes complex descriptions into implementable components, identifies species configurations, and determines required states. Uses pattern matching to detect common artificial life behaviors.

Data Models & Registry

  • data_models.py: Pydantic models ensuring structured data exchange between LLM and application logic. Defines schemas for behavior synthesis requests/responses, agent functions, states, and species configurations.
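The models follow roughly this shape. The real code uses Pydantic (for validating LLM structured outputs); stdlib dataclasses stand in here, and the field names are illustrative rather than the actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class AgentFunction:
    """Illustrative stand-in for a synthesized expert function record."""
    name: str
    behavior_type: str          # e.g. "single", "interaction", "drawing"
    source: str                 # generated Taichi source text
    species: list[str] = field(default_factory=list)
    weight: float = 1.0

@dataclass
class SynthesisResponse:
    """Illustrative response envelope from the generation step."""
    functions: list[AgentFunction]
    required_states: dict[str, str]   # state name -> dtype, e.g. {"age": "ti.f32"}
```

Constraining the LLM to emit data matching a schema like this is what makes the exchange between model output and application logic reliable enough to drive kernel generation.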
  • behavior_registry.py: Maintains a registry of synthesized expert functions, tracking their types, weights, and species associations for kernel generation.

State & Species Management

  • state_manager.py: Manages custom particle and global states with automatic detection of required fields. Generates initialization code and tracks available states throughout the synthesis process.
  • species_manager.py: Analyzes descriptions for species mentions using NLP techniques, manages configurations and mappings, and generates species-aware initialization code with support for multiple patterns (random, grid, clustered).
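The initialization patterns mentioned above can be sketched as position generators. This is a minimal assumed version covering the random and grid patterns (clustered omitted for brevity); the function name and signature are hypothetical:

```python
import math
import random

def init_positions(n: int, pattern: str, extent: float = 1.0) -> list:
    """Generate per-species particle start positions (sketch of two patterns)."""
    if pattern == "random":
        return [(random.uniform(0, extent), random.uniform(0, extent))
                for _ in range(n)]
    if pattern == "grid":
        side = math.ceil(math.sqrt(n))  # smallest square grid holding n particles
        return [((i % side + 0.5) / side * extent, (i // side + 0.5) / side * extent)
                for i in range(n)]
    raise ValueError(f"unknown pattern: {pattern}")
```

In the real system the chosen pattern is emitted as species-aware initialization code inside the generated sketch rather than computed at analysis time.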

Context & Prompt System

  • context_selector.py: Implements LLM-powered context selection with a two-tier strategy: static "base context" of core APIs and dynamically selected task-specific supplementary context. This dramatically improves generation quality by providing relevant examples and patterns while reducing overall token consumption.
  • prompt_loader.py: Centralized prompt management system with variable substitution, integrating with the context selector for dynamic prompt building. Separates prompt content from application logic for improved maintainability.
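The two-tier context strategy composes like this. Prompts and context live in files in the real system; everything is inlined here, and the `BASE_CONTEXT` text and function name are placeholders:

```python
from string import Template

# Static tier: core API summary included in every prompt (placeholder text).
BASE_CONTEXT = "Core Tolvera API summary ..."

def build_prompt(template: str, task: str, supplements: list) -> str:
    """Assemble a prompt: base context + dynamically selected supplements + task."""
    context = "\n".join([BASE_CONTEXT, *supplements])
    return Template(template).safe_substitute(context=context, task=task)

prompt = build_prompt(
    "Context:\n$context\n\nTask: $task",
    task="red predators chase blue prey",
    supplements=["Pattern: pairwise species interaction kernel"],
)
```

Because only task-relevant supplements are appended, each prompt carries the examples the generator needs without paying the token cost of the full documentation set.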

Template System

  • templates/: Organized Jinja2 templates for code generation:
    • kernel/: Integration, drawing, and utility kernel templates
    • sketch/: Complete sketch file templates
    • state/: State initialization and temporal update templates
    • init/: Particle and system initialization templates
    • drawing/: Visual effect templates

User Interfaces

  • tolvera_textual_ui.py: Rich terminal UI featuring:
    • Interactive chat panel for iterative refinement
    • Real-time code preview with syntax highlighting
    • Diff viewer showing code changes
    • Model selection across providers
    • Integrated tutorial system
    • Sketch management (save/load/export)
  • tolvera_llm_demo.py: Demo system with 8 showcases:
    • Basic single-particle behaviors
    • Complex behavior decomposition
    • Visual effects and drawing behaviors
    • Multi-species interactions
    • Automatic species detection
    • State generation for complex behaviors
    • Artificial life pattern recognition
    • Custom behavior input

Debug & Analysis Tools

  • trace_html_report.py: Generates interactive HTML reports visualizing the entire synthesis pipeline, including hierarchical trace visualization, timing metrics, LLM call details, and generated code at each stage.
  • tracing.py: Comprehensive trace collection system capturing detailed execution flow, performance metrics, and LLM interactions for debugging and optimization.

Key Features

Automatic Species Detection

The system automatically identifies species from natural language descriptions, assigning colors, names, and relationships without explicit configuration.

Automatic State Generation

Complex behaviors requiring custom states (like cellular automata) trigger automatic state field creation with proper initialization code.
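The generated initialization code for such states might be produced by something like the following. The dtype table and function are illustrative assumptions, emitting field declarations as source text for the rendered sketch:

```python
# Map abstract state kinds to Taichi dtypes (illustrative subset).
TAICHI_DTYPES = {"float": "ti.f32", "int": "ti.i32"}

def generate_state_init(states: dict, n_particles: int) -> str:
    """Emit per-particle state field declarations as Taichi source text (sketch)."""
    lines = []
    for name, kind in states.items():
        dtype = TAICHI_DTYPES[kind]
        lines.append(f"{name} = ti.field({dtype}, shape={n_particles})")
    return "\n".join(lines)

code = generate_state_init({"age": "float", "alive": "int"}, 1024)
```

A cellular-automaton behavior, for example, would trigger an `alive`-style integer field plus the update kernel that reads and writes it each frame.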

Multi-Type Behavior Support

  • Single-particle behaviors (gravity, random walk)
  • Particle-particle interactions (chase, flock, repel)
  • Drawing behaviors (trails, glows, connections)
  • Complex decomposed behaviors with multiple components

Multi-Provider LLM Support

Unified interface supporting Gemini, Claude, and OpenAI models with provider-specific optimizations and structured output handling.
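The unified interface reduces to a registry of provider completion callables. The class below is a hypothetical minimal version; real provider SDK clients would be registered behind the same signature, and no actual API calls are shown:

```python
from typing import Callable

class LLMClient:
    """Dispatch one prompt interface across registered providers (sketch)."""
    def __init__(self):
        self._providers = {}

    def register(self, name: str, complete: Callable) -> None:
        self._providers[name] = complete

    def complete(self, provider: str, prompt: str) -> str:
        if provider not in self._providers:
            raise ValueError(f"unknown provider: {provider}")
        return self._providers[provider](prompt)

client = LLMClient()
# Stand-in for a real Gemini API call; same shape for Claude and OpenAI.
client.register("gemini", lambda p: f"[gemini] {p}")
```

Provider-specific concerns (structured-output handling, retries) live inside each registered callable, so the rest of the pipeline stays provider-agnostic.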

Error Correction Pipeline

Error detection and correction system handling common Taichi pitfalls, including the critical "return inside non-static if" issue that causes immediate crashes.

Debugging

  • Real-time console output with colored tracing
  • JSON export for detailed analysis
  • Interactive HTML reports with full synthesis pipeline visualization
  • Mermaid diagrams for workflow visualization

Testing and Usage

Quick Test

  # Run non-interactive demo (automatically selects first option)
  echo '1' | poetry run python examples/tolvera_llm_demo.py

Interactive Terminal UI

  # Launch the full-featured TUI for the complete experience
  poetry run python src/tolvera/llm/ui_scripts/tolvera_textual_ui.py

Examining the Synthesis Process

Generated sketches include comprehensive debug output:

  • examples/generated_sketches/ - Python sketch files
  • examples/generated_sketches/traces/ - Debug traces
    • .json - Raw trace data
    • .html - Interactive reports showing the complete "chain of thought"
    • .md - Mermaid workflow diagrams

The HTML trace reports are particularly valuable for reviewers or maintainers, providing visibility into:

  • Selected context and documentation
  • Generated prompts at each stage
  • LLM responses and structured outputs
  • Error corrections applied
  • Performance metrics

Future Work & Considerations

Immediate Opportunities

  1. Expanded Context Library: Adding more artificial life patterns, physics simulations, and visual effect templates to improve generation quality for specialized domains.
  2. Performance Optimization: Implementing prompt caching and context reuse strategies to reduce latency in interactive sessions.
  3. Kernel Dictionary: Maintaining a dictionary of reusable kernels/functions so previously synthesized components can be reused rather than regenerated.

Longer-term Enhancements

  • Multi-modal Input: Supporting visual references and sketches as additional input modalities
  • Fine-tuned Models: Training specialized models on Taichi code patterns for improved generation accuracy
