SteveLeve/if-gym

IF Gym 🏋️

AI Agent Framework for Playing Interactive Fiction Games

IF Gym is an experimental framework for training and evaluating AI agents on interactive fiction (text adventure) games. Inspired by OpenAI Gym, it provides a standardized environment for testing LLM reasoning, planning, and problem-solving capabilities.

Why Interactive Fiction?

Text adventures are a well-suited testbed for AI agents:

  • Complex reasoning required - Puzzles demand multi-step planning
  • Natural language interface - Direct test of language understanding
  • Rich state space - Thousands of possible game states
  • Clear objectives - Measurable success criteria
  • Resource-constrained settings - Local and smaller models can be tested
  • Diverse challenges - Different games test different capabilities

Project Status

🚧 Active Development - Core framework in place, first agents coming soon

Current Features

  • ✅ Core agent/game abstractions
  • ✅ Monorepo structure
  • ✅ TypeScript with strict typing
  • 🚧 ifvms.js adapter (coming soon)
  • 🚧 OpenAI agent (GPT-4, GPT-3.5)
  • 🚧 Anthropic agent (Claude 3.5 Sonnet)
  • 🚧 Local model agent (Ollama/llama.cpp)

Quick Start

# Install dependencies
pnpm install

# Build all packages
pnpm build

# Run an agent on a game (coming soon)
pnpm play --game games/zork1.z3 --agent gpt-4

# Run benchmark suite (coming soon)
pnpm benchmark --suite zork-opening

Architecture

if-gym/
├── packages/
│   ├── core/          # Agent/game interfaces, session management
│   ├── interpreters/  # Game engine adapters (ifvms, frotz)
│   ├── agents/        # LLM implementations (OpenAI, Anthropic, local)
│   ├── evaluators/    # Metrics, scoring, analysis
│   └── cli/           # Command-line interface
├── games/             # IF game files (.z3, .z5, .z8)
├── benchmarks/        # Benchmark definitions
└── experiments/       # Results and analysis
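
To make the split between core, interpreters, and agents more concrete, here is a minimal sketch of how the agent/game loop could fit together. Every name in it (Game, Agent, SessionResult, runSession, StubGame, ScriptedAgent) is a hypothetical stand-in for illustration; the actual types in packages/core may look different.

```typescript
// Hypothetical interfaces for illustration; not the published @if-gym/core API.
interface Game {
  start(): Promise<string>;               // resolves to the opening text
  send(command: string): Promise<string>; // resolves to the game's response
  isOver(): boolean;
}

interface Agent {
  act(observation: string): Promise<string>; // resolves to the next command
}

interface SessionResult {
  turns: number;
  transcript: Array<{ command: string; response: string }>;
}

// Alternate between agent and game until the game ends or maxTurns is reached.
async function runSession(game: Game, agent: Agent, maxTurns: number): Promise<SessionResult> {
  const transcript: SessionResult["transcript"] = [];
  let observation = await game.start();
  while (!game.isOver() && transcript.length < maxTurns) {
    const command = await agent.act(observation);
    observation = await game.send(command);
    transcript.push({ command, response: observation });
  }
  return { turns: transcript.length, transcript };
}

// Tiny stand-in game: it ends once the player types "open mailbox".
class StubGame implements Game {
  private over = false;
  async start(): Promise<string> {
    return "You are standing in an open field.";
  }
  async send(command: string): Promise<string> {
    if (command === "open mailbox") {
      this.over = true;
      return "You find a leaflet.";
    }
    return "Nothing happens.";
  }
  isOver(): boolean {
    return this.over;
  }
}

// Scripted agent that replays a fixed list of commands, then falls back to "look".
class ScriptedAgent implements Agent {
  private readonly script: string[];
  private index = 0;
  constructor(script: string[]) {
    this.script = script;
  }
  async act(_observation: string): Promise<string> {
    return this.script[this.index++] ?? "look";
  }
}
```

Calling runSession(new StubGame(), new ScriptedAgent(["look", "open mailbox"]), 10) resolves after two turns, because the stub game reports that it is over as soon as the mailbox is opened. A real interpreter adapter or LLM agent would presumably implement the same two interfaces.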

Novel Research Directions

1. Multi-Model Comparison

Compare reasoning strategies across different LLMs:

  • GPT-4 vs Claude 3.5 Sonnet vs Llama 70B
  • How do different models approach puzzles?
  • Which architectures excel at planning?

2. Resource-Constrained Agents

Test smaller, local models:

  • Can 8B parameter models compete?
  • Efficiency vs capability trade-offs
  • Running agents on limited hardware

3. Reasoning Trace Analysis

Capture and study agent thinking:

  • How do successful agents plan?
  • Common failure patterns
  • Transfer learning across games
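
One failure pattern that is easy to detect in a captured trace is an agent repeating the same command and receiving the same response each time. The sketch below shows one way this could be checked; the TraceEntry shape and the findStuckRuns helper are assumptions made for illustration, not part of IF Gym.

```typescript
// Hypothetical trace record; the real capture format is not defined yet.
interface TraceEntry {
  turn: number;
  reasoning: string; // the agent's captured thinking for this turn
  command: string;
  response: string;
}

interface StuckRun {
  command: string;
  startTurn: number;
  length: number;
}

// Find runs of consecutive turns with an identical command and response.
function findStuckRuns(trace: TraceEntry[], minLength = 3): StuckRun[] {
  const runs: StuckRun[] = [];
  let start = 0;
  for (let i = 1; i <= trace.length; i++) {
    const same =
      i < trace.length &&
      trace[i].command === trace[start].command &&
      trace[i].response === trace[start].response;
    if (!same) {
      const length = i - start;
      if (length >= minLength) {
        runs.push({ command: trace[start].command, startTurn: trace[start].turn, length });
      }
      start = i;
    }
  }
  return runs;
}
```

The reasoning text attached to the turns inside each run could then be inspected by hand to see why the agent kept trying the same action.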

4. Agent Evaluation Metrics

Novel performance measures:

  • Commands per puzzle solved
  • Exploration efficiency
  • Backtracking frequency
  • Novel command creativity
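
As a rough illustration of how the first three measures could be computed from a session log, consider the sketch below. The TurnRecord shape and the exact definitions (distinct rooms per command for exploration efficiency, share of room changes that re-enter a visited room for backtracking) are assumptions chosen for the example, not settled IF Gym definitions.

```typescript
// Hypothetical per-turn record; field names are illustrative only.
interface TurnRecord {
  command: string;
  location: string;      // room the player is in after the command
  puzzleSolved: boolean; // whether this command solved a puzzle
}

interface TraceMetrics {
  commandsPerPuzzle: number | null; // null when no puzzle was solved
  explorationEfficiency: number;    // distinct rooms visited per command
  backtrackingFrequency: number;    // share of room changes into a visited room
}

function computeMetrics(trace: TurnRecord[]): TraceMetrics {
  const visited = new Set<string>();
  let puzzles = 0;
  let moves = 0;
  let backtracks = 0;
  let previous: string | undefined;
  for (const turn of trace) {
    if (turn.puzzleSolved) puzzles++;
    if (previous !== undefined && turn.location !== previous) {
      moves++;
      if (visited.has(turn.location)) backtracks++;
    }
    visited.add(turn.location);
    previous = turn.location;
  }
  return {
    commandsPerPuzzle: puzzles > 0 ? trace.length / puzzles : null,
    explorationEfficiency: trace.length > 0 ? visited.size / trace.length : 0,
    backtrackingFrequency: moves > 0 ? backtracks / moves : 0,
  };
}

const example = computeMetrics([
  { command: "look", location: "West of House", puzzleSolved: false },
  { command: "north", location: "North of House", puzzleSolved: false },
  { command: "south", location: "West of House", puzzleSolved: false },
  { command: "open mailbox", location: "West of House", puzzleSolved: true },
]);
// example: commandsPerPuzzle 4, explorationEfficiency 0.5, backtrackingFrequency 0.5
```

Novel command creativity is left out here because it needs a reference vocabulary or a corpus of prior commands to compare against.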

Example Usage (Planned)

import { GameSession } from '@if-gym/core';
import { IFVMSAdapter } from '@if-gym/interpreters';
import { GPT4Agent } from '@if-gym/agents';

// Create game instance
const game = new IFVMSAdapter({
  gamePath: 'games/zork1.z3'
});

// Create agent
const agent = new GPT4Agent({
  model: 'gpt-4',
  apiKey: process.env.OPENAI_API_KEY,
  captureReasoning: true
});

// Run session
const session = new GameSession(game, agent, {
  maxTurns: 100,
  verbose: true
});

const result = await session.run();

console.log(`Completed in ${result.turns} turns`);
console.log(`Success: ${result.metrics.success}`);

Games

IF Gym is designed to work with any Z-machine game:

  • Zork I, II, III - Classic cave adventures
  • Hitchhiker's Guide - Douglas Adams' comedy masterpiece
  • Planetfall - Sci-fi adventure
  • Trinity - Nuclear war drama
  • Photopia - Experimental narrative

And hundreds more from the IF Archive.

Development Roadmap

Phase 1: Core Framework (In Progress)

  • Project scaffolding
  • Core abstractions (Agent, Game, Session)
  • ifvms.js adapter
  • Basic CLI

Phase 2: First Agents

  • Random baseline agent
  • GPT-4 agent with CoT prompting
  • Claude 3.5 Sonnet agent
  • Evaluation metrics
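
A random baseline gives the LLM agents a floor to beat. The following is a minimal sketch of what such an agent might look like, assuming a fixed command vocabulary and an injectable random source so that runs can be made reproducible. The RandomAgent class, its constructor, and the nextCommand method are hypothetical and do not describe existing IF Gym code.

```typescript
// A random source returning a value in [0, 1), such as Math.random.
type Rng = () => number;

// Hypothetical baseline agent: picks uniformly from a fixed command list.
class RandomAgent {
  private readonly commands: string[];
  private readonly rng: Rng;

  constructor(commands: string[], rng: Rng = Math.random) {
    if (commands.length === 0) {
      throw new Error("commands must not be empty");
    }
    this.commands = commands;
    this.rng = rng;
  }

  // Synchronous pick, kept separate so it is easy to test.
  nextCommand(): string {
    const index = Math.floor(this.rng() * this.commands.length);
    return this.commands[index];
  }

  // Async wrapper matching the shape an LLM-backed agent would likely need.
  async act(_observation: string): Promise<string> {
    return this.nextCommand();
  }
}

// Illustrative vocabulary; a real baseline would need a much larger one.
const baseline = new RandomAgent(["north", "south", "east", "west", "look", "inventory"]);
```

Passing a seeded generator as the second constructor argument would make a benchmark run repeatable, which matters when comparing the baseline against nondeterministic LLM agents.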

Phase 3: Experimentation

  • Zork I benchmark suite
  • Multi-agent comparison
  • Reasoning trace analysis
  • Results visualization

Phase 4: Local Models

  • Ollama integration
  • llama.cpp support
  • Efficiency comparisons
  • Resource usage metrics

Phase 5: Advanced Features

  • Multi-game transfer learning
  • Agent memory/learning
  • Collaborative agents
  • Visual game representation

Contributing

IF Gym is an experimental research project. Contributions are welcome!

Areas of interest:

  • New agent implementations
  • Interpreter adapters
  • Evaluation metrics
  • Benchmark suites
  • Analysis tools

Related Work

  • Jericho - Reinforcement learning for IF (Python)
  • TextWorld - Procedural IF generation (Python)
  • ifvms.js - Modern Z-machine interpreter (JavaScript)

IF Gym differs by focusing on LLM agents and comparing different models' reasoning strategies.

License

MIT

Acknowledgments

  • Infocom for creating the Z-machine and classic games
  • Graham Nelson for the Z-Machine Standards Document
  • The IF community for preserving gaming history
  • ifvms.js for the excellent JavaScript interpreter

IF Gym - Where AI agents learn to explore, puzzle-solve, and adventure 🎮🤖
