A Minecraft-playing agent with episodic memory powered by Retrieval-Augmented Generation (RAG). The agent is tasked with collecting wood in the MineRL environment, using vision-language models for perception and vector-database retrieval as its episodic memory.
This project implements a Minecraft agent that learns from past experiences. The agent uses:
- MineCLIP: A CLIP-based vision model fine-tuned for Minecraft video understanding
- Qwen2.5-VL: A vision-language model for generating descriptions and action sequences
- ChromaDB: Vector database for storing and retrieving episodic memories
- LangChain/LangGraph: LLM orchestration framework for agent decision-making
The agent can operate in multiple modes:
- Baseline: No memory, pure LLM-based decision making
- RAG v1–v3: Text-based retrieval using action or description embeddings (see the retrieval sketch after this list)
- RAG v4 (Multimodal): Fused video + text embeddings for improved retrieval
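In the RAG modes, the agent grounds its decisions by retrieving similar past situations before prompting the LLM. A minimal sketch of that retrieval step against ChromaDB (the collection name, embedding input, and result handling are illustrative assumptions, not the project's actual schema):

```python
import chromadb

# Connect to the ChromaDB service (port 8000; see the services table below).
client = chromadb.HttpClient(host="127.0.0.1", port=8000)
collection = client.get_or_create_collection("episodic_memory")  # hypothetical name

def recall(query_embedding: list[float], k: int = 3) -> list[str]:
    """Return the k stored memories most similar to the current observation."""
    results = collection.query(query_embeddings=[query_embedding], n_results=k)
    # Retrieved documents (e.g., chunk descriptions) are appended to the LLM
    # prompt so the next action is conditioned on past experience.
    return results["documents"][0]
```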
```
├── .data/ # Data folders (MineRL dataset, database seed, ...)
├── .ckpts/ # Model weights
├── Agent/ # Agent implementations
│ ├── agent.py # OpenAI GPT-based agent
│ ├── agent_local.py # Local HuggingFace model agent
│ ├── agent_multimodal.py # Multimodal fused embedding agent
│ ├── RAG_server.py # RAG retrieval server
│ └── utils/ # Agent utilities and prompts
├── MineCLIP/ # MineCLIP vision model
│ ├── mineclip/ # Model architecture
│ └── utils/ # Vision utilities
├── DatasetProcessing/ # Data pipeline
│ ├── pipeline/ # 6-step processing pipeline
│ ├── run_pipeline.py # Main pipeline runner
│ └── process_sliding_window_pipeline.py
├── Database/ # Vector database management
│ ├── seed_collections.py # ChromaDB seeding
│ └── recall_testing.py # Retrieval evaluation
├── EnvServer/ # MineRL environment servers
│ ├── env_server.py # Single environment
│ └── multi_env_server.py # Multi-environment support
├── Scripts/ # SLURM batch scripts
└── docker-compose.yml # Container orchestration
```
- Python 3.12+
- Docker and Docker Compose
- CUDA-capable GPU (recommended: 24GB+ VRAM for multimodal agent)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/agent-episodic-memory.git
  cd agent-episodic-memory
  ```

- Install dependencies:

  ```bash
  pip install -e .
  # or using uv
  uv sync
  ```

- Copy and configure environment variables:

  ```bash
  cp .env.example .env
  # Edit .env with your configuration
  ```

- Download the MineCLIP model weights (see Data & Model Weights)

- Start the Docker services:

  ```bash
  docker-compose up -d
  ```

Download the MineCLIP model checkpoints and place them in the .ckpts/ directory:
| Model | Description | Download |
|---|---|---|
| `attn.pth` | Attention-based temporal pooling (634MB) | MineCLIP GitHub |
| `avg.pth` | Average temporal pooling (601MB) | MineCLIP GitHub |
The checkpoints can be downloaded here.
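A quick way to sanity-check a downloaded checkpoint is to load it on CPU and inspect its top-level structure (an illustrative snippet; the exact key layout inside the file is not guaranteed):

```python
import torch

# map_location="cpu" lets this run on machines without a GPU.
state = torch.load(".ckpts/attn.pth", map_location="cpu")

# Checkpoints are dictionaries; listing a few keys confirms the file
# downloaded intact before pointing the pipeline's --checkpoint flag at it.
if isinstance(state, dict):
    print(list(state.keys())[:5])
```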
The project uses the MineRLTreechop-v0 dataset, which can be downloaded here.
| Dataset | Description | Download |
|---|---|---|
| MineRLTreechop-v0 | Human demonstrations of tree chopping | MineRL Dataset |
Dataset structure after download:
```
.data/
└── MineRLTreechop-v0/
    ├── v3_absolute_grape_changeling-1_1050-2666/
    │   ├── rendered.npz    # Video frames
    │   └── metadata.json   # Actions and rewards
    └── ... (more episodes)
```
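To verify the download, an episode directory can be opened directly with numpy and json (a sketch; the array names inside rendered.npz depend on the MineRL dataset version):

```python
import json
import numpy as np

episode = ".data/MineRLTreechop-v0/v3_absolute_grape_changeling-1_1050-2666"

# rendered.npz bundles the per-step arrays for the episode.
data = np.load(f"{episode}/rendered.npz")
print("arrays:", data.files)

# metadata.json carries episode-level information.
with open(f"{episode}/metadata.json") as f:
    meta = json.load(f)
print("metadata keys:", list(meta.keys()))
```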
Key environment variables in .env:

```bash
# LLM Configuration
LOCAL_MODEL_NAME=Qwen/Qwen2.5-7B-Instruct
USE_OPENAI_LLM=false   # Set to true for OpenAI models

# Agent Settings
USE_RAG=true           # Enable episodic memory
MAX_FRAMES=500         # Maximum steps per episode
TEST_RUNS=5            # Number of evaluation runs

# Server Configuration
MINERL_SERVER_URL=http://127.0.0.1:5002
CHROMA_HOST=http://127.0.0.1:8000
```
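The agents presumably read these values at startup; a minimal sketch of loading them with python-dotenv (the variable names match the .env above, but this loading code is illustrative, not the project's actual startup path):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

USE_RAG = os.getenv("USE_RAG", "false").lower() == "true"
MAX_FRAMES = int(os.getenv("MAX_FRAMES", "500"))
MINERL_SERVER_URL = os.getenv("MINERL_SERVER_URL", "http://127.0.0.1:5002")

print(f"RAG enabled: {USE_RAG}, max frames: {MAX_FRAMES}")
```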
Local HuggingFace Agent:

```bash
python Agent/agent_local.py
```

OpenAI Agent:

```bash
export OPENAI_API_KEY=sk-your-key-here
python Agent/agent.py
```

Multimodal Agent (with fused embeddings):

```bash
python Agent/agent_multimodal.py
```

For HPC environments:

```bash
# Run local agent
sbatch Scripts/run_local_agent.sbatch

# Run multimodal agent (requires A6000 or better)
sbatch Scripts/run_multimodal_agent.sbatch

# Run RAG experiments as array job
sbatch Scripts/run_rag_array.sbatch
```

The 6-step pipeline processes raw MineRL episodes into vector database entries (a sketch of Step 1's chunking follows the list below):
- Step 1: Sliding window chunking (16-frame windows, stride 8)
- Step 2: Video embedding using MineCLIP
- Step 3: Description generation using Qwen VLM
- Step 4: Text embedding using MineCLIP
- Step 5: Action sequence generation using Qwen VLM
- Step 6: CSV export for ChromaDB
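Step 1 defines what a "memory" is: each episode is cut into overlapping 16-frame windows with a stride of 8, so consecutive chunks share half their frames. A minimal sketch of that indexing (illustrative, not the pipeline's actual code):

```python
def sliding_windows(num_frames: int, window: int = 16, stride: int = 8):
    """Yield (start, end) frame-index pairs for overlapping episode chunks."""
    for start in range(0, num_frames - window + 1, stride):
        yield start, start + window

# A 40-frame episode yields chunks [0, 16), [8, 24), [16, 32), [24, 40).
print(list(sliding_windows(40)))
```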
```bash
# Full pipeline
python DatasetProcessing/run_pipeline.py \
    --input-dir .data/MineRLTreechop-v0 \
    --output-dir .data/pipeline_output

# With specific episodes
python DatasetProcessing/run_pipeline.py \
    --input-dir .data/MineRLTreechop-v0 \
    --max-episodes 10

# Sliding window pipeline (recommended)
python DatasetProcessing/process_sliding_window_pipeline.py \
    --input-dir .data/MineRLTreechop-v0 \
    --output-dir .data/sliding_window_dataset
```

Individual pipeline steps can also be run on their own:

```bash
# Step 2: Embed videos
python DatasetProcessing/pipeline/step2_video_embedding.py \
    --input-dir .data/chunked_dataset \
    --checkpoint .ckpts/attn.pth

# Step 3: Generate descriptions
python DatasetProcessing/pipeline/step3_llm_description.py \
    --input-dir .data/chunked_dataset_with_embeddings

# Export to CSV
python DatasetProcessing/export_to_csv.py
```

Services are managed with Docker Compose:

```bash
# Start all services
docker-compose up -d

# Start only ChromaDB
docker-compose up -d chromadb

# Start MineRL multi-environment server
docker-compose up -d minerl-multi-env

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

| Service | Port | Description |
|---|---|---|
| ChromaDB | 8000 | Vector database HTTP API |
| MineRL Multi-Env | 5002 | Multi-environment REST API |
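Once the containers are up, both services answer HTTP, so readiness can be checked from Python (the ChromaDB heartbeat call is part of its client API; the MineRL server is only probed at its base URL here, since its routes aren't documented in this README):

```python
import chromadb
import requests

# ChromaDB exposes a heartbeat for liveness checks.
chroma = chromadb.HttpClient(host="127.0.0.1", port=8000)
print("chroma heartbeat:", chroma.heartbeat())

# For the MineRL multi-env server, any HTTP response means the service is up.
resp = requests.get("http://127.0.0.1:5002", timeout=5)
print("minerl server HTTP status:", resp.status_code)
```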
The project supports 4 RAG versions (configured in rag_configs.json):
| Version | Embedding Method | Description |
|---|---|---|
| v1 | Action-based | Uses last action from metadata |
| v2 | Full description | LLM-generated episode description |
| v3 | Chunk description | LLM-generated chunk-level description |
| v4 | Fused multimodal | Combined video + text embeddings (best performance) |
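The exact fusion used by v4 isn't spelled out in this README; a common choice in CLIP-style embedding spaces, shown here as a hedged sketch, is an L2-normalized convex combination of the video and text vectors:

```python
import numpy as np

def fuse_embeddings(video_emb: np.ndarray, text_emb: np.ndarray,
                    alpha: float = 0.5) -> np.ndarray:
    """Fuse MineCLIP video and text embeddings into one retrieval vector.

    Normalizing before and after mixing keeps the result on the unit sphere,
    so cosine similarity behaves consistently across modalities.
    """
    v = video_emb / np.linalg.norm(video_emb)
    t = text_emb / np.linalg.norm(text_emb)
    fused = alpha * v + (1.0 - alpha) * t
    return fused / np.linalg.norm(fused)
```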
Results are saved to Agent/Results/ with metrics including:
- Wood collected per episode
- Success rate
- Average episode length
- Retrieval statistics (when using RAG)