This project implements an advanced autonomous agent that leverages Monte Carlo Tree Search (MCTS) and Large Language Models (LLMs) to solve complex programming tasks. The agent intelligently explores the vast search space of possible code solutions, breaking down problems, generating code, evaluating its correctness, and dynamically repairing errors to arrive at a functional final script.
The core script, mcts_llm_polish.py, orchestrates this entire process.
The agent's methodology is inspired by how a human programmer might tackle a difficult task: planning, writing code, testing, debugging, and sometimes rethinking the overall approach. This is achieved through a symbiotic relationship between a structured search algorithm (MCTS) and a creative generator (LLM).
- Monte Carlo Tree Search (MCTS): MCTS is a powerful decision-making algorithm, famously used in game-playing AI such as AlphaGo. Here, instead of a game board, the "game" is writing a correct program. Each node in the tree represents a specific state of the code (a sequence of completed steps), and the moves are candidate code snippets for the next step. MCTS guides the agent to spend more time exploring promising code paths.
- LLM as a Cognitive Engine: The LLM handles all "creative" and "reasoning" tasks:
- Task Decomposition: Breaking a high-level goal (e.g., "Analyze this satellite data") into a sequence of smaller, manageable coding steps.
- Code Generation: Writing Python code for a specific sub-task.
- Code Repair: Debugging and fixing code that produces errors.
- Strategic Re-planning: If the agent gets stuck, it can ask the LLM to re-decompose the remaining tasks from the current state, finding an alternative path forward.
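As a concrete illustration, the decomposition step might look like the sketch below. The real helper lives in utils.py as decompose_task, but its actual prompt and signature are not shown here, so the prompt wording, the injected llm_chat callable, and the JSON reply format are all assumptions for illustration.

```python
import json

def decompose_task(goal, llm_chat):
    """Ask the LLM to split a high-level goal into ordered coding steps.
    `llm_chat` is any callable that takes a prompt string and returns text;
    it is injected so this sketch stays testable without an API key."""
    prompt = (
        "Break the following programming task into a list of small, "
        "independently codable steps. Reply as a JSON array of strings.\n\n"
        f"Task: {goal}"
    )
    return json.loads(llm_chat(prompt))

# Stub standing in for a real LLM call.
fake_llm = lambda prompt: '["Load the data", "Compute statistics", "Plot results"]'
steps = decompose_task("Analyze this satellite data", fake_llm)
```

Injecting the LLM call as a plain callable also makes it easy to swap providers or cache responses during testing.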
The script operates in a loop, performing multiple "rollouts" to build and evaluate the search tree. Each rollout consists of four main phases:
Selection. Starting from the root (an empty script), the agent traverses the tree by selecting the most promising child node at each level. Selection is guided by the P-UCB (Polynomial Upper Confidence Bound) formula, which balances two competing goals:
- Exploitation: Choosing nodes that have historically led to high rewards (good code).
- Exploration: Choosing nodes that have been visited less often, to avoid getting stuck in a local optimum.
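A minimal sketch of how a P-UCB score might balance these two terms. The exploration constant, the `prior` field (an LLM-assigned likelihood for the snippet), and the exact form of the exploration term are illustrative assumptions, not the script's tuned values:

```python
import math

class Node:
    def __init__(self, prior, visits=0, value=0.0):
        # `prior`: assumed LLM likelihood of this snippet; `value`: summed rewards.
        self.prior, self.visits, self.value = prior, visits, value

def p_ucb(child, parent_visits, beta=2.0):
    """Polynomial Upper Confidence Bound score for one child node.
    The first term exploits historically high-reward nodes; the second
    grows for rarely visited children, encouraging exploration."""
    exploitation = child.value / child.visits if child.visits else 0.0
    exploration = beta * child.prior * math.sqrt(math.log(parent_visits)) / (1 + child.visits)
    return exploitation + exploration

# A well-scored, often-visited child vs. a never-visited one:
seen = Node(prior=0.6, visits=10, value=8.0)
fresh = Node(prior=0.4)
best = max([seen, fresh], key=lambda c: p_ucb(c, parent_visits=11))
```

With these numbers the unvisited node wins, showing how the exploration term prevents the search from locking onto one path too early.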
Expansion. When the agent reaches a leaf node (a point where the code path has not yet been extended), it expands the tree:
- The agent commits the code in the current node as the "final" solution for that step.
- It then calls the LLM to generate multiple code candidates (top_k) for the next task step.
- Each candidate becomes a new child node, which is checked for immediate syntax or runtime errors.
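The expansion steps above can be sketched roughly as follows. Here `generate_candidates` stands in for the real LLM helper (generate_task_codes in utils.py), and the dict-based node layout is an assumption for illustration:

```python
import ast

def expand(node, next_task, generate_candidates, top_k=3):
    """Expand a leaf: request `top_k` candidate snippets for the next step,
    attach each as a child node, and flag syntax errors immediately."""
    children = []
    for snippet in generate_candidates(next_task, top_k):
        child = {"code": node["code"] + "\n" + snippet, "visits": 0,
                 "value": 0.0, "error": None, "children": []}
        try:
            ast.parse(child["code"])  # cheap immediate syntax check
        except SyntaxError as exc:
            child["error"] = str(exc)
        children.append(child)
    node["children"] = children
    return children

root = {"code": "x = 1", "children": [], "visits": 1, "value": 0.0}
fake_gen = lambda task, k: ["y = x + 1", "y = x +"]  # second snippet is broken
kids = expand(root, "increment x", fake_gen)
```

Flagging bad candidates at expansion time lets the fix cycle target them before any simulation effort is spent.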
Simulation. This is the evaluation phase: from a newly expanded node, the agent simulates the future to estimate the quality of the current code path.
- It asks the LLM to generate a quick, "long-exposure" completion for the next few steps.
- The combined code (from the root to the current node + the simulated future code) is executed in a secure, sandboxed environment with time and memory limits.
- A reward is calculated based on the percentage of steps that execute successfully. A fully successful program gives a reward of 1.0.
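A simplified stand-in for the sandboxed execution and reward computation described above. The real script delegates to multi_process.py; this sketch only enforces a wall-clock limit via a subprocess (memory caps would need, e.g., `resource.setrlimit` on POSIX), and the function names are assumptions:

```python
import subprocess
import sys

def run_sandboxed(code, timeout=5):
    """Run code in a fresh Python subprocess with a wall-clock limit,
    so crashes or infinite loops cannot take down the agent itself."""
    try:
        proc = subprocess.run([sys.executable, "-c", code],
                              capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def rollout_reward(step_codes, timeout=5):
    """Reward = fraction of steps whose cumulative code runs cleanly;
    1.0 means every step of the simulated program succeeded."""
    ok = 0
    for i in range(1, len(step_codes) + 1):
        if run_sandboxed("\n".join(step_codes[:i]), timeout):
            ok += 1
        else:
            break
    return ok / len(step_codes)

# Two good steps, then a division by zero: reward is 2/3.
reward = rollout_reward(["a = 2", "b = a * 3", "c = b / 0"])
```

Running the cumulative script rather than each snippet in isolation matters: a step can only be judged in the context of everything generated before it.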
Backpropagation. The reward from the simulation is propagated back up the tree from the leaf node to the root. Each node along this path has its statistics (visit count and accumulated value) updated, informing future Selection phases.
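Backpropagation itself is a short loop. This sketch assumes the same dict-based nodes as above, with `visits` and `value` fields:

```python
def backpropagate(path, reward):
    """Update visit counts and accumulated value along the selected path,
    from leaf back to root; later P-UCB selections read these statistics."""
    for node in reversed(path):
        node["visits"] += 1
        node["value"] += reward

root = {"visits": 0, "value": 0.0}
child = {"visits": 0, "value": 0.0}
leaf = {"visits": 0, "value": 0.0}
backpropagate([root, child, leaf], reward=0.75)
```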
- Error Handling & Code Repair: If a selected node contains code with an error, the agent doesn't immediately expand it. Instead, it enters a "fix" cycle, using the LLM to repair the buggy code. It tracks multiple fix attempts, scoring each one.
- Hard Reset & Re-Decomposition: If a node proves too difficult to fix after several attempts, the agent triggers a "hard reset." It asks the LLM to devise a new plan (re-decompose the task) from that point onwards, effectively pruning the dead-end path and creating a new one.
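The fix cycle and hard-reset trigger might be organized along these lines. Everything here is a hypothetical sketch: `llm_fix` and `score_fn` are injected stand-ins for the real LLM helpers, and the attempt budget is illustrative:

```python
def attempt_fixes(buggy_code, error_msg, llm_fix, score_fn, max_attempts=3):
    """Fix cycle sketch: ask the LLM for repaired variants, score each,
    and keep the best. Returning None signals the caller to trigger a
    hard reset and re-decompose the remaining tasks."""
    best, best_score = None, 0.0
    for _ in range(max_attempts):
        candidate = llm_fix(buggy_code, error_msg)
        score = score_fn(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best  # None => no attempt succeeded; caller hard-resets

# Stubs standing in for the LLM repair call and the execution-based scorer.
fake_fix = lambda code, err: code.replace("/ 0", "/ 2")
fake_score = lambda code: 1.0 if "/ 0" not in code else 0.0
fixed = attempt_fixes("x = 1 / 0", "ZeroDivisionError", fake_fix, fake_score)
```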
The project directory contains several key components:
- mcts_llm_polish.py: The main executable script that runs the MCTS agent.
- utils.py: Helper functions for interacting with the LLM API (e.g., decompose_task, generate_task_codes, llm_task_update).
- multi_process.py: A crucial utility that executes generated code in a separate, monitored process. This prevents the main agent from crashing due to errors, infinite loops, or excessive memory usage in the generated code.
- workspace/: The directory where the agent executes the code. Any files generated by the code (e.g., plots, data files) appear here.
- Archived/: A folder likely used to store results from previous runs.
Install the required Python libraries. You can create a requirements.txt file or install them directly:
```shell
pip install openai jedi timeout-decorator structured-logprobs earthengine-api geemap autopep8 pillow networkx matplotlib pydot
```