Agentlab Controller #252

Draft
wants to merge 24 commits into
base: main
6ae2c99
add azure agents
patricebechard Jun 10, 2025
73745ce
add agentlab server and agentlab controller
patricebechard Jun 12, 2025
a89a37d
add updating action and thought, add reload task instead of reset
patricebechard Jun 12, 2025
033e21d
remove deployment name from azure model args
patricebechard Jun 12, 2025
d65ad94
add streamlit requirement, update readme
patricebechard Jun 12, 2025
abc5af1
clean up controller and server
patricebechard Jun 12, 2025
9665579
clean up agent controller
patricebechard Jun 12, 2025
b210b99
Add demo video for AgentLab Controller
patricebechard Jun 12, 2025
257c22e
add docstrings for server.py
patricebechard Jun 12, 2025
a51ea1b
change docstring style to google
patricebechard Jun 12, 2025
0f67857
format with black line length 100
patricebechard Jun 12, 2025
46f91d5
remove forced background color which was breaking dark mode
patricebechard Jun 12, 2025
3e629ac
enable looking at all past steps in new tab
patricebechard Jun 19, 2025
13ee957
add button to go back to arbitrary past step
patricebechard Jun 19, 2025
f9fdd4e
implement save feature to save traces and hints
patricebechard Jun 19, 2025
b7a54ca
add advanced options to go back to step k, reprompt k times, and act …
patricebechard Jun 19, 2025
6a516e9
minor refactoring
patricebechard Jun 19, 2025
7aa7a09
bug fixes
patricebechard Jun 22, 2025
6f07f35
update controller
patricebechard Jul 5, 2025
21ebdd7
add ability to save with same format as agentlab-xray
patricebechard Jul 15, 2025
a7702db
support for ToolUseAgent in controller, enable loading of previous run
patricebechard Jul 16, 2025
71999b4
update controller
patricebechard Jul 17, 2025
c41b8ef
enable reprompt tool use agent from controller
patricebechard Jul 21, 2025
ecff4d8
updates to agent controller
patricebechard Jul 22, 2025
44 changes: 44 additions & 0 deletions README.md
@@ -226,6 +226,50 @@ image to select a step and observe the action taken by the agent.

**⚠️ Note**: Gradio is still developing, and unexpected behavior has been frequently noticed. Version 5.5 seems to work properly so far. If you're not sure that the proper information is displaying, refresh the page and select your experiment again.

### AgentLab Server and AgentLab Controller

https://github.com/user-attachments/assets/9a498c99-453a-4d7c-89fc-13e18db8dad6

The AgentLab Server and Controller are two components that work together to control and debug an agent deployed in an environment.

#### Prerequisites

First, create a `.env` file at the root of the repo with the following content:

```bash
# LLM Creds (Azure as an example)
AZURE_OPENAI_ENDPOINT=<YOUR_AZURE_OPENAI_ENDPOINT>
AZURE_OPENAI_API_KEY=<YOUR_AZURE_OPENAI_API_KEY>
AZURE_OPENAI_API_VERSION=<YOUR_AZURE_OPENAI_API_VERSION>

# ServiceNow dev instance creds
SNOW_INSTANCE_URL=https://<your_servicenow_dev_instance>.service-now.com/
SNOW_INSTANCE_UNAME="admin"
SNOW_INSTANCE_PWD=<password>

# MiniWob
MINIWOB_URL="file:///path/to/BrowserGym/miniwob-plusplus/miniwob/html/miniwob/"
```
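
A missing variable tends to surface late, as an authentication or connection error. The sketch below is one way to check up front; it assumes the `.env` values have already been loaded into the process environment (the diff does not show how AgentLab loads them), and `missing_env_vars` is a hypothetical helper, not part of AgentLab. Not every variable is necessarily needed for every benchmark.

```python
import os

# Variable names copied from the .env template above.
REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "SNOW_INSTANCE_URL",
    "SNOW_INSTANCE_UNAME",
    "SNOW_INSTANCE_PWD",
    "MINIWOB_URL",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```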

#### Launch the server

The AgentLab Server is responsible for hosting and enabling interaction with the environment. It is a lightweight FastAPI server that handles the BrowserGym environment and provides a REST API for the controller.

To launch the server, open a terminal and run (you will need to keep this terminal open for the next step):

```bash
agentlab-server
```

#### Launch the controller

The AgentLab Controller is a Streamlit app responsible for controlling the agent and how it interacts with the environment hosted on the server.

To launch the controller, open a new terminal and run:

```bash
agentlab-controller
```

## 🏆 Leaderboard

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -58,3 +58,5 @@ exclude = '''
[project.scripts]
agentlab-assistant = "agentlab.ui_assistant:main"
agentlab-xray = "agentlab.analyze.agent_xray:main"
agentlab-controller = "agentlab.analyze.run_agentlab_controller:main"
agentlab-server = "agentlab.analyze.server:main"
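
Each `[project.scripts]` entry maps a command name to a `module:function` target; at install time a small launcher is generated that imports the module and calls the function. The sketch below approximates that lookup. `resolve_entry_point` is a hypothetical helper, and a standard-library target is used so the example runs without AgentLab installed; real launchers differ in detail.

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a "package.module:attribute" string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths such as "path.join"; skip empty segments.
    for attr in filter(None, attr_path.split(".")):
        obj = getattr(obj, attr)
    return obj


# Stand-in for a target such as "agentlab.analyze.server:main".
dumps = resolve_entry_point("json:dumps")
print(dumps({"ok": True}))
```

Under this reading, running `agentlab-server` amounts to importing `agentlab.analyze.server` and calling its `main`.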
1 change: 1 addition & 0 deletions requirements.txt
@@ -27,3 +27,4 @@ ray[default]
python-slugify
pillow
gymnasium>=0.27
streamlit
16 changes: 16 additions & 0 deletions src/agentlab/agents/generic_agent/__init__.py
@@ -26,6 +26,14 @@
AGENT_o3_MINI,
FLAGS_GPT_4o,
GenericAgentArgs,
AGENT_AZURE_4o_MINI,
AGENT_AZURE_4o,
AGENT_AZURE_4o_VISION,
AGENT_AZURE_4o_MINI_VISION,
AGENT_AZURE_41,
AGENT_AZURE_41_MINI,
AGENT_AZURE_41_VISION,
AGENT_AZURE_41_MINI_VISION,
)

__all__ = [
@@ -46,4 +54,12 @@
"AGENT_4o_VISION",
"AGENT_4o_MINI_VISION",
"AGENT_CLAUDE_SONNET_35_VISION",
"AGENT_AZURE_4o_MINI",
"AGENT_AZURE_4o",
"AGENT_AZURE_4o_VISION",
"AGENT_AZURE_4o_MINI_VISION",
"AGENT_AZURE_41",
"AGENT_AZURE_41_MINI",
"AGENT_AZURE_41_VISION",
"AGENT_AZURE_41_MINI_VISION",
]
38 changes: 38 additions & 0 deletions src/agentlab/agents/generic_agent/agent_configs.py
@@ -350,3 +350,41 @@
chat_model_args=CHAT_MODEL_ARGS_DICT["openai/gpt-4o-2024-05-13"],
flags=DEFAULT_RS_FLAGS,
)


AGENT_AZURE_4o_MINI = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4o-mini"],
flags=FLAGS_GPT_4o,
)
AGENT_AZURE_4o = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4o"],
flags=FLAGS_GPT_4o,
)
AGENT_AZURE_41 = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4.1"],
flags=FLAGS_GPT_4o,
)
AGENT_AZURE_41_MINI = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4.1-mini"],
flags=FLAGS_GPT_4o,
)

AGENT_AZURE_4o_VISION = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4o"],
flags=FLAGS_GPT_4o_VISION,
)

AGENT_AZURE_4o_MINI_VISION = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4o-mini"],
flags=FLAGS_GPT_4o_VISION,
)

AGENT_AZURE_41_VISION = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4.1"],
flags=FLAGS_GPT_4o_VISION,
)

AGENT_AZURE_41_MINI_VISION = GenericAgentArgs(
chat_model_args=CHAT_MODEL_ARGS_DICT["azure/gpt-4.1-mini"],
flags=FLAGS_GPT_4o_VISION,
)
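
The eight Azure configs above are the cross product of four model keys and two flag sets. The sketch below shows one possible way to generate the same names from that structure; it is not what the PR does. `StandInAgentArgs` is a hypothetical stand-in for `GenericAgentArgs`, and the flag sets are represented by their names only, so the example runs without AgentLab.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class StandInAgentArgs:
    """Hypothetical stand-in for GenericAgentArgs (not the real class)."""

    chat_model_key: str
    flags_name: str


# Name fragment -> key used in CHAT_MODEL_ARGS_DICT in the diff above.
MODEL_KEYS = {
    "4o": "azure/gpt-4o",
    "4o_MINI": "azure/gpt-4o-mini",
    "41": "azure/gpt-4.1",
    "41_MINI": "azure/gpt-4.1-mini",
}
# Name suffix -> flag set used in the diff above.
FLAG_SETS = {"": "FLAGS_GPT_4o", "_VISION": "FLAGS_GPT_4o_VISION"}

AZURE_AGENTS = {
    f"AGENT_AZURE_{model}{suffix}": StandInAgentArgs(key, flags)
    for (model, key), (suffix, flags) in product(MODEL_KEYS.items(), FLAG_SETS.items())
}

print(len(AZURE_AGENTS))  # 8
```

The explicit definitions in the PR are easier to grep and import by name; a generated mapping mainly helps if more models or flag sets are added later.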
9 changes: 9 additions & 0 deletions src/agentlab/agents/tool_use_agent/__init__.py
@@ -4,3 +4,12 @@

# for backward compatibility of unpickling
sys.modules[__name__ + ".multi_tool_agent"] = sys.modules[__name__]

__all__ = [
"GPT_4_1",
"AZURE_GPT_4_1",
"GPT_4_1_MINI",
"AZURE_GPT_4_1_MINI",
"OPENAI_CHATAPI_MODEL_CONFIG",
"CLAUDE_MODEL_CONFIG",
]
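
The `sys.modules` alias in the context lines above keeps old pickles loadable: a pickle stores the module path of each class, so objects saved when a class lived under `.multi_tool_agent` still need that path to resolve. The sketch below reproduces the mechanism with throwaway names (`demo_pkg`, `old_submodule`, and `Config` are all hypothetical).

```python
import pickle
import sys
import types

# A throwaway module standing in for the package.
pkg = types.ModuleType("demo_pkg")
sys.modules["demo_pkg"] = pkg

# A class that claims to live in the old submodule path, as an old pickle would record.
Config = type("Config", (), {"__module__": "demo_pkg.old_submodule"})
pkg.Config = Config

# Same pattern as the PR's context line: alias the old path to the current module.
sys.modules["demo_pkg.old_submodule"] = sys.modules["demo_pkg"]

obj = Config()
obj.value = 3
payload = pickle.dumps(obj)  # records "demo_pkg.old_submodule" + "Config"

# Without the alias, the old path cannot be imported and loading fails.
del sys.modules["demo_pkg.old_submodule"]
try:
    pickle.loads(payload)
    failed_without_alias = False
except ImportError:
    failed_without_alias = True

# With the alias restored, the same payload loads again.
sys.modules["demo_pkg.old_submodule"] = sys.modules["demo_pkg"]
restored = pickle.loads(payload)
print(failed_without_alias, restored.value)
```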
24 changes: 24 additions & 0 deletions src/agentlab/agents/tool_use_agent/hint_db.csv
@@ -21,3 +21,27 @@ July 13,workarena.servicenow.create-hardware-asset,385,gpt-4.1,ToolUse-gpt-4.1,W
July 13,workarena.servicenow.create-hardware-asset,385,gpt-4.1,ToolUse-gpt-4.1,WorkArena-L1,WorkArena-L1,allac,Filling form in WorkArena,"Before clicking submit, make sure that all fields are filled properly. Then click submit."
July 13,workarena.servicenow.create-hardware-asset,385,gpt-4.1,ToolUse-gpt-4.1,WorkArena-L1,WorkArena-L1,allac,Filling form in WorkArena,Avoid back and forth from tabs to tabs to reduce the number of actions
July 14,workarena.servicenow.create-hardware-asset,385,gpt-4.1,ToolUse-gpt-4.1,WorkArena-L1,WorkArena-L1,allac,Filling form in WorkArena,When you see auto-complete make sure to select an element from that list
July 16,workarena.servicenow.sort-asset-list,406,gpt-4-1,ToolUseAgent-gpt-4-1,workarena,workarena,patricebechard,Sorting lists in ServiceNow,"1. **Navigate to Your Table/List**

* For example, go to **Incident > All** or any other table you want to view.

2. **Sort by One or Multiple Columns**

* `click` on the ""show / hide filter"" button (funnel icon) at the top left of the page to open the filter row.
* Repeat the following steps for each column you want to sort by in this exact order:
* `click` on the ""Add Sort"" button to add a new sort filter. This will create a new ordering filter row with two comboboxes under the heading ""Order results by the following fields"".
* `fill` the first combobox with the appropriate field name you want to sort by. MAKE SURE to use the exact field name provided.
* `press` Enter after typing the field name to close the dropdown. It is VERY IMPORTANT that you do this before doing anything else otherwise the field will not be selected and the task will not be successful. DO NOT click on the run filter button before having confirmed your choice by explicitly pressing ENTER.
* `select_option` for the appropriate ordering between ascending (a to z) or descending (z to a) in the second combobox.
* Once all sort filters have been added, `click` the ""Run filter"" button to apply the sort.

Notes:
* NEVER directly sort the columns using the table header.
* NEVER add columns via the Personalize List menu.
* ALWAYS sort the table using the EXACT NAMES of the provided fields. DO NOT use different but similar field names. For example, if the field you're asked to sort by is ""Opened by"", DO NOT filter by ""Created by"" even if they sound similar, but instead ALWAYS use the exact ""Opened by"" wording.
* Some columns might not appear by default in the visible view of the table. This does not mean they do not exist. ALWAYS use the EXACT names provided to sort by otherwise the task will not be successful.

3. **Resetting or Clearing Sorting**

* To reset sorting, click another column, or click again to toggle.
* In the filter bar, you may see a ""Sorted by..."" indicator—clear or change it as needed."
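
The new hint spans many lines inside a single quoted CSV field and escapes literal quotes by doubling them (`""Add Sort""`). A line-by-line reader would split it into many broken rows; a CSV-aware parser keeps it as one value. The sketch below shows this with Python's `csv` module on a two-column sample. The column names `task_name` and `hint` are assumptions, since the file's header row is not shown in this diff.

```python
import csv
import io

# Hypothetical two-column sample; the real hint_db.csv has more columns.
sample = (
    "task_name,hint\n"
    'sort-asset-list,"1. `click` the ""Add Sort"" button.\n'
    '2. `press` Enter."\n'
)

# newline="" leaves line endings untouched so the csv module can handle
# newlines embedded in quoted fields.
rows = list(csv.DictReader(io.StringIO(sample, newline="")))

print(len(rows))  # 1
print(rows[0]["hint"])
```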
47 changes: 32 additions & 15 deletions src/agentlab/agents/tool_use_agent/tool_use_agent.py
@@ -8,22 +8,14 @@

import bgym
import pandas as pd
from bgym import Benchmark as BgymBenchmark
from browsergym.core.observation import extract_screenshot
from browsergym.utils.obs import (
flatten_axtree_to_str,
flatten_dom_to_str,
overlay_som,
prune_html,
)

from agentlab.agents.agent_args import AgentArgs
from agentlab.benchmarks.abstract_env import AbstractBenchmark as AgentLabBenchmark
from agentlab.benchmarks.osworld import OSWorldActionSet
from agentlab.llm.base_api import BaseModelArgs
from agentlab.llm.llm_utils import image_to_png_base64_url
from agentlab.llm.response_api import (
APIPayload,
AzureOpenAIResponseModelArgs,
ClaudeResponseModelArgs,
LLMOutput,
MessageBuilder,
@@ -33,6 +25,14 @@
ToolCalls,
)
from agentlab.llm.tracking import cost_tracker_decorator
from bgym import Benchmark as BgymBenchmark
from browsergym.core.observation import extract_screenshot
from browsergym.utils.obs import (
flatten_axtree_to_str,
flatten_dom_to_str,
overlay_som,
prune_html,
)


@dataclass
@@ -43,8 +43,8 @@ def _init(self):

def make(self) -> "Block":
"""Returns a copy so the init can start adding some stuff to `self` without changing the
original datatclass that should only contain a config.
The aim is avoid having 2 calss definition for each block, e.g. Block and BlockArgs.
original dataclass that should only contain a config.
The aim is to avoid having 2 class definitions for each block, e.g. Block and BlockArgs.

Returns:
Block: A copy of the current block instance with initialization applied.
@@ -387,7 +387,6 @@ def __init__(
self.config.action_subsets, multiaction=self.config.multiaction # type: ignore
)
self.tools = self.action_set.to_tool_description(api=model_args.api)

self.call_ids = []

self.llm = model_args.make_model()
@@ -508,6 +507,15 @@ def get_action(self, obs: Any) -> float:
vision_support=True,
)

AZURE_GPT_4_1 = AzureOpenAIResponseModelArgs(
model_name="gpt-4.1",
max_total_tokens=200_000,
max_input_tokens=200_000,
max_new_tokens=2_000,
temperature=0.1,
vision_support=True,
)

GPT_4_1_MINI = OpenAIResponseModelArgs(
model_name="gpt-4.1-mini",
max_total_tokens=200_000,
@@ -517,6 +525,15 @@ def get_action(self, obs: Any) -> float:
vision_support=True,
)

AZURE_GPT_4_1_MINI = AzureOpenAIResponseModelArgs(
model_name="gpt-4.1-mini",
max_total_tokens=200_000,
max_input_tokens=200_000,
max_new_tokens=2_000,
temperature=0.1,
vision_support=True,
)

OPENAI_CHATAPI_MODEL_CONFIG = OpenAIChatModelArgs(
model_name="gpt-4o-2024-08-06",
max_total_tokens=200_000,
@@ -576,9 +593,9 @@ def get_action(self, obs: Any) -> float:
general_hints=GeneralHints(use_hints=False),
task_hint=TaskHint(use_task_hint=True),
keep_last_n_obs=None,
multiaction=True, # whether to use multi-action or not
# action_subsets=("bid",),
action_subsets=("coord"),
multiaction=False, # whether to use multi-action or not
action_subsets=("bid",),
# action_subsets=("coord"),
# action_subsets=("coord", "bid"),
)
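
A note on the hunk above: `("coord")` is a parenthesized string, not a one-element tuple, while `("bid",)` is a tuple because of the trailing comma. Whether the action-set constructor tolerates a bare string is not shown in this diff, so the sketch below only illustrates the Python-level difference.

```python
with_comma = ("bid",)      # one-element tuple
without_comma = ("coord")  # just the string "coord"

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # str

# Iterating shows the practical difference: a string yields its characters.
print(list(with_comma))     # ['bid']
print(list(without_comma))  # ['c', 'o', 'o', 'r', 'd']
```

If the commented-out `coord` option is re-enabled later, writing it as `("coord",)` keeps it consistent with the other alternatives.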
