Commit 776b321

add community-use-case: Excel Analyzer (#446)
2 parents af27b0f + 8bfeea9 commit 776b321

6 files changed: +635 -0 lines changed

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Excel Analyzer

[中文](README_zh.md)

This project uses **Owl** for data analysis and visualization.

## Features

- Provides both English and Chinese versions of the raw data and prompts
- Uses **CodeExecutionToolkit**, **ExcelToolkit**, and **FileWriteToolkit** to complete the analysis tasks
- Implements **ExcelRolePlaying** (spelled `ExcelRolePalying` in the code) on top of **OwlRolePlaying**, overriding the `system_prompt` with a cleaner, more focused version tailored to data analysis scenarios
- Tested with `gpt-4o` and `deepseek-v3`
- The analysis and visualization of this Excel file involve:
  - Complex headers (merged rows)
  - NaN value handling
  - Complex group calculations
  - Visualization

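The three data-handling features listed above (merged header rows, NaN values, grouped calculations) can be illustrated with a small dependency-free sketch. The sample rows, column names, and helper functions here are hypothetical and are not taken from the project's data or code; in the use case itself the assistant is prompted to write pandas code at run time, so this only shows the underlying logic.

```python
import math
from collections import defaultdict

# Hypothetical sheet content: a two-row header where "Score" is a merged
# cell spanning two columns, followed by data rows with one missing value.
header_top = ["College", "Score", None]
header_sub = [None, "Written", "Interview"]
rows = [
    ["Engineering", 82.0, 90.0],
    ["Engineering", math.nan, 75.0],
    ["Engineering", 91.0, 60.0],
    ["Arts", 70.0, 88.0],
]


def flatten_header(top, sub):
    """Forward-fill merged top-level cells and join them with sub-level names."""
    names, last = [], None
    for t, s in zip(top, sub):
        last = t if t is not None else last
        names.append(last if s is None else f"{last}_{s}")
    return names


def summarize(rows, columns, key, value):
    """Per group: count rows and take the max/min of `value`, skipping NaN."""
    k, v = columns.index(key), columns.index(value)
    groups = defaultdict(list)
    for row in rows:
        if not math.isnan(row[v]):
            groups[row[k]].append(row[v])
    return {
        name: {"count": len(vals), "max": max(vals), "min": min(vals)}
        for name, vals in groups.items()
    }


columns = flatten_header(header_top, header_sub)
summary = summarize(rows, columns, "College", "Score_Written")
print(columns)
print(summary)
```

Skipping NaN rows before grouping is one possible policy; filling them with a default value would change the counts, so the choice should match what the analysis task actually asks for.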
## How to Use

1. Set up the environment according to Owl's official instructions
2. Run the following commands:
```bash
cd community_usecase/excel_analyzer

# Chinese version, using deepseek-v3
python excel_analyzer_zh.py

# English version, using gpt-4o
python excel_analyzer_en.py
```
3. The analysis results will be saved in the current directory

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Excel Analyzer
This project uses Owl for data analysis and visualization.


## Features
- Provides the raw data and prompts in both English and Chinese for easier understanding
- Uses **CodeExecutionToolkit**, **ExcelToolkit**, and **FileWriteToolkit** to complete the related work
- Implements **ExcelRolePalying** on top of **OwlRolePlaying**; it overrides the system_prompt to be more concise and focused on data analysis scenarios
- Tested with both `gpt-4o` and `deepseek-v3`, each of which achieved the expected results
- Analyzing and visualizing this Excel file involves:
  - Complex headers (merged rows)
  - Missing value handling
  - Complex group calculations
  - Visualization

## How to use
1. Set up the environment following Owl's official process
2. Run
```
cd community_usecase/excel_analyzer

# Chinese version, using deepseek-v3
python excel_analyzer_zh.py

# English version, using gpt-4o
python excel_analyzer_en.py
```
3. The dataset analysis results will be saved in the current directory

33.2 KB
Binary file not shown.
27 KB
Binary file not shown.
Lines changed: 264 additions & 0 deletions
@@ -0,0 +1,264 @@
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ========= Copyright 2023-2024 @ CAMEL-AI.org. All Rights Reserved. =========
import os
import sys


from dotenv import load_dotenv
from camel.configs import ChatGPTConfig
from camel.models import ModelFactory
from camel.messages.base import BaseMessage

from camel.toolkits import (
    CodeExecutionToolkit,
    ExcelToolkit,
    FileWriteToolkit,
)
from camel.types import ModelPlatformType

from owl.utils import OwlRolePlaying
from typing import Dict, List, Optional, Tuple
from camel.logger import set_log_level, set_log_file, get_logger

import pathlib

logger = get_logger(__name__)

# Load environment variables from <repo root>/owl/.env
base_dir = pathlib.Path(__file__).parent.parent.parent
env_path = base_dir / "owl" / ".env"
load_dotenv(dotenv_path=str(env_path))

set_log_level(level="DEBUG")

class ExcelRolePalying(OwlRolePlaying):
    def _construct_gaia_sys_msgs(self):
        # System prompt for the user agent, which issues one instruction per turn.
        user_system_prompt = f"""
===== RULES OF USER =====
Never forget you are a user and I am an assistant. Never flip roles! You will always instruct me. We share a common interest in collaborating to successfully complete a task.
I must help you to complete a difficult task.
You must instruct me based on my expertise and your needs to solve the task step by step. The format of your instruction is: `Instruction: [YOUR INSTRUCTION]`, where "Instruction" describes a sub-task or question.
You must give me one instruction at a time.
I must write a response that appropriately solves the requested instruction.
You should instruct me, not ask me questions.

Please note that the task may be very complicated. Do not attempt to solve the task in a single step. You must instruct me to find the answer step by step.
Here are some tips that will help you to give more valuable instructions about our task to me:
<tips>
- I can use various tools, such as Excel Toolkit and Code Execution Toolkit.

- Although the task may be complex, the answer exists.
If you find that the current approach does not lead to the answer, reconsider the task, and use alternative methods or tools to achieve the same goal.

- Always remind me to verify whether the final answer is correct!
This can be done in multiple ways, such as screenshots, web analysis, etc.

- If I have written code, remind me to run the code and obtain the results.

- Flexibly use code to solve problems, especially for Excel-related tasks.

</tips>

Now, here is the overall task: <task>{self.task_prompt}</task>. Never forget our task!

Now you must start to instruct me to solve the task step-by-step. Do not add anything else other than your instruction!
Keep giving me instructions until you think the task is completed.
When the task is completed, you must only reply with a single word <TASK_DONE>.
Never say <TASK_DONE> unless my responses have solved your task.
        """

        # System prompt for the assistant agent, which solves each instruction with tools.
        assistant_system_prompt = f"""
===== RULES OF ASSISTANT =====
Never forget you are an assistant and I am a user. Never flip roles! Never instruct me! You have to utilize your available tools to solve the task I assigned.
We share a common interest in collaborating to successfully complete a complex task.
You must help me to complete the task.

Here is our overall task: {self.task_prompt}. Never forget our task!

I must instruct you based on your expertise and my needs to complete the task. An instruction is typically a sub-task or question.

You must leverage your available tools, try your best to solve the problem, and explain your solutions.
Unless I say the task is completed, you should always start with:
Solution: [YOUR_SOLUTION]
[YOUR_SOLUTION] should be specific, including detailed explanations and provide preferable detailed implementations and examples and lists for task-solving.

Please note that our overall task may be very complicated. Here are some tips that may help you solve the task:
<tips>
- If one method fails, try another. The answer exists!
- When it comes to viewing information in an Excel file, you can always start by writing Python code to read the Excel file and check sheet names, column names, and similar details.
- When providing Python code, always remember to import the necessary libraries at the beginning, such as the commonly used libraries for Excel analysis below:
```
import pandas as pd
```
- Always verify whether your final answer is correct!
- Always write complete code from scratch. After writing the code, be sure to run it and obtain the results!
If you encounter errors, try debugging the code.
Note that the code execution environment does not support interactive input.
- If the tool fails to run or the code does not execute correctly,
never assume that it has returned the correct result and continue reasoning based on it!
The correct approach is to analyze the cause of the error and try to fix it!
</tips>

        """

        user_sys_msg = BaseMessage.make_user_message(
            role_name=self.user_role_name, content=user_system_prompt
        )

        assistant_sys_msg = BaseMessage.make_assistant_message(
            role_name=self.assistant_role_name, content=assistant_system_prompt
        )

        return user_sys_msg, assistant_sys_msg

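The class above changes behavior only by overriding one prompt-building method. A minimal sketch of that pattern follows, assuming the base class obtains both system prompts from an overridable hook while it sets itself up; that is an assumption about `OwlRolePlaying`, whose source is not shown here. `BaseRolePlaying`, `ExcelLikeRolePlaying`, and `_construct_sys_msgs` are hypothetical stand-ins, not real Owl or CAMEL classes.

```python
from typing import Tuple


class BaseRolePlaying:
    """Hypothetical stand-in for OwlRolePlaying (not the real class)."""

    def __init__(self, task_prompt: str) -> None:
        self.task_prompt = task_prompt
        # Assumed behavior: the base class asks an overridable hook for
        # the user and assistant system prompts during construction.
        self.user_prompt, self.assistant_prompt = self._construct_sys_msgs()

    def _construct_sys_msgs(self) -> Tuple[str, str]:
        return (
            f"Generic user rules. Task: {self.task_prompt}",
            f"Generic assistant rules. Task: {self.task_prompt}",
        )


class ExcelLikeRolePlaying(BaseRolePlaying):
    """Overrides only the prompt hook; everything else is inherited."""

    def _construct_sys_msgs(self) -> Tuple[str, str]:
        return (
            f"Instruct me step by step. Task: <task>{self.task_prompt}</task>",
            f"Use the Excel and code tools. Task: {self.task_prompt}",
        )


society = ExcelLikeRolePlaying("Count admitted students per college")
print(society.user_prompt)
print(society.assistant_prompt)
```

Because the hook reads `self.task_prompt`, the sketch sets that attribute before calling the hook; a subclass that relies on other attributes would need them initialized first as well.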
def run_society(
    society: ExcelRolePalying,
    round_limit: int = 15,
) -> Tuple[str, List[dict], dict]:
    """Run the user/assistant dialogue and return the answer, history, and token counts."""
    overall_completion_token_count = 0
    overall_prompt_token_count = 0

    chat_history = []
    init_prompt = """
Now please give me instructions to solve our overall task step by step. If the task requires some specific knowledge, please instruct me to use tools to complete the task.
    """
    input_msg = society.init_chat(init_prompt)
    for _round in range(round_limit):
        assistant_response, user_response = society.step(input_msg)
        # Check if usage info is available before accessing it
        if assistant_response.info.get("usage") and user_response.info.get("usage"):
            overall_completion_token_count += assistant_response.info["usage"].get(
                "completion_tokens", 0
            ) + user_response.info["usage"].get("completion_tokens", 0)
            overall_prompt_token_count += assistant_response.info["usage"].get(
                "prompt_tokens", 0
            ) + user_response.info["usage"].get("prompt_tokens", 0)

        # convert tool call to dict
        tool_call_records: List[dict] = []
        if assistant_response.info.get("tool_calls"):
            for tool_call in assistant_response.info["tool_calls"]:
                tool_call_records.append(tool_call.as_dict())

        _data = {
            "user": user_response.msg.content
            if hasattr(user_response, "msg") and user_response.msg
            else "",
            "assistant": assistant_response.msg.content
            if hasattr(assistant_response, "msg") and assistant_response.msg
            else "",
            "tool_calls": tool_call_records,
        }

        chat_history.append(_data)
        logger.info(
            f"Round #{_round} user_response:\n {user_response.msgs[0].content if user_response.msgs and len(user_response.msgs) > 0 else ''}"
        )
        logger.info(
            f"Round #{_round} assistant_response:\n {assistant_response.msgs[0].content if assistant_response.msgs and len(assistant_response.msgs) > 0 else ''}"
        )

        # Stop when either agent terminates or the user signals completion
        if (
            assistant_response.terminated
            or user_response.terminated
            or "TASK_DONE" in user_response.msg.content
        ):
            break

        input_msg = assistant_response.msg

    # The last assistant message is treated as the final answer
    answer = chat_history[-1]["assistant"]
    token_info = {
        "completion_token_count": overall_completion_token_count,
        "prompt_token_count": overall_prompt_token_count,
    }

    return answer, chat_history, token_info

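The token accounting and stop condition in `run_society` can be isolated into two small helpers, sketched below with plain dictionaries instead of real response objects. `usage_tokens` and `is_finished` are hypothetical names that do not exist in Owl or CAMEL. One deliberate difference from the loop above: this sketch counts each side independently, whereas the original adds tokens only when both responses report usage.

```python
from typing import Any, Dict, Optional


def usage_tokens(info: Optional[Dict[str, Any]], key: str) -> int:
    """Return info["usage"][key], or 0 when any level is missing or None."""
    usage = (info or {}).get("usage") or {}
    return int(usage.get(key) or 0)


def is_finished(
    assistant_terminated: bool,
    user_terminated: bool,
    user_content: Optional[str],
) -> bool:
    """Mirror the loop's stop condition, tolerating a missing user message."""
    return bool(
        assistant_terminated
        or user_terminated
        or "TASK_DONE" in (user_content or "")
    )


# Hypothetical per-round info payloads: only the assistant reports usage.
assistant_info = {"usage": {"completion_tokens": 12, "prompt_tokens": 300}}
user_info = {"usage": None}

completion_total = (
    usage_tokens(assistant_info, "completion_tokens")
    + usage_tokens(user_info, "completion_tokens")
)
print(completion_total)
print(is_finished(False, False, "<TASK_DONE>"))
```

Guarding against a missing message matters because the loop above reads `user_response.msg.content` in its stop condition without the `hasattr` check it uses when building `_data`.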
def construct_society(question: str) -> ExcelRolePalying:
    r"""Construct a society of agents based on the given question.

    Args:
        question (str): The task or question to be addressed by the society.

    Returns:
        ExcelRolePalying: A configured society of agents ready to address the question.
    """

    # Create models for different components using Azure OpenAI
    base_model_config = {
        "model_platform": ModelPlatformType.AZURE,
        "model_type": os.getenv("AZURE_OPENAI_MODEL_TYPE"),
        "model_config_dict": ChatGPTConfig(temperature=0.01, max_tokens=4096).as_dict(),
    }

    models = {
        "user": ModelFactory.create(**base_model_config),
        "assistant": ModelFactory.create(**base_model_config),
    }

    # Configure toolkits
    tools = [
        *CodeExecutionToolkit(sandbox="subprocess", verbose=True).get_tools(),
        *ExcelToolkit().get_tools(),
        *FileWriteToolkit(output_dir="./").get_tools(),
    ]

    # Configure agent roles and parameters
    user_agent_kwargs = {"model": models["user"]}
    assistant_agent_kwargs = {"model": models["assistant"], "tools": tools}

    # Configure task parameters
    task_kwargs = {
        "task_prompt": question,
        "with_task_specify": False,
    }

    # Create and return the society
    society = ExcelRolePalying(
        **task_kwargs,
        user_role_name="user",
        user_agent_kwargs=user_agent_kwargs,
        assistant_role_name="assistant",
        assistant_agent_kwargs=assistant_agent_kwargs,
        output_language="English",
    )

    return society


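`construct_society` passes `os.getenv("AZURE_OPENAI_MODEL_TYPE")` straight into the model configuration, so an unset variable becomes `None`. A small preflight check, sketched below, could surface that earlier. `missing_env_vars` is a hypothetical helper; only the variable name `AZURE_OPENAI_MODEL_TYPE` comes from the code above, and any further variables an Azure setup needs are not listed because the source does not name them.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    names: Iterable[str],
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


REQUIRED = ["AZURE_OPENAI_MODEL_TYPE"]

# Explicit mappings are passed here so the example does not depend on the
# real process environment.
print(missing_env_vars(REQUIRED, {}))
print(missing_env_vars(REQUIRED, {"AZURE_OPENAI_MODEL_TYPE": "gpt-4o"}))
```

Calling such a check after `load_dotenv` and before building the models would allow a clear error message that names the missing variable.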
def main():
    # Example question

    default_task = """Please help analyze the file `./data/admission_en.xlsx` by:
- Calculating the number of admitted students, as well as the highest and lowest scores for each college
- Plotting this information in a single chart: use a bar chart for the number of admitted students, and line charts for the highest and lowest scores
- Saving the generated chart as `vis_en.png` in the current directory"""

    set_log_file('log.txt')

    # Override default task if command line argument is provided
    task = sys.argv[1] if len(sys.argv) > 1 else default_task

    # Construct and run the society
    society = construct_society(task)

    answer, chat_history, token_count = run_society(society)

    # Output the result
    print(f"\033[94mAnswer: {answer}\033[0m")


if __name__ == "__main__":
    main()
