src/rai_bench/README.md (14 additions & 13 deletions)
@@ -10,9 +10,9 @@ The Manipulation O3DE Benchmark [manipulation_o3de_benchmark_module](./rai_bench
 - **GroupObjectsTask**
 - **BuildCubeTowerTask**
 - **PlaceObjectAtCoordTask**
-- **RotateObjectTask** (currently not applicable due to limitations in the ManipulatorMoveTo tool)
+- **RotateObjectTask** (currently not applicable due to limitations in the `ManipulatorMoveTo` tool)

-The result of a task is a value between 0 and 1, calculated like initially_misplaced_now_correct / initially_misplaced. This score is calculated at the end of each scenario.
+The result of a task is a value between 0 and 1, calculated like `initially_misplaced_now_correct / initially_misplaced`. This score is calculated at the end of each scenario.

 ### Frame Components
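The score formula in the hunk above can be sketched as follows — a minimal illustration of the ratio, not the benchmark's actual implementation (the zero-misplaced fallback is an assumption):

```python
def task_score(initially_misplaced: int, initially_misplaced_now_correct: int) -> float:
    """Score in [0, 1]: fraction of initially misplaced objects that are now correct."""
    if initially_misplaced == 0:
        # Assumption for the sketch: nothing was misplaced, so treat the run as perfect.
        return 1.0
    return initially_misplaced_now_correct / initially_misplaced

# e.g. 3 of 4 initially misplaced objects ended up correct -> 0.75
print(task_score(4, 3))
```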
@@ -92,15 +92,15 @@ python src/rai_bench/rai_bench/examples/manipulation_o3de/main.py --model-name l
 ```

 > [!NOTE]
-> For now benchmark runs all available scenarios (~160). See [Examples](#example-usege)
+> For now benchmark runs all available scenarios (~160). See [Examples](#example-usage)
 > section for details.

 ### Development

 When creating new task or changing existing ones, make sure to add unit tests for score calculation in [rai_bench_tests](../../tests/rai_bench/manipulation_o3de/tasks/).
 This applies also when you are adding or changing the helper methods in `Task` or `ManipulationTask`.

-The number of scenarios can be easily extened without writing new tasks, by increasing number of variants of the same task and adding more simulation configs but it won't improve variety of scenarios as much as creating new tasks.
+The number of scenarios can be easily extended without writing new tasks, by increasing number of variants of the same task and adding more simulation configs but it won't improve variety of scenarios as much as creating new tasks.

 ## Tool Calling Agent Benchmark
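The Development paragraph above notes that scenario counts grow by multiplying task variants and simulation configs. A sketch of that arithmetic — the numbers here are purely illustrative (chosen to land near the "~160" mentioned in the note), not the benchmark's actual counts:

```python
# Scenarios are roughly the cross product of tasks, their variants, and sim configs.
# Illustrative counts only; the real numbers come from the benchmark's configs.
tasks = 4             # e.g. the four task types listed in the first hunk
variants_per_task = 5
sim_configs = 8
print(tasks * variants_per_task * sim_configs)  # -> 160
```

Adding one more simulation config would add `tasks * variants_per_task` scenarios at once, which is why the count scales easily without new tasks.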
@@ -109,15 +109,15 @@ The Tool Calling Agent Benchmark is the benchmark for LangChain tool calling age
-- [Scores tracing](rai_bench/tool_calling_agent_bench/scores_tracing.py) - Component handling sending scores to tracing backends
-- [Interfaces](rai_bench//tool_calling_agent/interfaces.py) - Interfaces for validation classes - Task, Validator, SubTask
+- [Scores tracing](rai_bench/results_processing/langfuse_scores_tracing.py) - Component handling sending scores to tracing backends
+- [Interfaces](rai_bench//tool_calling_agent/interfaces.py) - Interfaces for validation classes - `Task`, `Validator`, `SubTask`

-For detailed description of validation visit -> [Validation](.//rai_bench/docs/tool_calling_agent_benchmark.md)
-[tool_calling_agent_test_bench.py](rai_bench/examples/tool_calling_agent/main.py) - Script providing benchmark on tasks based on the ROS2 tools usage.
+For detailed description of validation visit -> [Validation](./rai_bench/docs/tool_calling_agent_benchmark.md)
+
+[tool_calling_agent_test_bench.py](./rai_bench/examples/tool_calling_agent/main.py) - Script providing benchmark on tasks based on the ROS2 tools usage.

 ### Example Usage

-Validators can be constructed from any SubTasks, Tasks can be validated by any numer of Validators, which makes whole validation process incredibly versital.
+`Validators` can be constructed from any `SubTasks`, `Tasks` can be validated by any number of `Validators`, which makes whole validation process incredibly versatile.
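The `Task`/`Validator`/`SubTask` composition described above might be sketched roughly like this. All method names and the callable-based `SubTask` below are assumptions made for illustration — they do not reflect the actual rai_bench interfaces:

```python
from typing import Callable, List

# Assumption for the sketch: a SubTask is any callable that checks one aspect
# of the agent's recorded tool calls.
SubTask = Callable[[list], bool]

class Validator:
    """Passes when all of its SubTasks pass (hypothetical composition)."""
    def __init__(self, subtasks: List[SubTask]) -> None:
        self.subtasks = subtasks

    def validate(self, tool_calls: list) -> bool:
        return all(check(tool_calls) for check in self.subtasks)

class Task:
    """A Task may be validated by any number of Validators."""
    def __init__(self, validators: List[Validator]) -> None:
        self.validators = validators

    def score(self, tool_calls: list) -> float:
        # Fraction of validators that pass (scoring rule assumed for the sketch).
        if not self.validators:
            return 1.0
        passed = sum(v.validate(tool_calls) for v in self.validators)
        return passed / len(self.validators)

# SubTasks are reusable across Validators, and Validators across Tasks,
# which is the versatility the sentence above describes.
called_spawn: SubTask = lambda calls: "spawn_entity" in calls
called_move: SubTask = lambda calls: "move_to" in calls
task = Task([Validator([called_spawn]), Validator([called_spawn, called_move])])
print(task.score(["spawn_entity"]))  # -> 0.5: first validator passes, second fails
```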