Commit 1dd3d09

Pass parallel_tool_calls directly and document intended usage in integration test

Signed-off-by: Anastas Stoyanovsky <[email protected]>
1 parent 91f1b35

8 files changed: +141 −30 lines

docs/docs/providers/agents/index.mdx (2 additions, 2 deletions; whitespace-only: the removed and added lines are textually identical here because this rendering does not preserve trailing/indentation whitespace, and the same applies to the other index.mdx diffs in this commit)

@@ -2,7 +2,7 @@
 description: |
   Agents

-  APIs for creating and interacting with agentic systems.
+  APIs for creating and interacting with agentic systems.
 sidebar_label: Agents
 title: Agents
 ---
@@ -13,6 +13,6 @@ title: Agents

 Agents

-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.

 This section contains documentation for all available providers for the **agents** API.
docs/docs/providers/batches/index.mdx (12 additions, 12 deletions; whitespace-only; filename inferred from the **batches** index content, as the capture omitted it)

@@ -1,15 +1,15 @@
 ---
 description: |
   The Batches API enables efficient processing of multiple requests in a single operation,
-  particularly useful for processing large datasets, batch evaluation workflows, and
-  cost-effective inference at scale.
+  particularly useful for processing large datasets, batch evaluation workflows, and
+  cost-effective inference at scale.

-  The API is designed to allow use of openai client libraries for seamless integration.
+  The API is designed to allow use of openai client libraries for seamless integration.

-  This API provides the following extensions:
-  - idempotent batch creation
+  This API provides the following extensions:
+  - idempotent batch creation

-  Note: This API is currently under active development and may undergo changes.
+  Note: This API is currently under active development and may undergo changes.
 sidebar_label: Batches
 title: Batches
 ---
@@ -19,14 +19,14 @@ title: Batches
 ## Overview

 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.

-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.

-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation

-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.

 This section contains documentation for all available providers for the **batches** API.
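The description above notes that the Batches API is meant to be used through the standard openai client libraries. As a hedged sketch (not part of this commit), a batch input is a JSONL file in which each line is a request envelope with a `custom_id`, HTTP method, target URL, and request body; the model name and prompts below are illustrative:

```python
import json

def batch_request_line(custom_id: str, model: str, prompt: str) -> str:
    """Serialize one OpenAI-style batch request as a JSONL line.

    Format assumption: the OpenAI Batches input format, where each line
    carries a custom_id, an HTTP method, a target URL, and a request body.
    """
    envelope = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    return json.dumps(envelope)

# Two requests for one batch; the joined string is what would be uploaded
# (e.g. via the Files API) before creating the batch.
lines = [
    batch_request_line("req-1", "example-model", "Summarize document A"),
    batch_request_line("req-2", "example-model", "Summarize document B"),
]
jsonl = "\n".join(lines)
```

Each response line in the batch output can then be matched back to its request via `custom_id`, which is what makes idempotent, order-independent processing practical.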

docs/docs/providers/eval/index.mdx (2 additions, 2 deletions; whitespace-only)

@@ -2,7 +2,7 @@
 description: |
   Evaluations

-  Llama Stack Evaluation API for running evaluations on model and agent candidates.
+  Llama Stack Evaluation API for running evaluations on model and agent candidates.
 sidebar_label: Eval
 title: Eval
 ---
@@ -13,6 +13,6 @@ title: Eval

 Evaluations

-Llama Stack Evaluation API for running evaluations on model and agent candidates.
+Llama Stack Evaluation API for running evaluations on model and agent candidates.

 This section contains documentation for all available providers for the **eval** API.

docs/docs/providers/files/index.mdx (2 additions, 2 deletions; whitespace-only)

@@ -2,7 +2,7 @@
 description: |
   Files

-  This API is used to upload documents that can be used with other Llama Stack APIs.
+  This API is used to upload documents that can be used with other Llama Stack APIs.
 sidebar_label: Files
 title: Files
 ---
@@ -13,6 +13,6 @@ title: Files

 Files

-This API is used to upload documents that can be used with other Llama Stack APIs.
+This API is used to upload documents that can be used with other Llama Stack APIs.

 This section contains documentation for all available providers for the **files** API.

docs/docs/providers/inference/index.mdx (10 additions, 10 deletions; whitespace-only)

@@ -2,12 +2,12 @@
 description: |
   Inference

-  Llama Stack Inference API for generating completions, chat completions, and embeddings.
+  Llama Stack Inference API for generating completions, chat completions, and embeddings.

-  This API provides the raw interface to the underlying models. Three kinds of models are supported:
-  - LLM models: these models generate "raw" and "chat" (conversational) completions.
-  - Embedding models: these models generate embeddings to be used for semantic search.
-  - Rerank models: these models reorder the documents based on their relevance to a query.
+  This API provides the raw interface to the underlying models. Three kinds of models are supported:
+  - LLM models: these models generate "raw" and "chat" (conversational) completions.
+  - Embedding models: these models generate embeddings to be used for semantic search.
+  - Rerank models: these models reorder the documents based on their relevance to a query.
 sidebar_label: Inference
 title: Inference
 ---
@@ -18,11 +18,11 @@ title: Inference

 Inference

-Llama Stack Inference API for generating completions, chat completions, and embeddings.
+Llama Stack Inference API for generating completions, chat completions, and embeddings.

-This API provides the raw interface to the underlying models. Three kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
-- Rerank models: these models reorder the documents based on their relevance to a query.
+This API provides the raw interface to the underlying models. Three kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
+- Rerank models: these models reorder the documents based on their relevance to a query.

 This section contains documentation for all available providers for the **inference** API.

docs/docs/providers/safety/index.mdx (2 additions, 2 deletions; whitespace-only)

@@ -2,7 +2,7 @@
 description: |
   Safety

-  OpenAI-compatible Moderations API.
+  OpenAI-compatible Moderations API.
 sidebar_label: Safety
 title: Safety
 ---
@@ -13,6 +13,6 @@ title: Safety

 Safety

-OpenAI-compatible Moderations API.
+OpenAI-compatible Moderations API.

 This section contains documentation for all available providers for the **safety** API.

src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py (1 addition, 0 deletions)

@@ -242,6 +242,7 @@ async def create_response(self) -> AsyncIterator[OpenAIResponseObjectStream]:
             messages=messages,
             # Pydantic models are dict-compatible but mypy treats them as distinct types
             tools=self.ctx.chat_tools,  # type: ignore[arg-type]
+            parallel_tool_calls=self.parallel_tool_calls,
             stream=True,
             temperature=self.ctx.temperature,
             response_format=response_format,
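The one-line change above forwards the user's `parallel_tool_calls` setting into the underlying chat-completion call instead of silently dropping it. A minimal, self-contained sketch of what that forwarding amounts to; every name except `parallel_tool_calls` is illustrative, not the actual streaming.py code:

```python
# Sketch: assemble chat-completion kwargs the way the patched code does,
# forwarding parallel_tool_calls through to the provider. All names except
# parallel_tool_calls are hypothetical stand-ins for this illustration.
def build_completion_kwargs(messages, tools, parallel_tool_calls, temperature=None):
    kwargs = {
        "messages": messages,
        "stream": True,
        "temperature": temperature,
    }
    if tools:
        kwargs["tools"] = tools
        # Before the fix, this flag was never passed down, so the provider
        # always used its own default; now the caller's choice is preserved.
        kwargs["parallel_tool_calls"] = parallel_tool_calls
    return kwargs

kw = build_completion_kwargs(
    messages=[{"role": "user", "content": "Get the weather in New York and in Paris"}],
    tools=[{"type": "function", "name": "get_weather"}],
    parallel_tool_calls=False,
)
```

With `parallel_tool_calls=False` forwarded like this, a compliant provider emits at most one tool call per turn, which is exactly the behavior the new integration tests exercise.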

tests/integration/agents/test_openai_responses.py (110 additions, 0 deletions)

@@ -682,3 +682,113 @@ def test_max_tool_calls_with_builtin_tools(openai_client, client_with_models, text_model_id):

     # Verify we have a valid max_tool_calls field
     assert response_3.max_tool_calls == max_tool_calls[1]
+
+
+@pytest.mark.skip(reason="Tool calling is not reliable.")
+def test_parallel_tool_calls_true(openai_client, client_with_models, text_model_id):
+    """Test handling of parallel_tool_calls with function tools in responses."""
+    if isinstance(client_with_models, LlamaStackAsLibraryClient):
+        pytest.skip("OpenAI responses are not supported when testing with library client yet.")
+
+    client = openai_client
+    parallel_tool_calls = True
+
+    tools = [
+        {
+            "type": "function",
+            "name": "get_weather",
+            "description": "Get weather information for a specified location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city name (e.g., 'New York', 'London')",
+                    },
+                },
+            },
+        }
+    ]
+
+    # First create a response that triggers function tools
+    response = client.responses.create(
+        model=text_model_id,
+        input="Get the weather in New York and in Paris",
+        tools=tools,
+        stream=False,
+        parallel_tool_calls=parallel_tool_calls,
+    )
+
+    # Verify we got two function calls
+    assert len(response.output) == 2
+    assert response.output[0].type == "function_call"
+    assert response.output[0].name == "get_weather"
+    assert response.output[0].status == "completed"
+    assert response.output[1].type == "function_call"
+    assert response.output[1].name == "get_weather"
+    assert response.output[1].status == "completed"
+
+    # Verify we have a valid parallel_tool_calls field
+    assert response.parallel_tool_calls == parallel_tool_calls
+
+
+@pytest.mark.skip(reason="Tool calling is not reliable.")
+def test_parallel_tool_calls_false(openai_client, client_with_models, text_model_id):
+    """Test handling of parallel_tool_calls with function tools in responses."""
+    if isinstance(client_with_models, LlamaStackAsLibraryClient):
+        pytest.skip("OpenAI responses are not supported when testing with library client yet.")
+
+    client = openai_client
+    parallel_tool_calls = False
+
+    tools = [
+        {
+            "type": "function",
+            "name": "get_weather",
+            "description": "Get weather information for a specified location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city name (e.g., 'New York', 'London')",
+                    },
+                },
+            },
+        }
+    ]
+
+    # First create a response that triggers function tools
+    response = client.responses.create(
+        model=text_model_id,
+        input="Get the weather in New York and in Paris",
+        tools=tools,
+        stream=False,
+        parallel_tool_calls=parallel_tool_calls,
+    )
+
+    # Verify we got only the first function call
+    assert len(response.output) == 1
+    assert response.output[0].type == "function_call"
+    assert response.output[0].name == "get_weather"
+    assert response.output[0].status == "completed"
+
+    # Verify we have a valid parallel_tool_calls field
+    assert response.parallel_tool_calls == parallel_tool_calls
+
+    # Feed the first call's output back to obtain the second call
+    response2 = client.responses.create(
+        model=text_model_id,
+        input=[
+            {"role": "user", "content": "Check the weather in Paris and New York."},
+            {"call_id": response.output[0].call_id, "type": "function_call_output", "output": "18 c"},
+        ],
+        tools=tools,
+        stream=False,
+        parallel_tool_calls=parallel_tool_calls,
+    )
+
+    # Verify we got the second function call
+    assert len(response2.output) == 1
+    assert response2.output[0].type == "function_call"
+    assert response2.output[0].name == "get_weather"
+    assert response2.output[0].status == "completed"
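The second test above exercises the client-side loop that `parallel_tool_calls=False` implies: the model emits one function call per turn, and the caller answers each call with a `function_call_output` input item before the conversation can continue. A small offline sketch of that follow-up construction, using plain dicts in place of the SDK's response objects (the helper name is hypothetical):

```python
# Sketch of the sequential tool-call loop: with parallel_tool_calls=False,
# each turn yields one function_call, and the caller replies with a
# function_call_output item keyed by call_id. Plain dicts stand in for
# the SDK response objects; tool_output_item is an illustrative helper.
def tool_output_item(call: dict, output: str) -> dict:
    """Build the follow-up input item for one completed function call."""
    return {
        "call_id": call["call_id"],
        "type": "function_call_output",
        "output": output,
    }

# Pretend first turn: the model asked for the weather once.
first_turn_output = [
    {"type": "function_call", "name": "get_weather", "call_id": "call_1"}
]

# Next-turn input: original user message plus the tool's result,
# mirroring the shape the integration test sends to responses.create.
followup_input = [
    {"role": "user", "content": "Check the weather in Paris and New York."},
    tool_output_item(first_turn_output[0], "18 c"),
]
```

Matching on `call_id` is what lets the server pair each tool result with the call that requested it, so the loop can repeat until no further `function_call` items are emitted.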
