Commit 1dd3d09

Pass parallel_tool_calls directly and document intended usage in integration test

Signed-off-by: Anastas Stoyanovsky <[email protected]>
1 parent 91f1b35

8 files changed: +141 −30 lines

docs/docs/providers/agents/index.mdx (2 additions, 2 deletions; whitespace-only: the removed and added lines are textually identical here because this rendering does not preserve trailing/indentation whitespace, and the same applies to the other index.mdx diffs in this commit)

@@ -2,7 +2,7 @@
 description: |
   Agents

-  APIs for creating and interacting with agentic systems.
+  APIs for creating and interacting with agentic systems.
 sidebar_label: Agents
 title: Agents
 ---
@@ -13,6 +13,6 @@ title: Agents

 Agents

-APIs for creating and interacting with agentic systems.
+APIs for creating and interacting with agentic systems.

 This section contains documentation for all available providers for the **agents** API.
docs/docs/providers/batches/index.mdx (12 additions, 12 deletions; whitespace-only; filename inferred from the **batches** index content, as the capture omitted it)

@@ -1,15 +1,15 @@
 ---
 description: |
   The Batches API enables efficient processing of multiple requests in a single operation,
-  particularly useful for processing large datasets, batch evaluation workflows, and
-  cost-effective inference at scale.
+  particularly useful for processing large datasets, batch evaluation workflows, and
+  cost-effective inference at scale.

-  The API is designed to allow use of openai client libraries for seamless integration.
+  The API is designed to allow use of openai client libraries for seamless integration.

-  This API provides the following extensions:
-  - idempotent batch creation
+  This API provides the following extensions:
+  - idempotent batch creation

-  Note: This API is currently under active development and may undergo changes.
+  Note: This API is currently under active development and may undergo changes.
 sidebar_label: Batches
 title: Batches
 ---
@@ -19,14 +19,14 @@ title: Batches
 ## Overview

 The Batches API enables efficient processing of multiple requests in a single operation,
-particularly useful for processing large datasets, batch evaluation workflows, and
-cost-effective inference at scale.
+particularly useful for processing large datasets, batch evaluation workflows, and
+cost-effective inference at scale.

-The API is designed to allow use of openai client libraries for seamless integration.
+The API is designed to allow use of openai client libraries for seamless integration.

-This API provides the following extensions:
-- idempotent batch creation
+This API provides the following extensions:
+- idempotent batch creation

-Note: This API is currently under active development and may undergo changes.
+Note: This API is currently under active development and may undergo changes.

 This section contains documentation for all available providers for the **batches** API.
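The description above notes that the Batches API is meant to be used through the standard openai client libraries. As a hedged sketch (not part of this commit), a batch input is a JSONL file in which each line is a request envelope with a `custom_id`, HTTP method, target URL, and request body; the model name and prompts below are illustrative:

```python
import json

def batch_request_line(custom_id: str, model: str, prompt: str) -> str:
    """Serialize one OpenAI-style batch request as a JSONL line.

    Format assumption: the OpenAI Batches input format, where each line
    carries a custom_id, an HTTP method, a target URL, and a request body.
    """
    envelope = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    return json.dumps(envelope)

# Two requests for one batch; the joined string is what would be uploaded
# (e.g. via the Files API) before creating the batch.
lines = [
    batch_request_line("req-1", "example-model", "Summarize document A"),
    batch_request_line("req-2", "example-model", "Summarize document B"),
]
jsonl = "\n".join(lines)
```

Each response line in the batch output can then be matched back to its request via `custom_id`, which is what makes idempotent, order-independent processing practical.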

docs/docs/providers/eval/index.mdx (2 additions, 2 deletions; whitespace-only)

@@ -2,7 +2,7 @@
 description: |
   Evaluations

-  Llama Stack Evaluation API for running evaluations on model and agent candidates.
+  Llama Stack Evaluation API for running evaluations on model and agent candidates.
 sidebar_label: Eval
 title: Eval
 ---
@@ -13,6 +13,6 @@ title: Eval

 Evaluations

-Llama Stack Evaluation API for running evaluations on model and agent candidates.
+Llama Stack Evaluation API for running evaluations on model and agent candidates.

 This section contains documentation for all available providers for the **eval** API.

docs/docs/providers/files/index.mdx (2 additions, 2 deletions; whitespace-only)

@@ -2,7 +2,7 @@
 description: |
   Files

-  This API is used to upload documents that can be used with other Llama Stack APIs.
+  This API is used to upload documents that can be used with other Llama Stack APIs.
 sidebar_label: Files
 title: Files
 ---
@@ -13,6 +13,6 @@ title: Files

 Files

-This API is used to upload documents that can be used with other Llama Stack APIs.
+This API is used to upload documents that can be used with other Llama Stack APIs.

 This section contains documentation for all available providers for the **files** API.

docs/docs/providers/inference/index.mdx (10 additions, 10 deletions; whitespace-only)

@@ -2,12 +2,12 @@
 description: |
   Inference

-  Llama Stack Inference API for generating completions, chat completions, and embeddings.
+  Llama Stack Inference API for generating completions, chat completions, and embeddings.

-  This API provides the raw interface to the underlying models. Three kinds of models are supported:
-  - LLM models: these models generate "raw" and "chat" (conversational) completions.
-  - Embedding models: these models generate embeddings to be used for semantic search.
-  - Rerank models: these models reorder the documents based on their relevance to a query.
+  This API provides the raw interface to the underlying models. Three kinds of models are supported:
+  - LLM models: these models generate "raw" and "chat" (conversational) completions.
+  - Embedding models: these models generate embeddings to be used for semantic search.
+  - Rerank models: these models reorder the documents based on their relevance to a query.
 sidebar_label: Inference
 title: Inference
 ---
@@ -18,11 +18,11 @@ title: Inference

 Inference

-Llama Stack Inference API for generating completions, chat completions, and embeddings.
+Llama Stack Inference API for generating completions, chat completions, and embeddings.

-This API provides the raw interface to the underlying models. Three kinds of models are supported:
-- LLM models: these models generate "raw" and "chat" (conversational) completions.
-- Embedding models: these models generate embeddings to be used for semantic search.
-- Rerank models: these models reorder the documents based on their relevance to a query.
+This API provides the raw interface to the underlying models. Three kinds of models are supported:
+- LLM models: these models generate "raw" and "chat" (conversational) completions.
+- Embedding models: these models generate embeddings to be used for semantic search.
+- Rerank models: these models reorder the documents based on their relevance to a query.

 This section contains documentation for all available providers for the **inference** API.

docs/docs/providers/safety/index.mdx (2 additions, 2 deletions; whitespace-only)

@@ -2,7 +2,7 @@
 description: |
   Safety

-  OpenAI-compatible Moderations API.
+  OpenAI-compatible Moderations API.
 sidebar_label: Safety
 title: Safety
 ---
@@ -13,6 +13,6 @@ title: Safety

 Safety

-OpenAI-compatible Moderations API.
+OpenAI-compatible Moderations API.

 This section contains documentation for all available providers for the **safety** API.

src/llama_stack/providers/inline/agents/meta_reference/responses/streaming.py (1 addition, 0 deletions)

@@ -242,6 +242,7 @@ async def create_response(self) -> AsyncIterator[OpenAIResponseObjectStream]:
             messages=messages,
             # Pydantic models are dict-compatible but mypy treats them as distinct types
             tools=self.ctx.chat_tools,  # type: ignore[arg-type]
+            parallel_tool_calls=self.parallel_tool_calls,
             stream=True,
             temperature=self.ctx.temperature,
             response_format=response_format,
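The one-line change above forwards the user's `parallel_tool_calls` setting into the underlying chat-completion call instead of silently dropping it. A minimal, self-contained sketch of what that forwarding amounts to; every name except `parallel_tool_calls` is illustrative, not the actual streaming.py code:

```python
# Sketch: assemble chat-completion kwargs the way the patched code does,
# forwarding parallel_tool_calls through to the provider. All names except
# parallel_tool_calls are hypothetical stand-ins for this illustration.
def build_completion_kwargs(messages, tools, parallel_tool_calls, temperature=None):
    kwargs = {
        "messages": messages,
        "stream": True,
        "temperature": temperature,
    }
    if tools:
        kwargs["tools"] = tools
        # Before the fix, this flag was never passed down, so the provider
        # always used its own default; now the caller's choice is preserved.
        kwargs["parallel_tool_calls"] = parallel_tool_calls
    return kwargs

kw = build_completion_kwargs(
    messages=[{"role": "user", "content": "Get the weather in New York and in Paris"}],
    tools=[{"type": "function", "name": "get_weather"}],
    parallel_tool_calls=False,
)
```

With `parallel_tool_calls=False` forwarded like this, a compliant provider emits at most one tool call per turn, which is exactly the behavior the new integration tests exercise.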

tests/integration/agents/test_openai_responses.py (110 additions, 0 deletions)

@@ -682,3 +682,113 @@ def test_max_tool_calls_with_builtin_tools(openai_client, client_with_models, text_model_id):

     # Verify we have a valid max_tool_calls field
     assert response_3.max_tool_calls == max_tool_calls[1]
+
+
+@pytest.mark.skip(reason="Tool calling is not reliable.")
+def test_parallel_tool_calls_true(openai_client, client_with_models, text_model_id):
+    """Test handling of parallel_tool_calls with function tools in responses."""
+    if isinstance(client_with_models, LlamaStackAsLibraryClient):
+        pytest.skip("OpenAI responses are not supported when testing with library client yet.")
+
+    client = openai_client
+    parallel_tool_calls = True
+
+    tools = [
+        {
+            "type": "function",
+            "name": "get_weather",
+            "description": "Get weather information for a specified location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city name (e.g., 'New York', 'London')",
+                    },
+                },
+            },
+        }
+    ]
+
+    # First create a response that triggers function tools
+    response = client.responses.create(
+        model=text_model_id,
+        input="Get the weather in New York and in Paris",
+        tools=tools,
+        stream=False,
+        parallel_tool_calls=parallel_tool_calls,
+    )
+
+    # Verify we got two function calls
+    assert len(response.output) == 2
+    assert response.output[0].type == "function_call"
+    assert response.output[0].name == "get_weather"
+    assert response.output[0].status == "completed"
+    assert response.output[1].type == "function_call"
+    assert response.output[1].name == "get_weather"
+    assert response.output[1].status == "completed"
+
+    # Verify we have a valid parallel_tool_calls field
+    assert response.parallel_tool_calls == parallel_tool_calls
+
+
+@pytest.mark.skip(reason="Tool calling is not reliable.")
+def test_parallel_tool_calls_false(openai_client, client_with_models, text_model_id):
+    """Test handling of parallel_tool_calls with function tools in responses."""
+    if isinstance(client_with_models, LlamaStackAsLibraryClient):
+        pytest.skip("OpenAI responses are not supported when testing with library client yet.")
+
+    client = openai_client
+    parallel_tool_calls = False
+
+    tools = [
+        {
+            "type": "function",
+            "name": "get_weather",
+            "description": "Get weather information for a specified location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city name (e.g., 'New York', 'London')",
+                    },
+                },
+            },
+        }
+    ]
+
+    # First create a response that triggers function tools
+    response = client.responses.create(
+        model=text_model_id,
+        input="Get the weather in New York and in Paris",
+        tools=tools,
+        stream=False,
+        parallel_tool_calls=parallel_tool_calls,
+    )
+
+    # Verify we got only the first function call
+    assert len(response.output) == 1
+    assert response.output[0].type == "function_call"
+    assert response.output[0].name == "get_weather"
+    assert response.output[0].status == "completed"
+
+    # Verify we have a valid parallel_tool_calls field
+    assert response.parallel_tool_calls == parallel_tool_calls
+
+    # Feed the first call's output back to obtain the second call
+    response2 = client.responses.create(
+        model=text_model_id,
+        input=[
+            {"role": "user", "content": "Check the weather in Paris and New York."},
+            {"call_id": response.output[0].call_id, "type": "function_call_output", "output": "18 c"},
+        ],
+        tools=tools,
+        stream=False,
+        parallel_tool_calls=parallel_tool_calls,
+    )
+
+    # Verify we got the second function call
+    assert len(response2.output) == 1
+    assert response2.output[0].type == "function_call"
+    assert response2.output[0].name == "get_weather"
+    assert response2.output[0].status == "completed"
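The second test above exercises the client-side loop that `parallel_tool_calls=False` implies: the model emits one function call per turn, and the caller answers each call with a `function_call_output` input item before the conversation can continue. A small offline sketch of that follow-up construction, using plain dicts in place of the SDK's response objects (the helper name is hypothetical):

```python
# Sketch of the sequential tool-call loop: with parallel_tool_calls=False,
# each turn yields one function_call, and the caller replies with a
# function_call_output item keyed by call_id. Plain dicts stand in for
# the SDK response objects; tool_output_item is an illustrative helper.
def tool_output_item(call: dict, output: str) -> dict:
    """Build the follow-up input item for one completed function call."""
    return {
        "call_id": call["call_id"],
        "type": "function_call_output",
        "output": output,
    }

# Pretend first turn: the model asked for the weather once.
first_turn_output = [
    {"type": "function_call", "name": "get_weather", "call_id": "call_1"}
]

# Next-turn input: original user message plus the tool's result,
# mirroring the shape the integration test sends to responses.create.
followup_input = [
    {"role": "user", "content": "Check the weather in Paris and New York."},
    tool_output_item(first_turn_output[0], "18 c"),
]
```

Matching on `call_id` is what lets the server pair each tool result with the call that requested it, so the loop can repeat until no further `function_call` items are emitted.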
