
Commit 8ee8ade

Merge branch 'main' into evaluate-cleanup
2 parents b630c62 + 33da561 commit 8ee8ade

File tree

9 files changed: +327 -9 lines changed


docs/docs/learn/evaluation/data.md

Lines changed: 1 addition & 1 deletion

@@ -72,7 +72,7 @@ print("Example object with Non-Input fields only:", non_input_key_only)

 **Output**
 ```
-Example object with Input fields only: Example({'article': 'This is an article.'}) (input_keys=None)
+Example object with Input fields only: Example({'article': 'This is an article.'}) (input_keys={'article'})
 Example object with Non-Input fields only: Example({'summary': 'This is a summary.'}) (input_keys=None)
 ```
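For context, the corrected line reflects that `Example.with_inputs(...)` marks fields as inputs, after which `inputs()` returns a view whose `input_keys` are populated. A minimal sketch that would produce output of this shape (variable names are assumptions mirroring the tutorial):

```python
import dspy

qa_pair = dspy.Example(article="This is an article.", summary="This is a summary.")
qa_pair = qa_pair.with_inputs("article")  # mark 'article' as the input field

input_key_only = qa_pair.inputs()      # keeps only input fields; input_keys={'article'}
non_input_key_only = qa_pair.labels()  # keeps only non-input fields

print("Example object with Input fields only:", input_key_only)
print("Example object with Non-Input fields only:", non_input_key_only)
```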

docs/docs/tutorials/conversation_history/index.md

Lines changed: 219 additions & 0 deletions

@@ -0,0 +1,219 @@
# Managing Conversation History

Maintaining conversation history is a fundamental feature when building AI applications such as chatbots. While DSPy does not provide automatic conversation history management within `dspy.Module`, it offers the `dspy.History` utility to help you manage conversation history effectively.

## Using `dspy.History` to Manage Conversation History

The `dspy.History` class can be used as an input field type. It has a `messages: list[dict[str, Any]]` attribute that stores the conversation history, where each entry is a dictionary with keys corresponding to the fields defined in your signature. See the example below:

```python
import os

import dspy

os.environ["OPENAI_API_KEY"] = "{your_openai_api_key}"

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))


class QA(dspy.Signature):
    question: str = dspy.InputField()
    history: dspy.History = dspy.InputField()
    answer: str = dspy.OutputField()


predict = dspy.Predict(QA)
history = dspy.History(messages=[])

while True:
    question = input("Type your question, end conversation by typing 'finish': ")
    if question == "finish":
        break
    outputs = predict(question=question, history=history)
    print(f"\n{outputs.answer}\n")
    # Record the turn: include every input and output field for this exchange.
    history.messages.append({"question": question, **outputs})

dspy.inspect_history()
```
There are two key steps when using conversation history:

- **Include a field of type `dspy.History` in your Signature.**
- **Maintain a history instance at runtime, appending new conversation turns to it.** Each entry should include all relevant input and output field information.

A sample run might look like this:

```
Type your question, end conversation by typing 'finish': do you know the competition between pytorch and tensorflow?

Yes, there is a notable competition between PyTorch and TensorFlow, which are two of the most popular deep learning frameworks. PyTorch, developed by Facebook, is known for its dynamic computation graph, which allows for more flexibility and ease of use, especially in research settings. TensorFlow, developed by Google, initially used a static computation graph but has since introduced eager execution to improve usability. TensorFlow is often favored in production environments due to its scalability and deployment capabilities. Both frameworks have strong communities and extensive libraries, and the choice between them often depends on specific project requirements and personal preference.

Type your question, end conversation by typing 'finish': which one won the battle? just tell me the result, don't include any reasoning, thanks!

There is no definitive winner; both PyTorch and TensorFlow are widely used and have their own strengths.

Type your question, end conversation by typing 'finish': finish




[2025-07-11T16:35:57.592762]

System message:

Your input fields are:
1. `question` (str):
2. `history` (History):
Your output fields are:
1. `answer` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## history ## ]]
{history}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is:
Given the fields `question`, `history`, produce the fields `answer`.


User message:

[[ ## question ## ]]
do you know the competition between pytorch and tensorflow?


Assistant message:

[[ ## answer ## ]]
Yes, there is a notable competition between PyTorch and TensorFlow, which are two of the most popular deep learning frameworks. PyTorch, developed by Facebook, is known for its dynamic computation graph, which allows for more flexibility and ease of use, especially in research settings. TensorFlow, developed by Google, initially used a static computation graph but has since introduced eager execution to improve usability. TensorFlow is often favored in production environments due to its scalability and deployment capabilities. Both frameworks have strong communities and extensive libraries, and the choice between them often depends on specific project requirements and personal preference.

[[ ## completed ## ]]


User message:

[[ ## question ## ]]
which one won the battle? just tell me the result, don't include any reasoning, thanks!

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## answer ## ]]
There is no definitive winner; both PyTorch and TensorFlow are widely used and have their own strengths.

[[ ## completed ## ]]
```
Notice how each user input and assistant response is appended to the history, allowing the model to maintain context across turns.

The actual prompt sent to the language model is a multi-turn message, as shown by the output of `dspy.inspect_history`: each conversation turn is represented as a user message followed by an assistant message.
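Conceptually, the adapter unrolls each entry in `history.messages` into a user/assistant pair placed before the current question. A simplified sketch of that expansion (illustrative only, not DSPy's actual implementation):

```python
def expand_history(history, current_question: str) -> list[dict[str, str]]:
    """Illustration: one user/assistant message pair per past turn."""
    messages = []
    for turn in history.messages:
        messages.append({"role": "user", "content": f"[[ ## question ## ]]\n{turn['question']}"})
        messages.append({"role": "assistant", "content": f"[[ ## answer ## ]]\n{turn['answer']}"})
    # The current question becomes the final user message.
    messages.append({"role": "user", "content": f"[[ ## question ## ]]\n{current_question}"})
    return messages
```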
## History in Few-shot Examples

You may notice that, in the prompt above, `history` does not appear as its own `[[ ## history ## ]]` section in the user messages, even though it is listed as an input field (e.g., "2. `history` (History):" in the system message); the live history is expanded into multiple turns instead. Few-shot examples are handled differently: when formatting demos that include conversation history, DSPy does not expand the history into multiple turns. Instead, to remain compatible with the OpenAI standard message format, each few-shot example is represented as a single turn.

For example:

```python
import dspy

dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))


class QA(dspy.Signature):
    question: str = dspy.InputField()
    history: dspy.History = dspy.InputField()
    answer: str = dspy.OutputField()


predict = dspy.Predict(QA)
history = dspy.History(messages=[])

predict.demos.append(
    dspy.Example(
        question="What is the capital of France?",
        history=dspy.History(
            messages=[{"question": "What is the capital of Germany?", "answer": "The capital of Germany is Berlin."}]
        ),
        answer="The capital of France is Paris.",
    )
)

predict(question="What is the capital of America?", history=dspy.History(messages=[]))
dspy.inspect_history()
```
The resulting history will look like this:

```
[2025-07-11T16:53:10.994111]

System message:

Your input fields are:
1. `question` (str):
2. `history` (History):
Your output fields are:
1. `answer` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## history ## ]]
{history}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]
In adhering to this structure, your objective is:
Given the fields `question`, `history`, produce the fields `answer`.


User message:

[[ ## question ## ]]
What is the capital of France?

[[ ## history ## ]]
{"messages": [{"question": "What is the capital of Germany?", "answer": "The capital of Germany is Berlin."}]}


Assistant message:

[[ ## answer ## ]]
The capital of France is Paris.

[[ ## completed ## ]]

User message:

[[ ## question ## ]]
What is the capital of America?

Respond with the corresponding output fields, starting with the field `[[ ## answer ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## answer ## ]]
The capital of America is Washington, D.C.

[[ ## completed ## ]]
```
As you can see, the few-shot example does not expand the conversation history into multiple turns. Instead, it represents the history as JSON data within its own field section:

```
[[ ## history ## ]]
{"messages": [{"question": "What is the capital of Germany?", "answer": "The capital of Germany is Berlin."}]}
```

This approach ensures compatibility with standard prompt formats while still providing the model with relevant conversational context.

docs/docs/tutorials/core_development/index.md

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ This section covers essential DSPy features and best practices for professional

 ## Integration and Tooling

+### [Managing Conversation History](../conversation_history/index.md)
+Learn how to manage conversation history in DSPy applications.
+
 ### [Use MCP in DSPy](../mcp/index.md)
 Learn to integrate Model Context Protocol (MCP) with DSPy applications. This tutorial shows how to leverage MCP for enhanced context management and more sophisticated AI interactions.

docs/docs/tutorials/index.md

Lines changed: 1 addition & 0 deletions

@@ -33,6 +33,7 @@ Welcome to DSPy tutorials! We've organized our tutorials into three main categor
     - [Finetuning Agents](games/index.ipynb)

 - Tools, Development, and Deployment
+    - [Managing Conversation History](conversation_history/index.md)
     - [Use MCP in DSPy](mcp/index.md)
     - [Output Refinement](output_refinement/best-of-n-and-refine.md)
     - [Saving and Loading](saving/index.md)

docs/docs/tutorials/streaming/index.md

Lines changed: 61 additions & 0 deletions
@@ -188,6 +188,67 @@ Final output: Prediction(
)
```

### Streaming the Same Field Multiple Times (as in `dspy.ReAct`)

By default, a `StreamListener` automatically closes itself after completing a single streaming session. This design helps prevent performance issues: every token is broadcast to all configured stream listeners, so having too many active listeners can introduce significant overhead.

However, in scenarios where a DSPy module is used repeatedly in a loop, such as with `dspy.ReAct`, you may want to stream the same field from each prediction, every time it is produced. To enable this behavior, set `allow_reuse=True` when creating your `StreamListener`. See the example below:
```python
import asyncio

import dspy

lm = dspy.LM("openai/gpt-4o-mini", cache=False)
dspy.settings.configure(lm=lm)


def fetch_user_info(user_name: str):
    """Get user information like name, birthday, etc."""
    return {
        "name": user_name,
        "birthday": "2009-05-16",
    }


def get_sports_news(year: int):
    """Get sports news for a given year."""
    if year == 2009:
        return "Usain Bolt broke the world record in the 100m race."
    return None


react = dspy.ReAct("question->answer", tools=[fetch_user_info, get_sports_news])

stream_listeners = [
    # dspy.ReAct has a built-in output field called "next_thought".
    dspy.streaming.StreamListener(signature_field_name="next_thought", allow_reuse=True),
]
stream_react = dspy.streamify(react, stream_listeners=stream_listeners)


async def read_output_stream():
    output = stream_react(question="What sports news happened in the year Adam was born?")
    return_value = None
    async for chunk in output:
        if isinstance(chunk, dspy.streaming.StreamResponse):
            print(chunk)
        elif isinstance(chunk, dspy.Prediction):
            return_value = chunk
    return return_value


print(asyncio.run(read_output_stream()))
```
247+
248+
In this example, by setting `allow_reuse=True` in the StreamListener, you ensure that streaming for "next_thought" is
249+
available for every iteration, not just the first. When you run this code, you will see the streaming tokens for `next_thought`
250+
output each time the field is produced.
251+
191252
#### Handling Duplicate Field Names
192253

193254
When streaming fields with the same name from different modules, specify both the `predict` and `predict_name` in the `StreamListener`:
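A minimal sketch of that pattern, assuming a module with two predictors that both emit a field named `answer` (the module and attribute names here are illustrative assumptions, not from this diff):

```python
import dspy


class AnswerThenSimplify(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answerer = dspy.Predict("question -> answer")
        self.simplifier = dspy.Predict("draft_answer -> answer")  # same output field name

    def forward(self, question: str):
        draft = self.answerer(question=question)
        return self.simplifier(draft_answer=draft.answer)


module = AnswerThenSimplify()
stream_listeners = [
    # Disambiguate the two "answer" fields by pointing each listener at a
    # specific predictor instance (`predict`) and its attribute name (`predict_name`).
    dspy.streaming.StreamListener(signature_field_name="answer", predict=module.answerer, predict_name="answerer"),
    dspy.streaming.StreamListener(signature_field_name="answer", predict=module.simplifier, predict_name="simplifier"),
]
stream_module = dspy.streamify(module, stream_listeners=stream_listeners)
```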

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -51,6 +51,7 @@ nav:
     - RL for Multi-Hop Research: tutorials/rl_multihop/index.ipynb
     - Tools, Development, and Deployment:
       - Overview: tutorials/core_development/index.md
+      - Managing Conversation History: tutorials/conversation_history/index.md
       - Use MCP in DSPy: tutorials/mcp/index.md
       - Output Refinement: tutorials/output_refinement/best-of-n-and-refine.md
       - Saving and Loading: tutorials/saving/index.md

dspy/adapters/types/code.py

Lines changed: 18 additions & 4 deletions
@@ -1,7 +1,8 @@
 import re
-from typing import Any
+from typing import Any, ClassVar

 import pydantic
+from pydantic import create_model

 from dspy.adapters.types.base_type import Type

@@ -23,7 +24,7 @@ class CodeGeneration(dspy.Signature):
     '''Generate python code to answer the question.'''

     question: str = dspy.InputField(description="The question to answer")
-    code: dspy.Code = dspy.OutputField(description="The code to execute")
+    code: dspy.Code["java"] = dspy.OutputField(description="The code to execute")


 predict = dspy.Predict(CodeGeneration)

@@ -43,7 +44,7 @@ class CodeGeneration(dspy.Signature):
 class CodeAnalysis(dspy.Signature):
     '''Analyze the time complexity of the function.'''

-    code: dspy.Code = dspy.InputField(description="The function to analyze")
+    code: dspy.Code["python"] = dspy.InputField(description="The function to analyze")
     result: str = dspy.OutputField(description="The time complexity of the function")

@@ -64,6 +65,8 @@ def sleepsort(x):

     code: str

+    language: ClassVar[str] = "python"
+
     def format(self):
         return f"{self.code}"

@@ -76,7 +79,8 @@ def serialize_model(self):
     def description(cls) -> str:
         return (
             "Code represented in a string, specified in the `code` field. If this is an output field, the code "
-            "should follow the markdown code block format, e.g. \n```python\n{code}\n``` or \n```cpp\n{code}\n```."
+            "field should follow the markdown code block format, e.g. \n```python\n{code}\n``` or \n```cpp\n{code}\n```"
+            f"\nProgramming language: {cls.language}"
         )

     @pydantic.model_validator(mode="before")

@@ -115,3 +119,13 @@ def _filter_code(code: str) -> str:
         return match.group(1).strip()
     # Fallback case
     return code
+
+
+# Patch __class_getitem__ directly on the class to support dspy.Code["python"] syntax
+def _code_class_getitem(cls, language):
+    code_with_language_cls = create_model(f"{cls.__name__}_{language}", __base__=cls)
+    code_with_language_cls.language = language
+    return code_with_language_cls
+
+
+Code.__class_getitem__ = classmethod(_code_class_getitem)
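With this patch, subscripting `dspy.Code` yields a dynamically created subclass whose class-level `language` is set, which `description()` then reports to the LM. A quick sketch of the resulting behavior, derived from the diff above:

```python
import dspy

# dspy.Code["java"] creates a pydantic subclass named "Code_java" whose
# `language` attribute is "java"; the base class defaults to "python".
JavaCode = dspy.Code["java"]
print(JavaCode.__name__)   # -> Code_java
print(JavaCode.language)   # -> java
print(dspy.Code.language)  # -> python
```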

dspy/adapters/utils.py

Lines changed: 8 additions & 1 deletion
@@ -171,7 +171,14 @@ def parse_value(value, annotation):

     try:
         return TypeAdapter(annotation).validate_python(candidate)
-    except pydantic.ValidationError:
+    except pydantic.ValidationError as e:
+        if issubclass(annotation, Type):
+            try:
+                # For dspy.Type, try parsing from the original value in case it has a custom parser
+                return TypeAdapter(annotation).validate_python(value)
+            except Exception:
+                raise e
+
         if origin is Union and type(None) in get_args(annotation) and str in get_args(annotation):
             return str(candidate)
         raise
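This retry path matters for `Type` subclasses that define their own pydantic parsing of raw values. A hypothetical sketch of such a type (the `Temperature` class is an assumption for illustration, using the same `Type` import the diff uses):

```python
import pydantic

from dspy.adapters.types.base_type import Type


class Temperature(Type):
    """Hypothetical custom type that can parse a raw string like '21.5C'."""

    celsius: float

    @pydantic.model_validator(mode="before")
    @classmethod
    def parse(cls, value):
        # Custom parser: accept a raw string in addition to {"celsius": ...}.
        if isinstance(value, str):
            return {"celsius": float(value.strip().rstrip("Cc"))}
        return value

    def format(self):
        return f"{self.celsius}C"
```

If the adapter's pre-processed `candidate` fails validation, `parse_value` now retries with the original `value`, giving a validator like this one a chance to run on the raw string.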
