
Conversation

@aichy126
Contributor

@aichy126 aichy126 commented Oct 27, 2025

Summary by CodeRabbit

  • Breaking Changes

    • Core processing and LLM provider APIs are now synchronous (no await), including streaming behavior.
  • Documentation

    • Usage examples and docstrings updated to reflect synchronous APIs.
  • Chores

    • Package version bumped to 0.2.19.
    • Output instruction markers and guidance format replaced with new tag-based structure.
  • Behavior

    • Output-instruction processing now returns content plus a flag instead of inserting an explanatory prefix.

@gemini-code-assist

Summary of Changes

Hello @aichy126, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request streamlines the markdown_flow library's handling of fixed output system prompts and internal processing. Key changes include a shift from asynchronous to synchronous execution for core processing methods, the adoption of new XML-like markers for content preservation, and a comprehensive update to the system prompt explanation for these markers. This ensures more precise control over LLM output, allowing content to be strictly preserved or translated as needed, while also improving the efficiency of prompt construction by conditionally including detailed instructions.

Highlights

  • Synchronous Processing Shift: Core process methods and related internal functions across markdown_flow/core.py and markdown_flow/llm.py have been refactored from async to synchronous, changing AsyncGenerator to Generator where applicable.
  • Updated Output Instruction Markers: The OUTPUT_INSTRUCTION_PREFIX and OUTPUT_INSTRUCTION_SUFFIX in markdown_flow/constants.py have been updated from [输出] to XML-like tags (<preserve_or_translate>).
  • Enhanced Output Instruction Explanation: The system prompt explanation for handling preserved content has been significantly expanded and restructured in markdown_flow/constants.py, providing clearer rules and examples for LLMs.
  • Conditional System Prompt Injection: The OUTPUT_INSTRUCTION_EXPLANATION is now dynamically added to the system prompt in markdown_flow/core.py only when content requiring preservation is detected, optimizing prompt length.
  • Version Bump: The package version has been updated to 0.2.18 in markdown_flow/__init__.py.
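
The async-to-sync shift described in the highlights can be sketched with a toy provider. Only the `complete`/`stream` method names and the `Generator` typing come from the PR summary; the class and chunking here are illustrative, not the actual markdown_flow internals:

```python
from collections.abc import Generator


class FakeProvider:
    """Stand-in for an LLM provider under the new synchronous contract."""

    def complete(self, messages: list[dict]) -> str:
        # A real provider would make a blocking HTTP call here.
        return "ok"

    def stream(self, messages: list[dict]) -> Generator[str, None, None]:
        for chunk in ("Hel", "lo"):
            yield chunk


def process(provider: FakeProvider, messages: list[dict]) -> Generator[str, None, None]:
    # Previously an AsyncGenerator consumed with `async for`;
    # now a plain nested generator is enough.
    yield from provider.stream(messages)


print("".join(process(FakeProvider(), [])))  # Hello
```

Callers that previously wrote `async for chunk in ...` now iterate with an ordinary `for` loop.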
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai bot commented Oct 27, 2025

Walkthrough

The PR converts the library from asynchronous to synchronous processing across core, LLM provider, and utilities, changes output instruction markers to XML-style tags, updates preserved-content handling, and bumps the package version to 0.2.19.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Version & Module Metadata**<br>`markdown_flow/__init__.py` | Bumped `__version__` from 0.2.5 to 0.2.19 and updated docstring examples to use synchronous `process()` calls (removed `await`). |
| **Instruction Markers & Constants**<br>`markdown_flow/constants.py` | Replaced `OUTPUT_INSTRUCTION_PREFIX`/`SUFFIX` values from `[输出]`/`[/输出]` with `<preserve_or_translate>`/`</preserve_or_translate>`; replaced the old explanatory block with a new `<preserve_or_translate_instruction>` structured guidance block. |
| **Core Processing (Async → Sync)**<br>`markdown_flow/core.py` | Converted all async APIs and internal helpers to synchronous equivalents (e.g., `process`, `_process_content`, `_process_interaction_*`, validation and error renderers). Replaced `AsyncGenerator` with `Generator` typing, removed `await` calls to provider methods, converted stream handling to plain nested generators, and added `has_preserved_content` logic to influence prompt construction. |
| **LLM Provider Interface (Async → Sync)**<br>`markdown_flow/llm.py` | Changed `LLMProvider` and `NoLLMProvider` method signatures from `async def complete`/`stream` to synchronous `def complete`/`stream`; `stream` now yields synchronously. |
| **Utility Functions**<br>`markdown_flow/utils.py` | Removed the `OUTPUT_INSTRUCTION_EXPLANATION` import; changed `process_output_instructions(content: str)` to return `(processed_content: str, has_output_instruction: bool)` and stopped injecting the explanation prefix into content. |
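
The new `(content, flag)` contract from the utilities change can be sketched as follows; the detection logic here is a deliberate simplification of what `markdown_flow/utils.py` actually does, and the marker values are taken from the constants change above:

```python
OUTPUT_INSTRUCTION_PREFIX = "<preserve_or_translate>"


def process_output_instructions(content: str) -> tuple[str, bool]:
    """Return the processed content plus a flag for preserved content.

    The old version injected an explanatory prefix into the content;
    the new version only reports whether preserved content was found.
    """
    has_output_instruction = OUTPUT_INSTRUCTION_PREFIX in content
    return content, has_output_instruction


processed, flag = process_output_instructions(
    "Intro <preserve_or_translate>keep verbatim</preserve_or_translate>"
)
print(flag)  # True
```

Callers now unpack the tuple instead of receiving a single string.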

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant MarkdownFlow
    participant LLMProvider

    Caller->>MarkdownFlow: process(block_index, mode, ...)
    Note right of MarkdownFlow: build messages (sync)\ndetect has_preserved_content
    MarkdownFlow->>LLMProvider: complete(messages)
    LLMProvider-->>MarkdownFlow: response (string)
    MarkdownFlow-->>Caller: LLMResult (or generator of LLMResult)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to pay extra attention:
    • markdown_flow/core.py: ensure all previously async control flows are correctly converted to synchronous Generator semantics and that stream generators yield expected chunks.
    • Call sites for process_output_instructions() across the repo to ensure they now handle the (str, bool) return.
    • markdown_flow/llm.py: verify concrete providers comply with the new synchronous stream() contract.
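
For the last point, a compliant concrete provider implements `stream` as an ordinary generator function. A hedged sketch (the class and message shape are hypothetical; only the synchronous `complete`/`stream` contract comes from the PR):

```python
from collections.abc import Generator


class EchoProvider:
    """Toy concrete provider honoring the synchronous stream() contract."""

    def complete(self, messages: list[dict]) -> str:
        return messages[-1]["content"]

    def stream(self, messages: list[dict]) -> Generator[str, None, None]:
        # No `async def`, no `await`: yield chunks directly.
        for word in messages[-1]["content"].split():
            yield word


print(list(EchoProvider().stream([{"role": "user", "content": "a b c"}])))  # ['a', 'b', 'c']
```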

Possibly related PRs

Poem

🐰 I hopped from async into sync today,
Messages now flow without delay.
XML tags snug in their place,
Generators yield with steady pace.
Hop—our little flow hops on its way ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Title Check (⚠️ Warning): The pull request title "Optimize fixed output system prompts" refers to a real aspect of the changeset: the updates to output instruction constants and the preserve_or_translate instruction block in markdown_flow/constants.py and markdown_flow/utils.py. However, the dominant architectural change in this pull request is the conversion from asynchronous to synchronous processing throughout the entire codebase, affecting core.py, llm.py, and __init__.py. The title does not capture this primary refactoring and instead emphasizes a secondary supporting change, which misrepresents the scope and nature of the PR when viewed in commit history or during code review.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 65.22%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1e8eaa and ca16559.

📒 Files selected for processing (2)
  • markdown_flow/__init__.py (2 hunks)
  • markdown_flow/constants.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • markdown_flow/__init__.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: All code (comments, identifiers, logs, error messages) must be written in English
Do not hardcode API keys or secrets in code; use environment variables or config
Prefer and maintain type hints; use MyPy for type checking
Use Ruff for linting (ruff check --fix) and formatting (ruff format)
Python modules should use snake_case for filenames
Maintain Python 3.10+ compatibility

Files:

  • markdown_flow/constants.py
markdown_flow/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Define and use pre-compiled regex patterns in constants.py

Files:

  • markdown_flow/constants.py
🪛 Ruff (0.14.1)
markdown_flow/constants.py

96-96: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

101-101: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

109-109: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

111-111: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

124-124: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

126-126: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

149-149: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

153-153: String contains ambiguous `|` (FULLWIDTH VERTICAL LINE). Did you mean `|` (VERTICAL LINE)? (RUF001)

154-154: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

🔇 Additional comments (2)
markdown_flow/constants.py (2)

93-157: Static analysis warnings are false positives for Chinese punctuation.

The Ruff warnings about "ambiguous" fullwidth commas and vertical lines are false positives. These are correct Chinese punctuation marks (`,` and `|`) used intentionally in Chinese text content. The fullwidth characters are standard for Chinese writing and should not be changed to ASCII equivalents.


50-51: XML-style marker migration complete and verified.

Old bracket-style markers [输出] and [/输出] have been completely removed. New markers are consistently defined and used:

  • OUTPUT_INSTRUCTION_PREFIX and OUTPUT_INSTRUCTION_SUFFIX (<preserve_or_translate> tags) wrap content in utils.py (lines 597, 632, 634)
  • OUTPUT_INSTRUCTION_EXPLANATION is wrapped in the outer <preserve_or_translate_instruction> tag for metadata separation
  • No old references remain in the codebase

The XML-style tags are self-documenting and the dual-tag design (inner content markers + outer explanation wrapper) is architecturally sound.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request focuses on optimizing the fixed output system prompts within the markdown-flow library. The primary changes involve replacing the original output instruction markers with <preserve_or_translate> tags and updating the process_output_instructions function to return a tuple indicating whether preserved content exists. Additionally, the process methods in core.py have been refactored to be synchronous rather than asynchronous, and the LLM provider interface has been updated accordingly. The version number in __init__.py has also been updated.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
markdown_flow/llm.py (1)

41-68: Critical: Removal of async/await violates coding guidelines.

The conversion of complete and stream methods from async to synchronous directly contradicts the established coding guidelines, which explicitly state: "Use async/await for LLM calls and I/O to avoid blocking."

LLM API calls are I/O-bound operations that benefit significantly from async/await patterns. Removing async will:

  1. Block the calling thread during network I/O
  2. Prevent efficient concurrent processing
  3. Degrade performance in multi-request scenarios
  4. Make it impossible to use connection pooling effectively

This architectural change needs careful reconsideration. If synchronous operation is required for specific use cases, consider maintaining both async and sync interfaces, or wrapping async calls appropriately.

As per coding guidelines and learnings.
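
One way to honor the async guideline while keeping the new synchronous core, in line with the reviewer's suggestion to maintain both interfaces, is a thin async wrapper over the sync call. A minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`; the function names are illustrative, not markdown_flow APIs:

```python
import asyncio


def complete_sync(prompt: str) -> str:
    # Blocking network I/O would happen here in a real provider.
    return f"echo: {prompt}"


async def complete_async(prompt: str) -> str:
    # Offload the blocking call to a worker thread so the event loop
    # stays responsive for concurrent requests.
    return await asyncio.to_thread(complete_sync, prompt)


print(asyncio.run(complete_async("hi")))  # echo: hi
```

This keeps a single synchronous implementation as the source of truth while letting async callers avoid blocking the event loop.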

🧹 Nitpick comments (1)
markdown_flow/core.py (1)

230-230: Consider removing unused method arguments.

Static analysis identified unused context parameters at lines 230 and 349, and an unused block_index parameter at line 557. If these parameters are not needed for the current implementation and are not required for interface consistency, consider removing them to improve code clarity.

Also applies to: 349-349, 557-557

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59a564d and a1e8eaa.

📒 Files selected for processing (5)
  • markdown_flow/__init__.py (2 hunks)
  • markdown_flow/constants.py (2 hunks)
  • markdown_flow/core.py (25 hunks)
  • markdown_flow/llm.py (3 hunks)
  • markdown_flow/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: All code (comments, identifiers, logs, error messages) must be written in English
Do not hardcode API keys or secrets in code; use environment variables or config
Prefer and maintain type hints; use MyPy for type checking
Use Ruff for linting (ruff check --fix) and formatting (ruff format)
Python modules should use snake_case for filenames
Maintain Python 3.10+ compatibility

Files:

  • markdown_flow/constants.py
  • markdown_flow/utils.py
  • markdown_flow/core.py
  • markdown_flow/__init__.py
  • markdown_flow/llm.py
markdown_flow/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Define and use pre-compiled regex patterns in constants.py

Files:

  • markdown_flow/constants.py
markdown_flow/{core,enums,exceptions,llm,models,utils}.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not compile regex inline in modules; import compiled patterns from constants.py

Files:

  • markdown_flow/utils.py
  • markdown_flow/core.py
  • markdown_flow/llm.py
markdown_flow/{llm,core}.py

📄 CodeRabbit inference engine (AGENTS.md)

Use async/await for LLM calls and I/O to avoid blocking

Files:

  • markdown_flow/core.py
  • markdown_flow/llm.py
markdown_flow/core.py

📄 CodeRabbit inference engine (AGENTS.md)

markdown_flow/core.py: Reuse a single LLM provider instance across requests to enable connection reuse
Prefer streaming responses for large documents where possible
Cache parsed blocks and variable extractions (e.g., with lru_cache) to avoid recomputation

Files:

  • markdown_flow/core.py
markdown_flow/{core,llm}.py

📄 CodeRabbit inference engine (AGENTS.md)

Minimize prompt tokens while maintaining required context

Files:

  • markdown_flow/core.py
  • markdown_flow/llm.py
markdown_flow/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Update version number in markdown_flow/__init__.py for releases

Files:

  • markdown_flow/__init__.py
markdown_flow/llm.py

📄 CodeRabbit inference engine (AGENTS.md)

Implement retry logic with exponential backoff for LLM calls

Files:

  • markdown_flow/llm.py
🧠 Learnings (2)
📚 Learning: 2025-09-20T17:36:43.544Z
Learnt from: CR
PR: ai-shifu/markdown-flow-agent-py#0
File: AGENTS.md:0-0
Timestamp: 2025-09-20T17:36:43.544Z
Learning: Applies to markdown_flow/{llm,core}.py : Use async/await for LLM calls and I/O to avoid blocking

Applied to files:

  • markdown_flow/core.py
  • markdown_flow/llm.py
📚 Learning: 2025-09-20T17:36:43.544Z
Learnt from: CR
PR: ai-shifu/markdown-flow-agent-py#0
File: AGENTS.md:0-0
Timestamp: 2025-09-20T17:36:43.544Z
Learning: Applies to markdown_flow/__init__.py : Update version number in markdown_flow/__init__.py for releases

Applied to files:

  • markdown_flow/__init__.py
🧬 Code graph analysis (1)
markdown_flow/core.py (3)
markdown_flow/llm.py (6)
  • ProcessMode (15-20)
  • complete (41-53)
  • complete (74-75)
  • LLMResult (24-34)
  • stream (56-68)
  • stream (77-78)
markdown_flow/enums.py (1)
  • BlockType (21-30)
markdown_flow/utils.py (3)
  • InteractionType (169-177)
  • process_output_instructions (561-655)
  • replace_variables_in_text (744-796)
🪛 Ruff (0.14.1)
markdown_flow/constants.py

105-105: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

107-107: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

127-127: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

markdown_flow/core.py

230-230: Unused method argument: context

(ARG002)


349-349: Unused method argument: context

(ARG002)


557-557: Unused method argument: block_index

(ARG002)

🔇 Additional comments (6)
markdown_flow/utils.py (1)

561-655: LGTM! Breaking change is well-documented.

The function signature change to return a tuple is a breaking change, but it's clearly documented in the docstring. The implementation correctly tracks whether output instructions were found and returns both the processed content and the flag.

markdown_flow/__init__.py (2)

35-40: Documentation correctly updated for synchronous API.

The usage examples have been properly updated to reflect the synchronous API by removing await keywords.


86-86: Verify the large version jump.

The version jumped from 0.2.5 to 0.2.18 (13 versions). This is a significant jump for a single PR. Please confirm this is the intended version number.

Based on coding guidelines.

markdown_flow/constants.py (2)

50-51: LGTM! Instruction markers updated to XML format.

The change from Chinese markers [输出] to XML-style <preserve_or_translate> tags improves clarity and follows a more standardized format.


93-131: LGTM! Improved instruction structure.

The new OUTPUT_INSTRUCTION_EXPLANATION provides a much more structured and detailed explanation using XML-style tags. The fullwidth commas flagged by Ruff are intentional as they appear in Chinese text content.

markdown_flow/core.py (1)

653-672: LGTM! Conditional instruction explanation logic is well-implemented.

The logic to conditionally append OUTPUT_INSTRUCTION_EXPLANATION only when preserved content is detected is a good optimization. This avoids adding unnecessary instructions when they're not needed.

The implementation correctly:

  • Unpacks the tuple from process_output_instructions
  • Checks the has_preserved_content flag
  • Handles both cases: with and without document_prompt
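
The conditional injection praised above can be sketched like this; the detector and prompt assembly are simplified stand-ins for the actual core.py logic, and the explanation string is abbreviated:

```python
OUTPUT_INSTRUCTION_EXPLANATION = (
    "<preserve_or_translate_instruction>rules...</preserve_or_translate_instruction>"
)


def build_system_prompt(document_prompt: str, content: str) -> str:
    # Stand-in for the (processed, has_preserved_content) tuple that
    # process_output_instructions() returns in the real code.
    has_preserved_content = "<preserve_or_translate>" in content
    parts = [document_prompt] if document_prompt else []
    if has_preserved_content:
        # Only spend the tokens on the explanation when markers are present.
        parts.append(OUTPUT_INSTRUCTION_EXPLANATION)
    return "\n\n".join(parts)


print(build_system_prompt("Base prompt", "no markers here"))  # Base prompt
```

The `join` over a list naturally covers all four combinations of document prompt and preserved content.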

@sonarqubecloud

Quality Gate Failed Quality Gate failed

Failed conditions
4.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

