
Conversation

@aichy126
Contributor

@aichy126 aichy126 commented Oct 27, 2025

Summary by CodeRabbit

  • Breaking Changes

    • Core processing and LLM provider APIs are now synchronous (no await), including streaming behavior.
  • Documentation

    • Usage examples and docstrings updated to reflect synchronous APIs.
  • Chores

    • Package version bumped to 0.2.19.
    • Output instruction markers and guidance format replaced with new tag-based structure.
  • Behavior

    • Output-instruction processing now returns content plus a flag instead of inserting an explanatory prefix.

@gemini-code-assist

Summary of Changes

Hello @aichy126, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request streamlines the markdown_flow library's handling of fixed output system prompts and internal processing. Key changes include a shift from asynchronous to synchronous execution for core processing methods, the adoption of new XML-like markers for content preservation, and a comprehensive update to the system prompt explanation for these markers. This ensures more precise control over LLM output, allowing content to be strictly preserved or translated as needed, while also improving the efficiency of prompt construction by conditionally including detailed instructions.

Highlights

  • Synchronous Processing Shift: Core process methods and related internal functions across markdown_flow/core.py and markdown_flow/llm.py have been refactored from async to synchronous, changing AsyncGenerator to Generator where applicable.
  • Updated Output Instruction Markers: The OUTPUT_INSTRUCTION_PREFIX and OUTPUT_INSTRUCTION_SUFFIX in markdown_flow/constants.py have been updated from [输出] to XML-like tags (<preserve_or_translate>).
  • Enhanced Output Instruction Explanation: The system prompt explanation for handling preserved content has been significantly expanded and restructured in markdown_flow/constants.py, providing clearer rules and examples for LLMs.
  • Conditional System Prompt Injection: The OUTPUT_INSTRUCTION_EXPLANATION is now dynamically added to the system prompt in markdown_flow/core.py only when content requiring preservation is detected, optimizing prompt length.
  • Version Bump: The package version has been updated to 0.2.18 in markdown_flow/__init__.py.
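
The async-to-sync shift described in the highlights can be sketched with a toy provider. Only the `complete`/`stream` method names and the `Generator` typing come from the PR summary; the class and chunking here are illustrative, not the actual markdown_flow internals:

```python
from collections.abc import Generator


class FakeProvider:
    """Stand-in for an LLM provider under the new synchronous contract."""

    def complete(self, messages: list[dict]) -> str:
        # A real provider would make a blocking HTTP call here.
        return "ok"

    def stream(self, messages: list[dict]) -> Generator[str, None, None]:
        for chunk in ("Hel", "lo"):
            yield chunk


def process(provider: FakeProvider, messages: list[dict]) -> Generator[str, None, None]:
    # Previously an AsyncGenerator consumed with `async for`;
    # now a plain nested generator is enough.
    yield from provider.stream(messages)


print("".join(process(FakeProvider(), [])))  # Hello
```

Callers that previously wrote `async for chunk in ...` now iterate with an ordinary `for` loop.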
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai bot commented Oct 27, 2025

Walkthrough

The PR converts the library from asynchronous to synchronous processing across core, LLM provider, and utilities, changes output instruction markers to XML-style tags, updates preserved-content handling, and bumps the package version to 0.2.19.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Version & Module Metadata**<br>`markdown_flow/__init__.py` | Bumped `__version__` from 0.2.5 to 0.2.19 and updated docstring examples to use synchronous `process()` calls (removed `await`). |
| **Instruction Markers & Constants**<br>`markdown_flow/constants.py` | Replaced `OUTPUT_INSTRUCTION_PREFIX`/`SUFFIX` values from `[输出]`/`[/输出]` with `<preserve_or_translate>`/`</preserve_or_translate>`; replaced the old explanatory block with a new `<preserve_or_translate_instruction>` structured guidance block. |
| **Core Processing (Async → Sync)**<br>`markdown_flow/core.py` | Converted all async APIs and internal helpers to synchronous equivalents (e.g., `process`, `_process_content`, `_process_interaction_*`, validation and error renderers). Replaced `AsyncGenerator` with `Generator` typing, removed `await` calls to provider methods, converted stream handling to plain nested generators, and added `has_preserved_content` logic to influence prompt construction. |
| **LLM Provider Interface (Async → Sync)**<br>`markdown_flow/llm.py` | Changed `LLMProvider` and `NoLLMProvider` method signatures from `async def complete`/`stream` to synchronous `def complete`/`stream`; `stream` now yields synchronously. |
| **Utility Functions**<br>`markdown_flow/utils.py` | Removed the `OUTPUT_INSTRUCTION_EXPLANATION` import; changed `process_output_instructions(content: str)` to return `(processed_content: str, has_output_instruction: bool)` and stopped injecting the explanation prefix into content. |
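
The new `(content, flag)` contract from the utilities change can be sketched as follows; the detection logic here is a deliberate simplification of what `markdown_flow/utils.py` actually does, and the marker values are taken from the constants change above:

```python
OUTPUT_INSTRUCTION_PREFIX = "<preserve_or_translate>"


def process_output_instructions(content: str) -> tuple[str, bool]:
    """Return the processed content plus a flag for preserved content.

    The old version injected an explanatory prefix into the content;
    the new version only reports whether preserved content was found.
    """
    has_output_instruction = OUTPUT_INSTRUCTION_PREFIX in content
    return content, has_output_instruction


processed, flag = process_output_instructions(
    "Intro <preserve_or_translate>keep verbatim</preserve_or_translate>"
)
print(flag)  # True
```

Callers now unpack the tuple instead of receiving a single string.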

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant MarkdownFlow
    participant LLMProvider

    Caller->>MarkdownFlow: process(block_index, mode, ...)
    Note right of MarkdownFlow: build messages (sync)\ndetect has_preserved_content
    MarkdownFlow->>LLMProvider: complete(messages)
    LLMProvider-->>MarkdownFlow: response (string)
    MarkdownFlow-->>Caller: LLMResult (or generator of LLMResult)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Areas to pay extra attention:
    • markdown_flow/core.py: ensure all previously async control flows are correctly converted to synchronous Generator semantics and that stream generators yield expected chunks.
    • Call sites for process_output_instructions() across the repo to ensure they now handle the (str, bool) return.
    • markdown_flow/llm.py: verify concrete providers comply with the new synchronous stream() contract.
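
For the last point, a compliant concrete provider implements `stream` as an ordinary generator function. A hedged sketch (the class and message shape are hypothetical; only the synchronous `complete`/`stream` contract comes from the PR):

```python
from collections.abc import Generator


class EchoProvider:
    """Toy concrete provider honoring the synchronous stream() contract."""

    def complete(self, messages: list[dict]) -> str:
        return messages[-1]["content"]

    def stream(self, messages: list[dict]) -> Generator[str, None, None]:
        # No `async def`, no `await`: yield chunks directly.
        for word in messages[-1]["content"].split():
            yield word


print(list(EchoProvider().stream([{"role": "user", "content": "a b c"}])))  # ['a', 'b', 'c']
```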

Possibly related PRs

Poem

🐰 I hopped from async into sync today,
Messages now flow without delay.
XML tags snug in their place,
Generators yield with steady pace.
Hop—our little flow hops on its way ✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

  • Title Check (⚠️ Warning): The pull request title "Optimize fixed output system prompts" refers to a real aspect of the changeset: the updates to output instruction constants and the preserve_or_translate instruction block in markdown_flow/constants.py and markdown_flow/utils.py. However, the dominant architectural change in this pull request is the conversion from asynchronous to synchronous processing throughout the entire codebase, affecting core.py, llm.py, and __init__.py. The title does not capture this primary refactoring and instead emphasizes a secondary supporting change, which misrepresents the scope and nature of the PR when viewed in commit history or during code review.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 65.22%, which is below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1e8eaa and ca16559.

📒 Files selected for processing (2)
  • markdown_flow/__init__.py (2 hunks)
  • markdown_flow/constants.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • markdown_flow/__init__.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: All code (comments, identifiers, logs, error messages) must be written in English
Do not hardcode API keys or secrets in code; use environment variables or config
Prefer and maintain type hints; use MyPy for type checking
Use Ruff for linting (ruff check --fix) and formatting (ruff format)
Python modules should use snake_case for filenames
Maintain Python 3.10+ compatibility

Files:

  • markdown_flow/constants.py
markdown_flow/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Define and use pre-compiled regex patterns in constants.py

Files:

  • markdown_flow/constants.py
🪛 Ruff (0.14.1)
markdown_flow/constants.py

96-96: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

101-101: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

109-109: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

111-111: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

124-124: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

126-126: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

149-149: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

153-153: String contains ambiguous `|` (FULLWIDTH VERTICAL LINE). Did you mean `|` (VERTICAL LINE)? (RUF001)

154-154: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

🔇 Additional comments (2)
markdown_flow/constants.py (2)

93-157: Static analysis warnings are false positives for Chinese punctuation.

The Ruff warnings about "ambiguous" fullwidth commas and vertical lines are false positives. These are correct Chinese punctuation marks (`,` and `|`) used intentionally in Chinese text content. The fullwidth characters are standard for Chinese writing and should not be changed to ASCII equivalents.


50-51: XML-style marker migration complete and verified.

Old bracket-style markers [输出] and [/输出] have been completely removed. New markers are consistently defined and used:

  • OUTPUT_INSTRUCTION_PREFIX and OUTPUT_INSTRUCTION_SUFFIX (<preserve_or_translate> tags) wrap content in utils.py (lines 597, 632, 634)
  • OUTPUT_INSTRUCTION_EXPLANATION is wrapped in the outer <preserve_or_translate_instruction> tag for metadata separation
  • No old references remain in the codebase

The XML-style tags are self-documenting and the dual-tag design (inner content markers + outer explanation wrapper) is architecturally sound.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request focuses on optimizing the fixed output system prompts within the markdown-flow library. The primary changes involve replacing the original output instruction markers with <preserve_or_translate> tags and updating the process_output_instructions function to return a tuple indicating whether preserved content exists. Additionally, the process methods in core.py have been refactored to be synchronous rather than asynchronous, and the LLM provider interface has been updated accordingly. The version number in __init__.py has also been updated.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
markdown_flow/llm.py (1)

41-68: Critical: Removal of async/await violates coding guidelines.

The conversion of complete and stream methods from async to synchronous directly contradicts the established coding guidelines, which explicitly state: "Use async/await for LLM calls and I/O to avoid blocking."

LLM API calls are I/O-bound operations that benefit significantly from async/await patterns. Removing async will:

  1. Block the calling thread during network I/O
  2. Prevent efficient concurrent processing
  3. Degrade performance in multi-request scenarios
  4. Make it impossible to use connection pooling effectively

This architectural change needs careful reconsideration. If synchronous operation is required for specific use cases, consider maintaining both async and sync interfaces, or wrapping async calls appropriately.

As per coding guidelines and learnings.
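
One way to honor the async guideline while keeping the new synchronous core, in line with the reviewer's suggestion to maintain both interfaces, is a thin async wrapper over the sync call. A minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`; the function names are illustrative, not markdown_flow APIs:

```python
import asyncio


def complete_sync(prompt: str) -> str:
    # Blocking network I/O would happen here in a real provider.
    return f"echo: {prompt}"


async def complete_async(prompt: str) -> str:
    # Offload the blocking call to a worker thread so the event loop
    # stays responsive for concurrent requests.
    return await asyncio.to_thread(complete_sync, prompt)


print(asyncio.run(complete_async("hi")))  # echo: hi
```

This keeps a single synchronous implementation as the source of truth while letting async callers avoid blocking the event loop.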

🧹 Nitpick comments (1)
markdown_flow/core.py (1)

230-230: Consider removing unused method arguments.

Static analysis identified unused context parameters at lines 230 and 349, and an unused block_index parameter at line 557. If these parameters are not needed for the current implementation and are not required for interface consistency, consider removing them to improve code clarity.

Also applies to: 349-349, 557-557

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59a564d and a1e8eaa.

📒 Files selected for processing (5)
  • markdown_flow/__init__.py (2 hunks)
  • markdown_flow/constants.py (2 hunks)
  • markdown_flow/core.py (25 hunks)
  • markdown_flow/llm.py (3 hunks)
  • markdown_flow/utils.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: All code (comments, identifiers, logs, error messages) must be written in English
Do not hardcode API keys or secrets in code; use environment variables or config
Prefer and maintain type hints; use MyPy for type checking
Use Ruff for linting (ruff check --fix) and formatting (ruff format)
Python modules should use snake_case for filenames
Maintain Python 3.10+ compatibility

Files:

  • markdown_flow/constants.py
  • markdown_flow/utils.py
  • markdown_flow/core.py
  • markdown_flow/__init__.py
  • markdown_flow/llm.py
markdown_flow/constants.py

📄 CodeRabbit inference engine (AGENTS.md)

Define and use pre-compiled regex patterns in constants.py

Files:

  • markdown_flow/constants.py
markdown_flow/{core,enums,exceptions,llm,models,utils}.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not compile regex inline in modules; import compiled patterns from constants.py

Files:

  • markdown_flow/utils.py
  • markdown_flow/core.py
  • markdown_flow/llm.py
markdown_flow/{llm,core}.py

📄 CodeRabbit inference engine (AGENTS.md)

Use async/await for LLM calls and I/O to avoid blocking

Files:

  • markdown_flow/core.py
  • markdown_flow/llm.py
markdown_flow/core.py

📄 CodeRabbit inference engine (AGENTS.md)

markdown_flow/core.py: Reuse a single LLM provider instance across requests to enable connection reuse
Prefer streaming responses for large documents where possible
Cache parsed blocks and variable extractions (e.g., with lru_cache) to avoid recomputation

Files:

  • markdown_flow/core.py
markdown_flow/{core,llm}.py

📄 CodeRabbit inference engine (AGENTS.md)

Minimize prompt tokens while maintaining required context

Files:

  • markdown_flow/core.py
  • markdown_flow/llm.py
markdown_flow/__init__.py

📄 CodeRabbit inference engine (AGENTS.md)

Update version number in markdown_flow/__init__.py for releases

Files:

  • markdown_flow/__init__.py
markdown_flow/llm.py

📄 CodeRabbit inference engine (AGENTS.md)

Implement retry logic with exponential backoff for LLM calls

Files:

  • markdown_flow/llm.py
🧠 Learnings (2)
📚 Learning: 2025-09-20T17:36:43.544Z
Learnt from: CR
PR: ai-shifu/markdown-flow-agent-py#0
File: AGENTS.md:0-0
Timestamp: 2025-09-20T17:36:43.544Z
Learning: Applies to markdown_flow/{llm,core}.py : Use async/await for LLM calls and I/O to avoid blocking

Applied to files:

  • markdown_flow/core.py
  • markdown_flow/llm.py
📚 Learning: 2025-09-20T17:36:43.544Z
Learnt from: CR
PR: ai-shifu/markdown-flow-agent-py#0
File: AGENTS.md:0-0
Timestamp: 2025-09-20T17:36:43.544Z
Learning: Applies to markdown_flow/__init__.py : Update version number in markdown_flow/__init__.py for releases

Applied to files:

  • markdown_flow/__init__.py
🧬 Code graph analysis (1)
markdown_flow/core.py (3)
markdown_flow/llm.py (6)
  • ProcessMode (15-20)
  • complete (41-53)
  • complete (74-75)
  • LLMResult (24-34)
  • stream (56-68)
  • stream (77-78)
markdown_flow/enums.py (1)
  • BlockType (21-30)
markdown_flow/utils.py (3)
  • InteractionType (169-177)
  • process_output_instructions (561-655)
  • replace_variables_in_text (744-796)
🪛 Ruff (0.14.1)
markdown_flow/constants.py

105-105: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

107-107: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

127-127: String contains ambiguous `,` (FULLWIDTH COMMA). Did you mean `,` (COMMA)? (RUF001)

markdown_flow/core.py

230-230: Unused method argument: context

(ARG002)


349-349: Unused method argument: context

(ARG002)


557-557: Unused method argument: block_index

(ARG002)

🔇 Additional comments (6)
markdown_flow/utils.py (1)

561-655: LGTM! Breaking change is well-documented.

The function signature change to return a tuple is a breaking change, but it's clearly documented in the docstring. The implementation correctly tracks whether output instructions were found and returns both the processed content and the flag.

markdown_flow/__init__.py (2)

35-40: Documentation correctly updated for synchronous API.

The usage examples have been properly updated to reflect the synchronous API by removing await keywords.


86-86: Verify the large version jump.

The version jumped from 0.2.5 to 0.2.18 (13 versions). This is a significant jump for a single PR. Please confirm this is the intended version number.

Based on coding guidelines.

markdown_flow/constants.py (2)

50-51: LGTM! Instruction markers updated to XML format.

The change from Chinese markers [输出] to XML-style <preserve_or_translate> tags improves clarity and follows a more standardized format.


93-131: LGTM! Improved instruction structure.

The new OUTPUT_INSTRUCTION_EXPLANATION provides a much more structured and detailed explanation using XML-style tags. The fullwidth commas flagged by Ruff are intentional as they appear in Chinese text content.

markdown_flow/core.py (1)

653-672: LGTM! Conditional instruction explanation logic is well-implemented.

The logic to conditionally append OUTPUT_INSTRUCTION_EXPLANATION only when preserved content is detected is a good optimization. This avoids adding unnecessary instructions when they're not needed.

The implementation correctly:

  • Unpacks the tuple from process_output_instructions
  • Checks the has_preserved_content flag
  • Handles both cases: with and without document_prompt
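
The conditional injection praised above can be sketched like this; the detector and prompt assembly are simplified stand-ins for the actual core.py logic, and the explanation string is abbreviated:

```python
OUTPUT_INSTRUCTION_EXPLANATION = (
    "<preserve_or_translate_instruction>rules...</preserve_or_translate_instruction>"
)


def build_system_prompt(document_prompt: str, content: str) -> str:
    # Stand-in for the (processed, has_preserved_content) tuple that
    # process_output_instructions() returns in the real code.
    has_preserved_content = "<preserve_or_translate>" in content
    parts = [document_prompt] if document_prompt else []
    if has_preserved_content:
        # Only spend the tokens on the explanation when markers are present.
        parts.append(OUTPUT_INSTRUCTION_EXPLANATION)
    return "\n\n".join(parts)


print(build_system_prompt("Base prompt", "no markers here"))  # Base prompt
```

The `join` over a list naturally covers all four combinations of document prompt and preserved content.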

@sonarqubecloud

Quality Gate Failed Quality Gate failed

Failed conditions
4.4% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

