fix: Add JSON instruction to default text tagging prompt before content insertion#2356

Closed
ElectricTea wants to merge 1 commit into karakeep-app:main from ElectricTea:patch-3

Conversation

@ElectricTea
Contributor

Added the prompt instruction

- You must respond in valid JSON with the key "tags" and the value is list of tags. Don't wrap the response in a markdown code.

to the default text tagging instructions before the content insertion. This change significantly improves the likelihood of receiving a structured JSON response when the prompt is truncated because it exceeds the model's maximum token limit.

I kept the original JSON instruction at the end of the prompt because it "reminds" the LLM to use JSON structure after the content insertion, and it causes no issues.
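For illustration, the intended layout can be sketched as follows. The names below are hypothetical, not the actual karakeep helpers; the point is only that the JSON instruction appears once before the content block and once after it as a reminder.

```typescript
// Hypothetical sketch of the intended prompt layout (illustrative names,
// not the real karakeep implementation).
const JSON_INSTRUCTION =
  '- You must respond in valid JSON with the key "tags" and the value is list of tags. ' +
  "Don't wrap the response in a markdown code.";

function buildTaggingPrompt(rules: string[], content: string): string {
  return [
    ...rules,
    JSON_INSTRUCTION, // placed BEFORE the content: survives tail truncation
    "<TEXT_CONTENT>",
    content,
    "</TEXT_CONTENT>",
    JSON_INSTRUCTION, // kept at the end as a reminder after the content
  ].join("\n");
}
```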

@coderabbitai
Contributor

coderabbitai bot commented Jan 6, 2026

Walkthrough

A duplicate instruction line was added to the buildImagePrompt function in packages/shared/prompts.ts, requiring responses to be valid JSON with a "tags" key containing a list of tags. No control flow or logic changes were introduced.

Changes

  • Prompt Instruction Enhancement (packages/shared/prompts.ts): Added a duplicate JSON formatting directive to buildImagePrompt requiring a "tags" key with a list value in the response

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check ✅ Passed: The title clearly and specifically describes the main change: adding a JSON instruction to a prompt before content insertion.
  • Description check ✅ Passed: The description is directly related to the changeset, explaining the added JSON instruction and its purpose for improving structured JSON responses.


@greptile-apps

greptile-apps bot commented Jan 6, 2026

Greptile Summary

Added JSON format instruction before content insertion in buildImagePrompt() to improve robustness when prompts are truncated due to token limits. The change places the JSON instruction at line 61, before the tag style and custom prompts, while keeping the reminder at line 64. This ensures LLMs receive the JSON format requirement even when the prompt is cut off, significantly improving structured response success rates.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a simple, non-breaking improvement to prompt engineering that addresses a real issue (truncated prompts) without modifying any logic, types, or interfaces. The duplicate instruction is intentional and beneficial.
  • No files require special attention

Important Files Changed

  • packages/shared/prompts.ts: Added JSON instruction before content insertion in the image tagging prompt to improve the success rate when prompts are truncated by token limits

Sequence Diagram

```mermaid
sequenceDiagram
    participant Worker as Tagging Worker
    participant Prompt as buildImagePrompt()
    participant LLM as LLM Service
    participant Parser as parseJsonFromLLMResponse()

    Worker->>Prompt: Request image tagging prompt
    Note over Prompt: Constructs prompt with rules<br/>JSON instruction at line 61<br/>tagStyleInstruction<br/>customPrompts<br/>JSON instruction reminder at line 64
    Prompt-->>Worker: Return complete prompt

    Worker->>LLM: Send prompt + image
    alt Token limit exceeded
        Note over LLM: Truncates prompt from end<br/>Early JSON instruction survives
        LLM-->>Worker: Valid JSON response
    else Normal processing
        Note over LLM: Processes full prompt<br/>Both JSON instructions present
        LLM-->>Worker: Valid JSON response
    end

    Worker->>Parser: Parse LLM response
    alt Valid JSON
        Parser-->>Worker: Parsed tags object
    else Invalid format
        Note over Parser: Attempts extraction from<br/>markdown or text
        Parser-->>Worker: Parsed tags or error
    end

    Worker->>Worker: Connect tags to bookmark
```
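The parsing fallback in the diagram can be sketched in a few lines. The function name mirrors the diagram's parseJsonFromLLMResponse, but the body below is an assumed illustration, not the actual karakeep parser: try strict JSON first, then strip a markdown code fence before giving up.

```typescript
// Hedged sketch (assumed implementation, not the real karakeep parser) of the
// fallback parsing the diagram describes.
function parseJsonFromLLMResponse(raw: string): { tags: string[] } | null {
  const candidates = [raw.trim()];
  // Fallback: extract the body of a ```json ... ``` (or bare ```) fence.
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  if (fenced) candidates.push(fenced[1].trim());
  for (const c of candidates) {
    try {
      const parsed = JSON.parse(c);
      if (Array.isArray(parsed.tags)) return parsed;
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return null;
}
```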


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI Agents
In @packages/shared/prompts.ts:
- Line 61: The duplicate JSON instruction was added to buildImagePrompt but
truncation happens in buildTextPrompt/constructTextTaggingPrompt; move or add
the duplicated instruction into constructTextTaggingPrompt immediately before
the <TEXT_CONTENT> placeholder so the JSON instruction is present prior to
content truncation performed by buildTextPrompt; keep buildImagePrompt unchanged
and ensure only one JSON instruction remains at the end of
constructTextTaggingPrompt (and/or a duplicate immediately before
<TEXT_CONTENT>) so it survives token truncation.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa7a81e and 84106e1.

📒 Files selected for processing (1)
  • packages/shared/prompts.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Use TypeScript for type safety in all source files

Files:

  • packages/shared/prompts.ts
**/*.{ts,tsx,js,jsx,json,css,md}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier according to project standards

Files:

  • packages/shared/prompts.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (AGENTS.md)

Lint code using oxlint and fix issues with pnpm lint:fix

Files:

  • packages/shared/prompts.ts
packages/shared/**/*.{ts,tsx}

📄 CodeRabbit inference engine (AGENTS.md)

Organize shared code and types in the packages/shared directory for use across packages

Files:

  • packages/shared/prompts.ts
🧠 Learnings (1)
📚 Learning: 2026-01-03T11:36:34.916Z
Learnt from: RobertRosca
Repo: karakeep-app/karakeep PR: 2339
File: packages/shared/config.ts:62-62
Timestamp: 2026-01-03T11:36:34.916Z
Learning: In packages/shared/config.ts, enforce OpenAI SDK version compatibility: service_tier values are limited to ["auto", "default", "flex"]. The "priority" tier requires OpenAI SDK >= v5.8.0. Add a guard or validation in config to prevent using priority tier unless the SDK is upgraded (v5.8.0+). Consider documenting this constraint and adding a unit test or lint rule to ensure only allowed service_tier values are used based on the installed SDK version.

Applied to files:

  • packages/shared/prompts.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review

```
- If the tag is not generic enough, don't include it.
- Aim for 10-15 tags.
- If there are no good tags, don't emit any.
- You must respond in valid JSON with the key "tags" and the value is list of tags. Don't wrap the response in a markdown code.
```

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

```
wc -l packages/shared/prompts.ts
```

Repository: karakeep-app/karakeep

Length of output: 95


🏁 Script executed:

```
cat -n packages/shared/prompts.ts
```

Repository: karakeep-app/karakeep

Length of output: 6847


Change appears to be in the wrong function given the stated objectives.

The duplicate JSON instruction is correctly identified at lines 61 and 64 in buildImagePrompt. However, this change does not align with the stated PR objective of protecting against truncation:

  • buildImagePrompt (lines 46-65) performs no truncation—it returns a static template string
  • The actual truncation happens in buildTextPrompt (lines 131-156), which calculates token budgets and truncates content before passing it to constructTextTaggingPrompt
  • constructTextTaggingPrompt (lines 70-96), where text content is actually processed, has only a single JSON instruction at the end (line 95), not a duplicate

If the goal is to protect the JSON instruction from being lost during truncation, the duplicate instruction should be added to constructTextTaggingPrompt before the <TEXT_CONTENT> block to ensure it survives token truncation in buildTextPrompt.
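A minimal sketch makes the reviewer's placement argument concrete. The names here are assumed for illustration (not the karakeep source): under a hard tail cut at the context limit, only an instruction placed before the content block survives.

```typescript
// Sketch under assumed names: why instruction placement matters when a
// provider hard-cuts the prompt at its context limit.
const JSON_RULE =
  '- You must respond in valid JSON with the key "tags" and the value is list of tags.';

// Assemble a prompt with the JSON rule either duplicated before the content
// or only trailing after it.
function assemble(instructionFirst: boolean, content: string): string {
  const parts = instructionFirst
    ? [JSON_RULE, "<TEXT_CONTENT>", content, "</TEXT_CONTENT>", JSON_RULE]
    : ["<TEXT_CONTENT>", content, "</TEXT_CONTENT>", JSON_RULE];
  return parts.join("\n");
}

// A hard cut drops everything after the limit, including trailing instructions.
function hardCut(prompt: string, limit: number): string {
  return prompt.slice(0, limit);
}
```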


@ElectricTea ElectricTea marked this pull request as draft January 6, 2026 22:07
@ElectricTea
Contributor Author

Woops, I added it to the incorrect function 🤦

Sorry about that. Will close this PR and open another to keep the commit history clean.

@ElectricTea ElectricTea closed this Jan 6, 2026
@ElectricTea ElectricTea deleted the patch-3 branch January 6, 2026 22:20