fix: Add JSON instruction to default text tagging prompt before content insertion #2356
ElectricTea wants to merge 1 commit into karakeep-app:main
Conversation
Added the prompt instruction

```
- You must respond in valid JSON with the key "tags" and the value is list of tags. Don't wrap the response in a markdown code.
```

to the default text tagging instructions before the content insertion. This change significantly improves the success rate of receiving a structured JSON response when the prompt exceeds the model's maximum token limit and is truncated. I kept the original JSON instruction at the end of the prompt because it "reminds" the LLM to use the JSON structure after the content insertion, and it causes no issues.
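The intended placement can be sketched as follows. This is an illustrative sketch only: the function name, constant name, and surrounding prompt text are assumptions, not the actual contents of `packages/shared/prompts.ts`.

```typescript
// Illustrative sketch: names and prompt wording are placeholders,
// not the real karakeep prompt-builder code.
const JSON_INSTRUCTION =
  '- You must respond in valid JSON with the key "tags" and the value is list of tags. ' +
  "Don't wrap the response in a markdown code.";

function buildTextTaggingPrompt(content: string): string {
  return [
    "You are a bot helping the user tag their bookmarks.",
    "- Aim for 10-15 tags.",
    JSON_INSTRUCTION, // early copy: placed before the (potentially huge) content
    `<TEXT_CONTENT>${content}</TEXT_CONTENT>`,
    JSON_INSTRUCTION, // original trailing copy, kept as a reminder
  ].join("\n");
}
```

With both copies present, the early one still reaches the model even if everything after the inserted content is cut off.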
Walkthrough
A duplicate instruction line was added to the prompt in `buildImagePrompt` (`packages/shared/prompts.ts`).

Changes
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes

Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Greptile Summary
Added JSON format instruction before content insertion in `buildImagePrompt` (`packages/shared/prompts.ts`).

Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Worker as Tagging Worker
    participant Prompt as buildImagePrompt()
    participant LLM as LLM Service
    participant Parser as parseJsonFromLLMResponse()
    Worker->>Prompt: Request image tagging prompt
    Note over Prompt: Constructs prompt with rules<br/>JSON instruction at line 61<br/>tagStyleInstruction<br/>customPrompts<br/>JSON instruction reminder at line 64
    Prompt-->>Worker: Return complete prompt
    Worker->>LLM: Send prompt + image
    alt Token limit exceeded
        Note over LLM: Truncates prompt from end<br/>Early JSON instruction survives
        LLM-->>Worker: Valid JSON response
    else Normal processing
        Note over LLM: Processes full prompt<br/>Both JSON instructions present
        LLM-->>Worker: Valid JSON response
    end
    Worker->>Parser: Parse LLM response
    alt Valid JSON
        Parser-->>Worker: Parsed tags object
    else Invalid format
        Note over Parser: Attempts extraction from<br/>markdown or text
        Parser-->>Worker: Parsed tags or error
    end
    Worker->>Worker: Connect tags to bookmark
```
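The diagram's final parsing step can be sketched as a lenient extractor. This is an assumed behavior for `parseJsonFromLLMResponse` (strip a stray markdown fence, then find the first JSON object), not its actual implementation:

```typescript
// Hypothetical sketch of lenient tag extraction from an LLM response;
// the real parseJsonFromLLMResponse may behave differently.
function parseTagsFromResponse(raw: string): string[] {
  // Strip a markdown code fence if the model wrapped its output anyway
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const candidate = fenced ? fenced[1] : raw;
  // Fall back to the first {...} block in free text
  const braced = candidate.match(/\{[\s\S]*\}/);
  if (!braced) throw new Error("no JSON object found in response");
  const parsed = JSON.parse(braced[0]) as { tags?: unknown };
  if (!Array.isArray(parsed.tags)) throw new Error('missing "tags" array');
  return parsed.tags.map(String);
}
```

A fallback like this is why the diagram shows "Parsed tags or error" even for responses that are not clean JSON.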
Actionable comments posted: 1
🤖 Fix all issues with AI Agents
In @packages/shared/prompts.ts:
- Line 61: The duplicate JSON instruction was added to buildImagePrompt but
truncation happens in buildTextPrompt/constructTextTaggingPrompt; move or add
the duplicated instruction into constructTextTaggingPrompt immediately before
the <TEXT_CONTENT> placeholder so the JSON instruction is present prior to
content truncation performed by buildTextPrompt; keep buildImagePrompt unchanged
and ensure only one JSON instruction remains at the end of
constructTextTaggingPrompt (and/or a duplicate immediately before
<TEXT_CONTENT>) so it survives token truncation.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/shared/prompts.ts
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Use TypeScript for type safety in all source files
Files:
packages/shared/prompts.ts
**/*.{ts,tsx,js,jsx,json,css,md}
📄 CodeRabbit inference engine (AGENTS.md)
Format code using Prettier according to project standards
Files:
packages/shared/prompts.ts
**/*.{ts,tsx,js,jsx}
📄 CodeRabbit inference engine (AGENTS.md)
Lint code using oxlint and fix issues with `pnpm lint:fix`
Files:
packages/shared/prompts.ts
packages/shared/**/*.{ts,tsx}
📄 CodeRabbit inference engine (AGENTS.md)
Organize shared code and types in the `packages/shared` directory for use across packages
Files:
packages/shared/prompts.ts
🧠 Learnings (1)
📚 Learning: 2026-01-03T11:36:34.916Z
Learnt from: RobertRosca
Repo: karakeep-app/karakeep PR: 2339
File: packages/shared/config.ts:62-62
Timestamp: 2026-01-03T11:36:34.916Z
Learning: In packages/shared/config.ts, enforce OpenAI SDK version compatibility: service_tier values are limited to ["auto", "default", "flex"]. The "priority" tier requires OpenAI SDK >= v5.8.0. Add a guard or validation in config to prevent using priority tier unless the SDK is upgraded (v5.8.0+). Consider documenting this constraint and adding a unit test or lint rule to ensure only allowed service_tier values are used based on the installed SDK version.
Applied to files:
packages/shared/prompts.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
```
- If the tag is not generic enough, don't include it.
- Aim for 10-15 tags.
- If there are no good tags, don't emit any.
- You must respond in valid JSON with the key "tags" and the value is list of tags. Don't wrap the response in a markdown code.
```
🧩 Analysis chain
🏁 Script executed:

```shell
wc -l packages/shared/prompts.ts
```

Repository: karakeep-app/karakeep
Length of output: 95

🏁 Script executed:

```shell
cat -n packages/shared/prompts.ts
```

Length of output: 6847
Change appears to be in the wrong function given the stated objectives.

The duplicate JSON instruction is correctly identified at lines 61 and 64 in `buildImagePrompt`. However, this change does not align with the stated PR objective of protecting against truncation:

- `buildImagePrompt` (lines 46-65) performs no truncation; it returns a static template string.
- The actual truncation happens in `buildTextPrompt` (lines 131-156), which calculates token budgets and truncates content before passing it to `constructTextTaggingPrompt`.
- `constructTextTaggingPrompt` (lines 70-96), where text content is actually processed, has only a single JSON instruction at the end (line 95), not a duplicate.

If the goal is to protect the JSON instruction from being lost during truncation, the duplicate instruction should be added to `constructTextTaggingPrompt` before the `<TEXT_CONTENT>` block to ensure it survives token truncation in `buildTextPrompt`.
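Why the early copy matters when the prompt is cut from the end can be sketched with a toy character-based truncation (real limits are counted in tokens, and the names below are illustrative, not karakeep's code):

```typescript
// Toy model of tail truncation: everything past the budget is dropped.
function truncateTail(prompt: string, maxChars: number): string {
  return prompt.slice(0, maxChars);
}

const INSTRUCTION = 'Respond in valid JSON with the key "tags".';
const content = "x".repeat(500); // stands in for a long article

// Instruction duplicated before and after the inserted content.
const prompt = [
  INSTRUCTION,
  `<TEXT_CONTENT>${content}</TEXT_CONTENT>`,
  INSTRUCTION,
].join("\n");

// With a 200-character budget the trailing copy is cut off,
// but the leading copy still reaches the model.
const truncated = truncateTail(prompt, 200);
```

The same reasoning applies whether the cut happens in `buildTextPrompt`'s token budgeting or on the model side: only an instruction placed before the content survives.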
Woops, I added it to the incorrect function 🤦 Sorry about that. Will close this PR and open another to keep the commit history clean. |