feat: screenshots in prompt, vision toggle, LLM-driven blind select #76
Open
felipefl142 wants to merge 2 commits into coder:main
… select
- Embed a pre-call screenshot (base64 image_url) in the LLM prompt so
vision-capable models see the current board. Screenshot still lands in
`screenshots/{custom_id}.png` via Collector.peek_next_custom_id.
- Add `vision` config (default true) + `--no-vision` CLI flag +
`BALATROLLM_VISION` env var for text-only models (e.g. Ollama).
- LLMClient auto-detects 404 "image input" errors, strips image blocks,
and retries once; subsequent calls skip screenshots for the session.
- Hand BLIND_SELECT to the LLM instead of auto-selecting, so strategies
can choose to skip blinds via the existing `skip` tool.
- Pass `tool_choice="required"` so models must emit a tool call rather
than prose.
- Add `_to_wine_path` helper so the `screenshot` RPC receives a
Windows-style path when the game runs under Wine/Proton.
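
The screenshot embedding and the strip-and-retry fallback described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `image_block`, `strip_image_blocks`, `VisionFallback`, and `ImageInputUnsupported` are hypothetical names, the message layout assumes OpenAI-style chat content blocks, and the exception stands in for however `LLMClient` recognizes the 404 "image input" error.

```python
import base64
from typing import Any, Callable

Message = dict[str, Any]


def image_block(png_bytes: bytes) -> dict[str, Any]:
    """Wrap PNG bytes as an OpenAI-style base64 image_url content block."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }


def strip_image_blocks(messages: list[Message]) -> list[Message]:
    """Return copies of the messages with every image_url block removed."""
    stripped = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            kept = [b for b in content if b.get("type") != "image_url"]
            msg = {**msg, "content": kept}
        stripped.append(msg)
    return stripped


class ImageInputUnsupported(Exception):
    """Hypothetical stand-in for the provider's 404 "image input" error."""


class VisionFallback:
    """Retry once without images, then stay text-only for the session."""

    def __init__(self, send: Callable[[list[Message]], Any], vision: bool = True):
        self.send = send      # assumed: performs the actual LLM request
        self.vision = vision  # mirrors the `vision` config toggle

    def call(self, messages: list[Message]) -> Any:
        if not self.vision:
            return self.send(strip_image_blocks(messages))
        try:
            return self.send(messages)
        except ImageInputUnsupported:
            # Remember the failure so later calls skip screenshots entirely.
            self.vision = False
            return self.send(strip_image_blocks(messages))
```

Keeping the fallback state on the client object means the extra failed request is paid only once per session, which matches the "retries once; subsequent calls skip screenshots" behavior described in the commit message.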
Some models emit raw JSON for `next_round` instead of invoking the function, which stalls the shop loop. Reinforce in the description that it must be called as a tool.
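
As a rough illustration of the two measures that push models toward real tool calls, the sketch below pairs a reinforced `next_round` description with `tool_choice="required"` in an OpenAI-style chat request payload. The description wording, the empty parameter schema, and the `build_request` helper are assumptions for illustration; the actual TOOLS.json text is not shown here.

```python
from typing import Any

# Hypothetical wording and schema; the real TOOLS.json entry may differ.
NEXT_ROUND_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "next_round",
        "description": (
            "Leave the shop and continue to the next round. "
            "Invoke this as a tool call; do not write the call out as JSON text."
        ),
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}


def build_request(
    model: str, messages: list[dict[str, Any]], tools: list[dict[str, Any]]
) -> dict[str, Any]:
    """Assemble request kwargs that force the model to emit a tool call."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        # "required" means the reply must contain at least one tool call.
        "tool_choice": "required",
    }
```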
Summary
- Embed a pre-call screenshot in the LLM prompt as base64 `image_url` content blocks so vision-capable models can see the current board. Screenshots are still persisted to `screenshots/{custom_id}.png`, now taken before the request via a new `Collector.peek_next_custom_id()`.
- Add a `--no-vision` flag / `BALATROLLM_VISION` env var / `vision` config field (default on) for text-only models (e.g. local Ollama). `LLMClient` also auto-detects 404 `"image input"` errors, strips image blocks, and retries once; subsequent calls skip screenshots for the rest of the session.
- Hand `BLIND_SELECT` to the LLM instead of the bot auto-selecting, so strategies can choose to skip a blind via the existing `skip` tool.
- Pass `tool_choice="required"` so models must emit a tool call rather than free prose.
- The `next_round` TOOLS.json description reinforces that the tool must be called, not described in JSON; some models stall the shop loop otherwise.
- Add a `_to_wine_path` helper so the `screenshot` RPC receives a Windows-style path when Balatro runs under Wine/Proton.

Test plan
- `make quality` passes
- Run against a vision-capable model (e.g. `gpt-4o-mini`) and confirm screenshots reach the prompt + are still saved to `screenshots/`
- Run with `--no-vision` against a text-only model and confirm no image blocks are sent
- Confirm the `next_round` tool is invoked reliably at shop end
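
For reviewers unfamiliar with Wine path mapping, the conversion that a helper like `_to_wine_path` performs might look like the sketch below. It assumes Wine's default mapping of the host filesystem root to the `Z:` drive; `to_wine_path_sketch` is a hypothetical name, and the PR's real helper may handle prefixes or relative paths differently.

```python
from pathlib import PurePosixPath


def to_wine_path_sketch(path: str, drive: str = "Z:") -> str:
    """Convert an absolute POSIX path to a Windows-style path for Wine.

    Assumes the default Wine setup, where drive Z: maps to the host root.
    """
    posix = PurePosixPath(path)
    if not posix.is_absolute():
        raise ValueError(f"expected an absolute path, got {path!r}")
    # Swap separators and prepend the drive letter.
    return drive + str(posix).replace("/", "\\")
```

For example, `to_wine_path_sketch("/home/user/screenshots/abc.png")` yields `Z:\home\user\screenshots\abc.png`, which a game process running under Wine/Proton can resolve.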