feat: screenshots in prompt, vision toggle, LLM-driven blind select by felipefl142 · Pull Request #76 · coder/balatrollm

felipefl142 · 2026-04-19T20:36:03Z

Summary

Embed screenshots in the LLM prompt as base64 image_url content blocks so vision-capable models can see the current board. Screenshots are still persisted to screenshots/{custom_id}.png — now taken before the request via a new Collector.peek_next_custom_id().
--no-vision flag / BALATROLLM_VISION env var / vision config field (default on) for text-only models (e.g. local Ollama). LLMClient also auto-detects 404 "image input" errors, strips image blocks, and retries once; subsequent calls skip screenshots for the rest of the session.
LLM now handles BLIND_SELECT instead of the bot auto-selecting, so strategies can choose to skip a blind via the existing skip tool.
tool_choice="required" so models must emit a tool call rather than free prose.
next_round TOOLS.json description reinforces that the tool must be called, not described in JSON — some models stall the shop loop otherwise.
_to_wine_path helper so the screenshot RPC receives a Windows-style path when Balatro runs under Wine/Proton.
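
As a rough illustration of the first two items, the sketch below builds a base64 `image_url` content block and retries once without images when the provider rejects image input. It is not the code from this PR: `image_block`, `strip_image_blocks`, `VisionFallback`, and `ImageInputUnsupported` are hypothetical names, and the `send` callable stands in for whatever actually issues the request.

```python
import base64
from typing import Any, Callable


def image_block(png_bytes: bytes) -> dict[str, Any]:
    """Wrap PNG bytes as an OpenAI-style image_url content block (data URL)."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/png;base64,{encoded}"},
    }


def strip_image_blocks(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages with every image_url block removed."""
    stripped = []
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            kept = [b for b in content if b.get("type") != "image_url"]
            message = {**message, "content": kept}
        stripped.append(message)
    return stripped


class ImageInputUnsupported(Exception):
    """Hypothetical stand-in for the provider's 404 "image input" error."""


class VisionFallback:
    """Retry once without images, then stay text-only for the session."""

    def __init__(self, send: Callable[[list[dict[str, Any]]], Any]) -> None:
        self.send = send      # assumed: callable that performs the request
        self.vision = True    # flipped off after the first image rejection

    def complete(self, messages: list[dict[str, Any]]) -> Any:
        if not self.vision:
            return self.send(strip_image_blocks(messages))
        try:
            return self.send(messages)
        except ImageInputUnsupported:
            self.vision = False
            return self.send(strip_image_blocks(messages))
```

Keeping the flag on the client object means the rejection is paid for only once per session, which matches the "subsequent calls skip screenshots" behaviour described above.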

Test plan

make quality passes
Run a full game with a vision-capable model (e.g. gpt-4o-mini) and confirm screenshots reach the prompt + are still saved to screenshots/
Run with --no-vision against a text-only model and confirm no image blocks sent
Run against a model that 404s on images and confirm auto-fallback kicks in mid-session
Confirm bot now occasionally skips a blind where the strategy calls for it
Confirm next_round tool is invoked reliably at shop end
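
The `--no-vision` check in this plan could be automated with a small helper like the one below. This is only a sketch: `count_image_blocks` is a hypothetical name, and it assumes the outgoing request uses OpenAI-style messages whose `content` is either a string or a list of typed blocks.

```python
from typing import Any


def count_image_blocks(messages: list[dict[str, Any]]) -> int:
    """Count image_url content blocks across all messages in a request."""
    total = 0
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):  # string content cannot hold images
            total += sum(
                1
                for block in content
                if isinstance(block, dict) and block.get("type") == "image_url"
            )
    return total
```

A test run with vision disabled would then assert that `count_image_blocks(request["messages"]) == 0` for every captured request.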

… select

- Embed a pre-call screenshot (base64 image_url) in the LLM prompt so vision-capable models see the current board. Screenshot still lands in `screenshots/{custom_id}.png` via Collector.peek_next_custom_id.
- Add `vision` config (default true) + `--no-vision` CLI flag + `BALATROLLM_VISION` env var for text-only models (e.g. Ollama).
- LLMClient auto-detects 404 "image input" errors, strips image blocks, and retries once; subsequent calls skip screenshots for the session.
- Hand BLIND_SELECT to the LLM instead of auto-selecting, so strategies can choose to skip blinds via the existing `skip` tool.
- Pass `tool_choice="required"` so models must emit a tool call rather than prose.
- Add `_to_wine_path` helper so the `screenshot` RPC receives a Windows-style path when the game runs under Wine/Proton.
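
For readers unfamiliar with the Wine path issue, here is one way such a conversion could look. It is a sketch, not the PR's `_to_wine_path`: the function name `to_wine_path_sketch` is hypothetical, and it assumes the `Z:` drive maps to the host filesystem root, which is Wine's default but can be changed per prefix.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_wine_path_sketch(posix_path: str, drive: str = "Z:") -> str:
    """Map an absolute POSIX path to a Windows-style path on a Wine drive.

    Assumes `drive` is mapped to the host root "/" (Wine's default for Z:).
    """
    path = PurePosixPath(posix_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {posix_path!r}")
    # parts[0] is the root "/", so the remaining parts hang off the drive root.
    return str(PureWindowsPath(drive + "\\", *path.parts[1:]))
```

With the default drive, `/home/user/screenshots/abc.png` becomes `Z:\home\user\screenshots\abc.png`.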

felipefl142 added 2 commits April 19, 2026 17:34

docs(strategies): clarify next_round must be a tool call

1bec84d

Some models emit raw JSON for `next_round` instead of invoking the function, which stalls the shop loop. Reinforce in the description that it must be called as a tool.
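
To make the failure mode concrete, the sketch below shows a request payload with `tool_choice="required"` and a helper that distinguishes a real tool call from JSON written into the message text. It assumes the OpenAI chat-completions message format; the `next_round` description wording, `build_request`, and `extract_tool_call` are hypothetical and not taken from this repository's TOOLS.json.

```python
import json
from typing import Any, Optional

# Hypothetical tool entry; the real description lives in TOOLS.json.
NEXT_ROUND_TOOL: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "next_round",
        "description": (
            "Leave the shop and start the next round. "
            "Must be invoked as a tool call, not written out as JSON text."
        ),
        "parameters": {"type": "object", "properties": {}},
    },
}


def build_request(model: str, messages: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a chat-completions payload that forces the model to call a tool."""
    return {
        "model": model,
        "messages": messages,
        "tools": [NEXT_ROUND_TOOL],
        "tool_choice": "required",
    }


def extract_tool_call(message: dict[str, Any]) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (name, arguments) for the first tool call, or None if absent."""
    calls = message.get("tool_calls") or []
    if not calls:
        return None  # the model answered in prose (or raw JSON text) instead
    function = calls[0]["function"]
    return function["name"], json.loads(function["arguments"] or "{}")
```

A reply whose `content` merely contains `{"name": "next_round"}` yields `None` here, which is the stall case the description change is meant to prevent.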

felipefl142 wants to merge 2 commits into coder:main from felipefl142:feat/vision-prompt-screenshots
