Skip to content

feat: screenshots in prompt, vision toggle, LLM-driven blind select#76

Open
felipefl142 wants to merge 2 commits intocoder:mainfrom
felipefl142:feat/vision-prompt-screenshots
Open

feat: screenshots in prompt, vision toggle, LLM-driven blind select#76
felipefl142 wants to merge 2 commits intocoder:mainfrom
felipefl142:feat/vision-prompt-screenshots

Conversation

@felipefl142
Copy link
Copy Markdown
Contributor

Summary

  • Embed screenshots in the LLM prompt as base64 image_url content blocks so vision-capable models can see the current board. Screenshots are still persisted to screenshots/{custom_id}.png — now taken before the request via a new Collector.peek_next_custom_id().
  • --no-vision flag / BALATROLLM_VISION env var / vision config field (default on) for text-only models (e.g. local Ollama). LLMClient also auto-detects 404 "image input" errors, strips image blocks, and retries once; subsequent calls skip screenshots for the rest of the session.
  • LLM now handles BLIND_SELECT instead of the bot auto-selecting, so strategies can choose to skip a blind via the existing skip tool.
  • tool_choice=\"required\" so models must emit a tool call rather than free prose.
  • next_round TOOLS.json description reinforces that the tool must be called, not described in JSON — some models stall the shop loop otherwise.
  • _to_wine_path helper so the screenshot RPC receives a Windows-style path when Balatro runs under Wine/Proton.

Test plan

  • make quality passes
  • Run a full game with a vision-capable model (e.g. gpt-4o-mini) and confirm screenshots reach the prompt + are still saved to screenshots/
  • Run with --no-vision against a text-only model and confirm no image blocks sent
  • Run against a model that 404s on images and confirm auto-fallback kicks in mid-session
  • Confirm bot now occasionally skips a blind where the strategy calls for it
  • Confirm next_round tool is invoked reliably at shop end

… select

- Embed a pre-call screenshot (base64 image_url) in the LLM prompt so
  vision-capable models see the current board. Screenshot still lands in
  `screenshots/{custom_id}.png` via Collector.peek_next_custom_id.
- Add `vision` config (default true) + `--no-vision` CLI flag +
  `BALATROLLM_VISION` env var for text-only models (e.g. Ollama).
- LLMClient auto-detects 404 "image input" errors, strips image blocks,
  and retries once; subsequent calls skip screenshots for the session.
- Hand BLIND_SELECT to the LLM instead of auto-selecting, so strategies
  can choose to skip blinds via the existing `skip` tool.
- Pass `tool_choice="required"` so models must emit a tool call rather
  than prose.
- Add `_to_wine_path` helper so the `screenshot` RPC receives a
  Windows-style path when the game runs under Wine/Proton.
Some models emit raw JSON for `next_round` instead of invoking the
function, which stalls the shop loop. Reinforce in the description
that it must be called as a tool.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant