Sakaax/img-pilot

The first Claude Code plugin that generates AI images — from your terminal, cost-optimized, on-brand.


Works with ux-pilot + brand-pilot

img-pilot is the third plugin in the Sakaax pilot ecosystem:

ux-pilot     → UX discovery + brief
brand-pilot  → brand tokens (CSS + Tailwind + palette)
img-pilot    → AI-generated visual assets (this plugin)

If either sister plugin has been run, img-pilot reads its outputs automatically: palette, fonts, tone, style, product type, validated design tokens. Zero reconfiguration. You keep the same design identity across every plugin in the pipeline because they all share the same art direction and read the same briefs.


The first image generator built into Claude Code

Every other AI image tool lives in a browser tab. You context-switch, you prompt, you download, you drag into your project, you repeat for every variant. img-pilot lives inside Claude Code — the terminal you're already in — and makes Claude the art director that writes the prompt, chooses the provider, and derives every asset size from a single generation.

You pay for one API call. You get eight assets (logo + favicon set + OG + Twitter card + Discord embed + GitHub banner + iOS icon + Android icon).

Why img-pilot?

Every dev project needs visual assets. Logo. Favicon. OG image. GitHub banner. Most devs either:

  • Skip them entirely — default favicon, no OG image, cardboard-box GitHub page
  • Spend hours in Figma or Canva for basic assets, inventing a design language on the fly
  • Pay a designer $200+ for a logo they end up using mostly at 16×16

img-pilot collapses that into a single confirmed terminal command. On-brand, in minutes, for the cost of one API call.

What makes it different

| | Manual approach | img-pilot |
|---|---|---|
| Prompt quality | User guesses at specifics | Claude builds from full UX + brand context |
| API calls per result | One per asset (5+ calls for a full set) | One source → 8+ derived assets via sharp |
| Cost for a full set | $0.20 – $0.40 | $0.03 – $0.08 |
| Consistency | Each asset designed in isolation | All derived from the same source, guaranteed |
| Favicon set | Forgotten or default | Auto-derived, all sizes, webmanifest included |
| Review | Assets used immediately | Gallery HTML in img-pilot/, review before use |
| Provider lock-in | Hardcoded to one | 9 providers, switch in one config line |
| API key safety | Hope you remembered .gitignore | Auto-gitignore + chmod 600 + pre-commit hook |

Installation

# Step 1 — Add the marketplace entry
/plugin marketplace add Sakaax/img-pilot

# Step 2 — Install the plugin
/plugin install img-pilot@img-pilot

Then configure a provider:

/img-pilot config

Fill in an API key for at least one of the nine supported providers. Config is stored globally at ~/.config/img-pilot/config.toml (reusable across all your projects) with an optional project-local override at <project>/img-pilot/config.toml.
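
The layering of a global config with a project-local override can be pictured with a minimal sketch. The `Config` shape, the `default_provider` field, and the `mergeConfig` helper are assumptions made for illustration; the plugin's actual loader and TOML schema may differ.

```typescript
// Hypothetical config shape; the real config.toml schema may differ.
interface ProviderConfig {
  api_key?: string;
}

interface Config {
  default_provider?: string;
  max_api_calls_per_session?: number;
  providers?: Record<string, ProviderConfig>;
}

// Layer a project-local config over the global one.
// Top-level scalars from the project file win; provider tables merge per provider.
function mergeConfig(globalCfg: Config, projectCfg: Config = {}): Config {
  const providers: Record<string, ProviderConfig> = { ...(globalCfg.providers ?? {}) };
  for (const [name, override] of Object.entries(projectCfg.providers ?? {})) {
    providers[name] = { ...(providers[name] ?? {}), ...override };
  }
  return { ...globalCfg, ...projectCfg, providers };
}

// Example values are placeholders, not real keys.
const globalCfg: Config = {
  default_provider: "openai",
  max_api_calls_per_session: 5,
  providers: { openai: { api_key: "sk-global" }, recraft: { api_key: "rc-global" } },
};
const projectCfg: Config = { default_provider: "recraft" };

const merged = mergeConfig(globalCfg, projectCfg);
console.log(merged.default_provider, merged.max_api_calls_per_session);
```

In this sketch the project file only needs to state what differs; everything else falls through from the global file.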

Commands

/img-pilot              # Guided flow — infers what you need, proposes the cheapest path
/img-pilot logo         # Generate logo (icon / text / combo)
/img-pilot favicon      # Favicon + full icon set (16/32/180/192/512 + site.webmanifest)
/img-pilot social       # OG image, Twitter card, Discord embed
/img-pilot banner       # GitHub banner (1280×640)
/img-pilot icons        # iOS/Android app icons (with proper rounded corners)
/img-pilot config       # Edit config (providers, defaults, limits)

Every command runs standalone or as part of the full flow via /img-pilot.

How it works

1. Claude reads your context

img-pilot reads, in priority order:

| Source | What it extracts |
|---|---|
| brand-pilot/tokens.css + tailwind.config.snippet.js + .palette.cache.json | Validated brand tokens (exact hex, radius scale, shadow scale, fonts) |
| brand-pilot/brand-kit.md | Human-readable brand description |
| ux-pilot/ux-brief.md | Palette, fonts, tone, style, product type, page structure |
| img-pilot/brief.md | Your own fallback brief (from discovery) |

If none of these exist, img-pilot runs a quick 6-question discovery and saves img-pilot/brief.md for future runs.
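
The priority order above can be sketched as a simple ordered lookup. The `kind` labels and the `resolveSources` helper are hypothetical names for this example, and the file-existence check is injected so the sketch stays free of filesystem calls.

```typescript
// Candidate sources in the priority order listed above.
// The `kind` labels are hypothetical names used only in this sketch.
const SOURCES = [
  { kind: "brand-tokens", path: "brand-pilot/tokens.css" },
  { kind: "brand-kit", path: "brand-pilot/brand-kit.md" },
  { kind: "ux-brief", path: "ux-pilot/ux-brief.md" },
  { kind: "own-brief", path: "img-pilot/brief.md" },
] as const;

type Source = (typeof SOURCES)[number];

// Return every source that exists, highest priority first.
// An empty result means the discovery flow would need to run.
function resolveSources(exists: (path: string) => boolean): Source[] {
  return SOURCES.filter((s) => exists(s.path));
}

// Simulate a project where only ux-pilot and a previous discovery have run.
const present = new Set<string>(["ux-pilot/ux-brief.md", "img-pilot/brief.md"]);
const found = resolveSources((p) => present.has(p));
console.log(found.map((s) => s.kind));

const none = resolveSources(() => false);
console.log(none.length === 0 ? "run discovery" : "use existing briefs");
```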

2. Claude builds the prompt

Production-quality, 200–300 word prompts. Exact hex values (not "orange"), specific style words from the brief, explicit constraints (must work at 16×16, transparent background, flat design), and a curated anti-slop list (no gradients on logo marks, no stock-photo aesthetics, no generic tech clichés like gears or circuits).

The full prompt is shown to you in chat before anything is dispatched.
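
As a rough illustration of how such a prompt might be assembled, the sketch below rejects color names in favor of exact hex values and appends the constraints and anti-slop rules quoted above. The `Brief` shape and `buildLogoPrompt` are assumptions, and the output is far shorter than the 200–300 word prompts the plugin describes.

```typescript
// Hypothetical brief shape for this sketch.
interface Brief {
  productType: string;
  style: string[];
  palette: string[]; // exact hex values such as "#FF6B35"
}

// Anti-slop rules taken from the description above.
const ANTI_SLOP: string[] = [
  "no gradients on logo marks",
  "no stock-photo aesthetics",
  "no generic tech cliches like gears or circuits",
];

const HEX = /^#[0-9a-fA-F]{6}$/;

function buildLogoPrompt(brief: Brief): string {
  // Refuse vague color words: the prompt should carry exact hex values.
  const bad = brief.palette.filter((c) => !HEX.test(c));
  if (bad.length > 0) {
    throw new Error(`Palette entries must be exact hex values: ${bad.join(", ")}`);
  }
  return [
    `Logo mark for a ${brief.productType}.`,
    `Style: ${brief.style.join(", ")}.`,
    `Colors: ${brief.palette.join(", ")} only.`,
    "Constraints: must work at 16x16, transparent background, flat design.",
    `Avoid: ${ANTI_SLOP.join("; ")}.`,
  ].join("\n");
}

const logoPrompt = buildLogoPrompt({
  productType: "developer CLI tool",
  style: ["geometric", "minimal"],
  palette: ["#FF6B35", "#1A1A2E"],
});
console.log(logoPrompt);
```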

3. Cost optimizer picks the cheapest path

Every request goes through a decision tree:

  1. Already exists in img-pilot/? — skill asks "regenerate?"
  2. Can we skip the API entirely? — SVG-pure (letter favicon, geometric logo)
  3. Derivable from an existing asset? — resize / compose / round corners (zero cost)
  4. Derivable from another asset in this run's plan? — one API call + N sharp derivations
  5. API call required — build prompt, pick provider, emit plan step with cost estimate

You see the complete plan before any charge: assets to produce, provider chosen (with the reason), each step's kind, total cost, prompt preview. You approve. You pay only then.
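
The decision tree above can be sketched as a single function that returns the first matching step kind. All type names, the `planStep` and `buildPlan` helpers, and the $0.04 price are illustrative assumptions, not the plugin's actual optimizer.

```typescript
type StepKind = "existing" | "svg" | "derive-existing" | "derive-planned" | "api";

interface PlanStep {
  asset: string;
  kind: StepKind;
  costUsd: number;
}

interface AssetRequest {
  asset: string;
  svgPure?: boolean; // can be drawn as pure SVG, no API needed
  derivesFrom?: string; // asset this one can be derived from
}

// Walk the five checks in order; only the last one costs money.
function planStep(
  req: AssetRequest,
  onDisk: Set<string>,
  planned: Set<string>,
  apiCostUsd: number,
): PlanStep {
  const free = (kind: StepKind): PlanStep => ({ asset: req.asset, kind, costUsd: 0 });
  if (onDisk.has(req.asset)) return free("existing"); // user would be asked "regenerate?"
  if (req.svgPure) return free("svg");
  if (req.derivesFrom && onDisk.has(req.derivesFrom)) return free("derive-existing");
  if (req.derivesFrom && planned.has(req.derivesFrom)) return free("derive-planned");
  return { asset: req.asset, kind: "api", costUsd: apiCostUsd };
}

function buildPlan(reqs: AssetRequest[], onDisk: Set<string>, apiCostUsd: number) {
  const planned = new Set<string>();
  const steps = reqs.map((req) => {
    const step = planStep(req, onDisk, planned, apiCostUsd);
    planned.add(req.asset);
    return step;
  });
  const totalUsd = steps.reduce((sum, s) => sum + s.costUsd, 0);
  return { steps, totalUsd };
}

const plan = buildPlan(
  [
    { asset: "logo-icon" },
    { asset: "favicon-32", derivesFrom: "logo-icon" },
    { asset: "og-image", derivesFrom: "logo-icon" },
  ],
  new Set<string>(),
  0.04,
);
console.log(plan.steps.map((s) => `${s.asset}: ${s.kind}`), plan.totalUsd);
```

In this example only the logo reaches the paid branch; the favicon and OG image are planned as derivations of it, so the estimated total is the price of one call.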

4. Gallery

Every run updates img-pilot/gallery.html — a persistent dark-themed HTML gallery of every asset generated in this project. Each card shows the image, the prompt used, the provider, the cost, dimensions, and timestamp. Latest run on top, scrollable history.

Run with --serve to browse it on http://localhost:4090.
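
A gallery like this could be rendered from a list of run records, newest first. The `GalleryEntry` shape, the markup, and the helper names below are assumptions for illustration only; the shipped template may look quite different.

```typescript
// Hypothetical record for one generated asset.
interface GalleryEntry {
  file: string;
  prompt: string;
  provider: string;
  costUsd: number;
  width: number;
  height: number;
  createdAt: string; // ISO 8601 in UTC, so string order matches time order
}

// Escape text before placing it in HTML; prompts are free-form input.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderCard(e: GalleryEntry): string {
  return [
    '<article class="card">',
    `  <img src="${escapeHtml(e.file)}" width="${e.width}" height="${e.height}" alt="">`,
    `  <p class="prompt">${escapeHtml(e.prompt)}</p>`,
    `  <p class="meta">${escapeHtml(e.provider)} · $${e.costUsd.toFixed(2)} · ${e.width}×${e.height} · ${escapeHtml(e.createdAt)}</p>`,
    "</article>",
  ].join("\n");
}

// Latest run on top: sort a copy by timestamp, descending.
function renderGallery(entries: GalleryEntry[]): string {
  const sorted = [...entries].sort((a, b) => b.createdAt.localeCompare(a.createdAt));
  return sorted.map(renderCard).join("\n");
}

const galleryHtml = renderGallery([
  { file: "logo/old.png", prompt: "Flat logo", provider: "openai", costUsd: 0.04, width: 1024, height: 1024, createdAt: "2025-01-01T10:00:00Z" },
  { file: "logo/new.png", prompt: 'Logo <flat> & "clean"', provider: "recraft", costUsd: 0.04, width: 1024, height: 1024, createdAt: "2025-02-01T10:00:00Z" },
]);
console.log(galleryHtml);
```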

Supported providers

Nine providers in v0.1, one thin adapter each (~80–120 lines per file). Plug in your own key. No cloud SDK dependencies. Switch providers per run with --provider <name>.

| Provider | Model | Best for | ~$/image |
|---|---|---|---|
| OpenAI | GPT Image 1.5 | General quality baseline | $0.04 |
| Black Forest Labs | FLUX 2 Pro | Photorealism, sharp edges | $0.04 |
| Google | Imagen 4 (Vertex AI) | Best value, strong text rendering | $0.04 |
| Stability AI | Stable Diffusion 3.5 | Dev-friendly, self-hostable path | $0.03 |
| Ideogram | Ideogram v3 | Text-in-image (logos, wordmarks) | $0.08 |
| Leonardo AI | Leonardo | Custom models, brand fine-tuning | $0.035 |
| Replicate | Multi-model router | Access to any open-source model | variable |
| Recraft | Recraft v3 | Native SVG output (icons) | $0.04 |
| fal.ai | Multi-model, ultra-fast | Low-latency, async webhooks | variable |

Midjourney is on the roadmap and intentionally not included in v0.1 — no official REST API (Discord-only), and we'd rather ship 9 rock-solid adapters than 10 fragile ones.
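
One way to picture the "one thin adapter each" design is a shared interface plus a name-keyed registry that a `--provider <name>` flag could index into. The interface, the registry, and the `fake` adapter below are hypothetical; they are not the plugin's actual types, and the stand-in adapter makes no network call.

```typescript
interface GenerateRequest {
  prompt: string;
  width: number;
  height: number;
}

interface GenerateResult {
  image: Uint8Array;
  costUsd: number;
}

// Hypothetical contract every provider adapter would satisfy.
interface ProviderAdapter {
  name: string;
  generate(req: GenerateRequest): Promise<GenerateResult>;
}

// Stand-in adapter for the sketch: returns an empty image at zero cost.
const fakeAdapter: ProviderAdapter = {
  name: "fake",
  async generate(_req: GenerateRequest): Promise<GenerateResult> {
    return { image: new Uint8Array(0), costUsd: 0 };
  },
};

const registry = new Map<string, ProviderAdapter>([[fakeAdapter.name, fakeAdapter]]);

// Resolve the adapter named on the command line, failing loudly on typos.
function pickProvider(name: string): ProviderAdapter {
  const adapter = registry.get(name);
  if (!adapter) {
    const known = Array.from(registry.keys()).join(", ");
    throw new Error(`Unknown provider "${name}". Available: ${known}`);
  }
  return adapter;
}

console.log(pickProvider("fake").name);
```

Keeping the contract this small is what would let a new provider be added as a single file without touching the rest of the pipeline.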

Output structure

Everything lands in img-pilot/ at your project root (auto-added to .gitignore):

img-pilot/
├── config.toml                 # Provider API keys (chmod 600, gitignored)
├── brief.md                    # Image brief (auto from ux-pilot or discovery)
├── gallery.html                # Persistent audit log of every asset
├── logo/
│   ├── logo-icon.png           # 1024×1024 source
│   ├── logo-text.png
│   └── logo-combo.png
├── favicon/
│   ├── favicon.svg
│   ├── favicon-16.png · favicon-32.png · apple-touch-icon.png
│   ├── icon-192.png · icon-512.png
│   └── site.webmanifest
├── social/
│   ├── og-image.png            # 1200×630
│   ├── twitter-card.png        # 1200×628
│   └── discord-embed.png       # 1280×720
├── banner/
│   └── github-banner.png       # 1280×640
└── icons/
    ├── ios-180.png
    ├── android-192.png
    └── android-512.png
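
Deriving a wide asset such as the 1200×630 OG image from a square source involves a resize-and-crop step. The sketch below shows only the geometry of a centered "cover" fit; `coverFit` is a hypothetical helper, and whether the plugin crops, pads, or composes onto a background for each asset is not stated here. sharp's `resize` with `fit: "cover"` performs an equivalent calculation internally.

```typescript
interface CoverFit {
  scale: number;
  resizedWidth: number;
  resizedHeight: number;
  left: number; // crop offset from the left edge after resizing
  top: number; // crop offset from the top edge after resizing
}

// Scale the source until it covers the target box, then center-crop the overflow.
function coverFit(srcW: number, srcH: number, dstW: number, dstH: number): CoverFit {
  const scale = Math.max(dstW / srcW, dstH / srcH);
  const resizedWidth = Math.round(srcW * scale);
  const resizedHeight = Math.round(srcH * scale);
  return {
    scale,
    resizedWidth,
    resizedHeight,
    left: Math.floor((resizedWidth - dstW) / 2),
    top: Math.floor((resizedHeight - dstH) / 2),
  };
}

// 1024×1024 source → 1200×630 OG image and 1280×640 GitHub banner.
const ogFit = coverFit(1024, 1024, 1200, 630);
const bannerFit = coverFit(1024, 1024, 1280, 640);
console.log(ogFit, bannerFit);
```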

Cost & safety

Every call is confirmed

The CLI emits a --dry-run JSON plan before anything costs money. Claude Code presents the plan in chat — assets, provider, full prompt preview, total cost — and waits for your explicit approval. You approve, you pay. You cancel, you don't. There is no path in the code where an API call happens without this confirmation round-trip.

Hard session limit

Default max_api_calls_per_session = 5 (configurable). Even if something goes sideways in the skill layer, the CLI refuses to exceed the limit and throws SessionLimitExceededError.
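
A hard limit of this kind can be sketched as a counter that is checked before every provider call. `SessionLimitExceededError` and the default of 5 come from the description above; the `SessionBudget` class and its method names are assumptions.

```typescript
class SessionLimitExceededError extends Error {
  constructor(limit: number) {
    super(`API call limit of ${limit} per session reached`);
    this.name = "SessionLimitExceededError";
  }
}

// Hypothetical guard: reserve() must be called before each provider request.
class SessionBudget {
  private used = 0;
  private readonly maxCalls: number;

  constructor(maxCalls: number = 5) {
    this.maxCalls = maxCalls;
  }

  // Throws instead of letting the call through once the limit is reached.
  reserve(): void {
    if (this.used >= this.maxCalls) {
      throw new SessionLimitExceededError(this.maxCalls);
    }
    this.used += 1;
  }

  get remaining(): number {
    return this.maxCalls - this.used;
  }
}

const budget = new SessionBudget(2);
budget.reserve();
budget.reserve();
console.log(budget.remaining);
```

Because the check lives below the skill layer in this sketch, a misbehaving caller cannot spend past the limit by simply retrying.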

API keys protected at three layers

| Layer | What it does |
|---|---|
| Auto-gitignore | img-pilot/ is appended to .gitignore before any config write. If the gitignore check fails, the write is aborted. |
| chmod 600 | config.toml is set to owner read/write only immediately after write (POSIX; Windows uses icacls). |
| Pre-commit hook | A hook is installed in .git/hooks/pre-commit that scans staged files for sk-..., AIza..., key-..., api_key = "..." patterns and blocks commits containing them. Append-safe (preserves any existing hook). |

API keys are never printed in full anywhere — terminal output, logs, and error messages all mask them as sk-...xxxx (first 3 + last 4 chars).
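
The masking rule (first 3 + last 4 characters) and the pattern scan can be sketched as two small functions. The regular expressions are illustrative approximations of the patterns named above, not the hook's actual rules, and the handling of very short keys is an assumption.

```typescript
// Keep the first 3 and last 4 characters, hide everything in between.
function maskKey(key: string): string {
  if (key.length <= 7) return "***"; // assumption: too short to reveal any part safely
  return `${key.slice(0, 3)}...${key.slice(-4)}`;
}

// Approximate patterns for the key shapes listed above (assumed, not the shipped rules).
const SECRET_PATTERNS: { name: string; re: RegExp }[] = [
  { name: "sk- prefix", re: /\bsk-[A-Za-z0-9_-]{16,}/ },
  { name: "AIza prefix", re: /\bAIza[0-9A-Za-z_-]{35}/ },
  { name: "key- prefix", re: /\bkey-[A-Za-z0-9]{16,}/ },
  { name: "api_key assignment", re: /api_key\s*=\s*"[^"]+"/ },
];

// Return the names of every pattern that matches the given text.
function findSecrets(text: string): string[] {
  return SECRET_PATTERNS.filter((p) => p.re.test(text)).map((p) => p.name);
}

console.log(maskKey("sk-abcdefghijklmnop1234"));
console.log(findSecrets('api_key = "abc123"'));
console.log(findSecrets("const answer = 42;"));
```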

No network calls you didn't trigger

Zero telemetry. Zero feature flags. Zero remote rule fetching. If you run tcpdump during an img-pilot run, the only outbound traffic is the provider call you explicitly approved. Period.

Tech stack

| | |
|---|---|
| Language | TypeScript 5.8 |
| Runtime | Bun 1.1+ (Node.js fallback supported) |
| Image processing | sharp (libvips native binding); no ImageMagick required |
| HTTP | Native fetch; no SDK lock-in |
| Config | TOML via @iarna/toml |
| Tests | 120+ via Bun test runner; all provider calls mocked in CI |
| License | MIT |

Dependencies are intentionally minimal. sharp covers 100% of the image operations needed in v0.1 (resize, crop, compose, round corners via SVG mask, format convert, text overlay). No shell-spawning. No subprocess management.
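
To make "round corners via SVG mask" concrete, the sketch below builds the mask as a plain SVG string. `roundedMaskSvg` and its default radius ratio are assumptions for illustration. With sharp, a mask like this is typically applied through `composite` using the `dest-in` blend mode, which keeps only the pixels of the image that the mask covers.

```typescript
// Build a square SVG mask with rounded corners.
// radiusRatio is the corner radius as a fraction of the side length (assumed default).
function roundedMaskSvg(size: number, radiusRatio: number = 0.2): string {
  if (size <= 0) throw new RangeError("size must be positive");
  if (radiusRatio < 0 || radiusRatio > 0.5) {
    throw new RangeError("radiusRatio must be between 0 and 0.5");
  }
  const r = Math.round(size * radiusRatio);
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" width="${size}" height="${size}">` +
    `<rect width="${size}" height="${size}" rx="${r}" ry="${r}" fill="#fff"/>` +
    `</svg>`
  );
}

const iconMask = roundedMaskSvg(512, 0.25);
console.log(iconMask);
```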

Project structure

img-pilot/
├── .claude-plugin/{plugin,marketplace}.json
├── .claude/skills/img-pilot/SKILL.md
├── skills/img-pilot.md
├── skill.json
├── src/
│   ├── index.ts                # CLI dispatcher
│   ├── types.ts
│   ├── config/                 # Hybrid global+project TOML loader
│   ├── security/               # Gitignore, chmod, pre-commit hook, key masking
│   ├── brief/                  # ux-pilot + brand-pilot + own parsers, sanitize, reader
│   ├── discovery/              # 6-question fallback flow
│   ├── optimizer/              # Derivation map, existing-scan, plan-builder
│   ├── prompt/                 # Templates + anti-slop builder
│   ├── providers/              # 9 adapters (OpenAI, BFL, Google, Stability, Ideogram, Leonardo, Replicate, Recraft, fal)
│   ├── processor/              # sharp-based resize/crop/compose/round-corners/convert/add-text
│   ├── svg/                    # Zero-cost generators (letter favicon, geometric logo, text logo, webmanifest)
│   └── gallery/                # HTML gallery + --serve mode
├── templates/                  # config.toml, gallery.html, 5 prompt templates
├── hooks/pre-commit.sh
├── tests/                      # 120+ tests, all API calls mocked
└── .github/workflows/ci.yml

Roadmap

v0.2 and beyond:

  • Midjourney when they ship an official REST API
  • Self-hosted inference — local Stable Diffusion via Ollama/ComfyUI
  • Batch mode — N variants of the same prompt in one call
  • Fine-tuning / style reference upload (LoRA, IP-adapter)
  • CI wrapper — regenerate assets in GitHub Actions on brief change
  • Provider benchmarking — run the same prompt across 3 providers and compare side-by-side
  • Live pricing — fetch cost estimates from provider APIs at dry-run time

Troubleshooting

Plugin shows old version / features not working

Claude Code caches plugins aggressively. To force a clean reinstall:

  1. Quit Claude Code completely
  2. Delete the cache:
    rm -rf ~/.claude/plugins/cache/img-pilot
  3. Relaunch Claude Code
  4. Reinstall:
    /plugin marketplace add Sakaax/img-pilot
    /plugin marketplace update img-pilot
    /plugin install img-pilot@img-pilot
    /reload-plugins

Provider configuration issues

  • Google Imagen requires a Vertex AI service-account JSON (the full JSON blob, not a path) pasted into api_key. You also need the Imagen API enabled on your GCP project. See Google Cloud quickstart.
  • Midjourney is not supported in v0.1. Use any of the 9 supported providers in the meantime; FLUX 2 Pro via Black Forest Labs or Replicate gets you comparable quality.

"No config found" error on first run

Run /img-pilot config to set up a provider. Alternatively, create ~/.config/img-pilot/config.toml manually; the template ships with the plugin and is written on the first config invocation.

sharp install fails

sharp ships a native binary (~15 MB) that downloads on bun install. If it fails:

  • Make sure you're on Node 18+ / Bun 1.1+
  • Check your network (corporate proxy? VPN?)
  • Retry: bun install --force
  • Last resort: rm -rf node_modules bun.lock && bun install

Links

| | |
|---|---|
| Landing page | img-pilot.sakaax.com |
| GitHub | github.com/Sakaax/img-pilot |
| Pilot ecosystem | ux-pilot · brand-pilot |
| Twitter | @sakaaxx |

Credits

Built by @Sakaax. Part of the pilot ecosystem — each plugin stands alone, but they compose into a complete design pipeline from first sketch to production assets. If you run all three in order, you get a full product design done in your terminal.

License

MIT — free forever. No accounts, no subscriptions, no usage caps beyond what your chosen image-generation provider imposes.


Built by Sakaax — first Claude Code plugin to generate AI images from the terminal.
