
Image to image with gemini-2.0-flash-preview-image-generation #248


Open · wants to merge 46 commits into main

Conversation


@tpaulshippy commented Jun 14, 2025

What this does

Enable image-to-image generation with gemini-2.0-flash-preview-image-generation
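
For anyone skimming, here's a minimal usage sketch of what this enables, assuming the standard `chat.ask(..., with:)` attachment API from the guides (the file path and the exact round-trip behavior are illustrative):

```ruby
require "ruby_llm"

chat = RubyLLM.chat(model: "gemini-2.0-flash-preview-image-generation")

# First turn: a text prompt plus an input image (path illustrative)
chat.ask("put this in a ring", with: "ruby.png")

# Second turn: the generated image stays in the conversation,
# so the follow-up edit applies to it
chat.ask("change the background to blue")
```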

Type of change

  • New feature

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • New public methods/classes

Related issues

Screenshots

Here's what the test did.

Input
"put this in a ring" (attached image: ruby)

Output
[image]

Second input
"change the background to blue"

Second output
[image]

@tpaulshippy (Contributor, Author)

Thinking I need to move to Content with attachments so the image gets sent properly on the next call.

@crmne (Owner) left a comment


I know it's a draft and you mentioned it in a comment, but you shouldn't add an `images` attribute to the Message object, since we have the Content object for a reason.
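
For illustration, the suggested shape is roughly the following; `ImageAttachment` is the class this PR introduces, and the constructor arguments here are assumptions:

```ruby
# The generated image rides along on the message's Content, next to the
# text, rather than in a separate Message#images attribute.
image = RubyLLM::ImageAttachment.new("generated_ring.png")
content = RubyLLM::Content.new("Here's your ring", [image])
message = RubyLLM::Message.new(role: :assistant, content: content)
```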

@crmne added the enhancement label Jul 16, 2025
@tpaulshippy marked this pull request as ready for review July 20, 2025 04:57
@tpaulshippy (Contributor, Author)

I realize this is a very different approach from the RubyLLM.paint method, as it involves generating images within a chat. I do think it has some value, however, as it allows for multimodal conversations.

This has similar value to #152, but there is a bit of a clash: this introduces an ImageAttachment that is provider-agnostic (although only used in Gemini so far), while that PR has an ImageAttachments class that is OpenAI-specific.

I'm also not sure exactly how/where to document this in the guides.

@crmne Looking forward to your feedback/thoughts.

@tpaulshippy (Contributor, Author)

This document describes the two approaches pretty well, I think. I could see an implementation of Imagen in RubyLLM that looks more like the #152 approach.

It looks like OpenAI supports conversational image generation through the responses API and a built in tool called "image_generation" - see here.

@tpaulshippy (Contributor, Author) commented Jul 20, 2025

I like how OpenAI allows you to reference the previous images via IDs. We really need to get support for these built-in tools via the responses API into RubyLLM. We are already doing it in a fork to get web_search_preview (see diff here), but it's pretty messy.
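
For reference, a rough sketch of that request shape against the Responses API (raw HTTP via Faraday, not RubyLLM; field names follow the OpenAI docs linked above, and the model name and IDs are placeholders):

```ruby
require "faraday"
require "json"

conn = Faraday.new(url: "https://api.openai.com")

response = conn.post("/v1/responses") do |req|
  req.headers["Authorization"] = "Bearer #{ENV['OPENAI_API_KEY']}"
  req.headers["Content-Type"]  = "application/json"
  req.body = {
    model: "gpt-4.1-mini",
    # Built-in tool: the model decides when to generate an image
    tools: [{ type: "image_generation" }],
    # Chaining onto a previous response lets the model reference
    # its earlier images by ID
    previous_response_id: "resp_abc123",
    input: "change the background to blue"
  }.to_json
end
```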

@tpaulshippy requested a review from crmne July 29, 2025 01:32
crmne and others added 13 commits August 2, 2025 22:01
- Modified to_llm to accept optional context parameter
- Updated with_context to pass context to to_llm
- Added tests to verify custom contexts work without global configuration
- Users can now use custom contexts even when the global RubyLLM config is missing (usage sketched below)
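
A usage sketch of what the bullets above describe, assuming an `acts_as_chat` model named `Chat` (model ID illustrative):

```ruby
# Build an isolated context instead of relying on RubyLLM.configure
context = RubyLLM.context do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
end

# with_context now threads the context through to_llm under the hood
Chat.create!(model_id: "gpt-4o-mini").with_context(context).ask("Hello!")
```
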
…ON) (crmne#302)

## What this does

When migrating from
[ruby-openai](https://github.com/alexrudall/ruby-openai), I had some
issues getting the same responses in my Anthropic test suite.

After some digging, I observed that the Anthropic requests send the
`system` content as serialized JSON instead of a plain string, as
described in the [API
reference](https://docs.anthropic.com/en/api/messages#body-system):

```ruby
{
  :system => "{type:\n        \"text\", text: \"You must include the exact phrase \\\"XKCD7392\\\" somewhere\n        in your response.\"}",
  [...]
}
```

instead of:

```ruby
{
  :system => "You must include the exact phrase \"XKCD7392\" somewhere in your response.",
  [...]
}
```

It works quite well (the model still understands it) but it uses more
tokens than needed. It could also mislead the model in interpreting the
system prompt.

This PR fixes it. I also took the initiative to make the temperature an
optional parameter ([just like with
OpenAI](https://github.com/crmne/ruby_llm/blob/main/lib/ruby_llm/providers/openai/chat.rb#L21-L22)).
I hope it's not too much for a single PR, but since I was already
re-recording the cassettes, I figured it would be easier.
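
Conceptually, the fix presumably amounts to something like this in the Anthropic payload rendering (method and helper names are assumptions for illustration, not the actual diff):

```ruby
def render_payload(messages, model:, temperature: nil, stream: false)
  system_messages, chat_messages = messages.partition { |m| m.role == :system }

  payload = {
    model: model,
    messages: chat_messages.map { |m| format_message(m) }, # helper assumed
    # Send the system prompt as a plain string, not a serialized hash
    system: system_messages.map { |m| m.content.to_s }.join("\n\n"),
    stream: stream
  }
  # As with the OpenAI provider, only include temperature when explicitly set
  payload[:temperature] = temperature unless temperature.nil?
  payload
end
```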

I'm sorry but I don't have any API key for Bedrock/OpenRouter. I only
recorded the main Anthropic cassettes.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [ ] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->

---------

Co-authored-by: Carmine Paolino <[email protected]>
## What this does

<!-- Clear description of what this PR does and why -->
Give callers access to the Faraday response via a property on the Message
called `raw`.

## Type of change

- [x] New feature

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [x] New public methods/classes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->
Resolves crmne#301

---------

Co-authored-by: Mike Robbins <[email protected]>
## What this does

This PR adds a new callback hook to `Chat` that sends information when a
tool call is initiated by the model. This is useful when building a
coding agent to show the user progress of interactions inline with
streaming responses.
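
Usage presumably looks something like this (`Weather` is a hypothetical tool class, and the fields on the yielded tool call are assumptions):

```ruby
chat = RubyLLM.chat

chat.on_tool_call do |tool_call|
  # Surface progress inline with the streamed response
  puts "→ calling #{tool_call.name} with #{tool_call.arguments}"
end

chat.with_tool(Weather).ask("What's the weather in Berlin?") do |chunk|
  print chunk.content
end
```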

## Type of change

- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case
- this is beneficial to all users who want to show tool call indications
to the user

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [x] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

N/A

---------

Co-authored-by: Carmine Paolino <[email protected]>
…y V1 and V2 (crmne#273)

## What this does

When used within our app, streaming error responses were throwing an
error and not being handled properly:

```
worker      | D, [2025-07-03T18:49:52.221013 #81269] DEBUG -- RubyLLM: Received chunk: event: error
worker      | data: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}               }
worker      | 
worker      | 
worker      | 2025-07-03 18:49:52.233610 E [81269:sidekiq.default/processor chat_agent.rb:42] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Error in ChatAgent#send_with_streaming: NoMethodError - undefined method `merge' for nil:NilClass
worker      | 
worker      |       error_response = env.merge(body: JSON.parse(error_data), status: status)
worker      |                           ^^^^^^
worker      | 2025-07-03 18:49:52.233852 E [81269:sidekiq.default/processor chat_agent.rb:43] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Backtrace: /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:91:in `handle_error_chunk'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:62:in `process_stream_chunk'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:70:in `block in legacy_stream_processor'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:113:in `block in perform_request'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:535:in `call_block'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:526:in `<<'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb
```

It looks like the [introduction of support for Faraday V1](crmne#173)
introduced this error, as the error handling relies on an `env` that is
no longer passed. This should provide a fix for both V1 and V2.

One thing to note: I had to manually construct the VCR cassettes; I'm
not sure of a better way to test an intermittent error response.

I have also only written the tests against
`anthropic/claude-3-5-haiku-20241022`; it's possible other models with
a different error format may still not be handled properly, but even in
that case it won't error for the reasons fixed here.
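
Conceptually the guard looks something like this; a sketch only, with helper names assumed rather than taken from the actual patch:

```ruby
require "json"

def handle_error_chunk(error_data, env, status)
  parsed_body = JSON.parse(error_data)

  # Under Faraday V1 no env is passed into this path, so it can be nil;
  # the old code called env.merge unconditionally and blew up.
  error_response = if env
                     env.merge(body: parsed_body, status: status)
                   else
                     Struct.new(:body, :status).new(parsed_body, status)
                   end

  raise_error_for(error_response) # downstream error handling, name assumed
end
```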

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

---------

Co-authored-by: Carmine Paolino <[email protected]>
## Summary
- Added documentation for handling ActionCable message ordering issues
- Includes a Stimulus controller solution for client-side reordering
- Mentions async stack and AnyCable as alternatives

## Context
This PR addresses the message ordering issues discussed in crmne#282. The
documentation includes:

1. A Stimulus controller that reorders messages based on timestamps
2. Explanation of ActionCable's ordering limitations
3. Alternative approaches (async stack, AnyCable)

## Request for Review
@ioquatix @palkan - I'd appreciate your review on the technical accuracy
of this documentation, particularly:

- Is my description of ActionCable's ordering behavior accurate?
- Are the suggested solutions appropriate?
- Any other approaches you'd recommend documenting?

## Test Plan
- [x] Documentation builds correctly
- [x] Code examples are syntactically correct
- [ ] Technical accuracy verified by domain experts
Corrects ActionCable to Action Cable throughout the documentation to match Rails naming conventions.
- Add structured output with JSON schemas example
- Include async support and model registry features
- Expand document analysis to include CSV, JSON, XML, Markdown, and code files
- Add smart configuration and automatic retry features
- Show the proper RubyLLM::Schema subclassing pattern for structured output (sketched below)
- Ensure feature parity between README.md and docs/index.md
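
For reference, the subclassing pattern those docs describe looks roughly like this (model left at the default; field names illustrative):

```ruby
require "ruby_llm"
require "ruby_llm/schema"

class PersonSchema < RubyLLM::Schema
  string :name
  integer :age, description: "Age in years"
end

chat = RubyLLM.chat
response = chat.with_schema(PersonSchema).ask("Generate a random person")
response.content # => structured data matching the schema
```
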
crmne and others added 24 commits August 2, 2025 22:01
- Implement Perplexity as OpenAI-compatible provider
- Add 5 Perplexity models with pricing information
- Handle HTML error responses from Perplexity API
- Add test coverage with proper skips for unsupported features
- Update documentation and configuration
Promoted the available models documentation from the guides subfolder to
top-level navigation, after the ecosystem section, for better visibility.
Implements Mistral AI as an OpenAI-compatible provider with minimal customizations:

- Extends OpenAI provider for core functionality
- Custom Chat module to handle system role mapping (uses 'system' instead of 'developer'; see the sketch after this commit list)
- Custom render_payload to remove unsupported stream_options parameter
- Custom Embeddings module that ignores dimensions parameter (not supported by Mistral)
- Implements capabilities detection for vision (pixtral), embeddings, and chat models
- Adds ministral-3b-latest for chat tests (cheapest option)
- Adds pixtral-12b-latest for vision tests
- Adds mistral-embed for embedding tests
- Fetches and includes 63 Mistral models in models.json
- Adds appropriate test skips for known model limitations
- Fix mistral models capabilities format (Hash -> Array)
- Fix imagen models output modality (text -> image)
- Add models_schema.json for validation

Fixes crmne#315
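
A minimal sketch of the kind of override the Mistral commit describes; the module layout and method names are assumed from the description, not copied from the diff:

```ruby
module RubyLLM
  module Providers
    module Mistral
      module Chat
        # OpenAI's provider maps the system role to 'developer';
        # Mistral expects plain 'system', so keep the role as-is.
        def format_role(role)
          role.to_s
        end

        # Mistral rejects stream_options, so strip it from the payload.
        def render_payload(...)
          super.tap { |payload| payload.delete(:stream_options) }
        end
      end
    end
  end
end
```
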
…#318)

## What this does

An `on_tool_call` callback was added in `1.4.0` via
crmne#299, but it doesn't work with a
model using the Rails integration via `acts_as_chat`.

This PR wires up the missing method so it works with the integration.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [ ] ~I ran `overcommit --install` and all hooks pass~
- When I tried to commit, the hooks generated a bunch of changes to
`models.json` and `aliases.json` and broke a bunch of the specs, so I
removed the hooks and ran the specs and RuboCop manually
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
  - No need
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->