
Image to image with gemini-2.0-flash-preview-image-generation #248


Open · wants to merge 46 commits into main

Conversation


@tpaulshippy commented Jun 14, 2025

What this does

Enable image-to-image generation with gemini-2.0-flash-preview-image-generation
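
For anyone skimming, here's a minimal usage sketch of what this enables, assuming the standard `chat.ask(..., with:)` attachment API from the guides (the file path and the exact round-trip behavior are illustrative):

```ruby
require "ruby_llm"

chat = RubyLLM.chat(model: "gemini-2.0-flash-preview-image-generation")

# First turn: a text prompt plus an input image (path illustrative)
chat.ask("put this in a ring", with: "ruby.png")

# Second turn: the generated image stays in the conversation,
# so the follow-up edit applies to it
chat.ask("change the background to blue")
```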

Type of change

  • New feature

Scope check

  • I read the Contributing Guide
  • This aligns with RubyLLM's focus on LLM communication
  • This isn't application-specific logic that belongs in user code
  • This benefits most users, not just my specific use case

Quality check

  • I ran overcommit --install and all hooks pass
  • I tested my changes thoroughly
  • I updated documentation if needed
  • I didn't modify auto-generated files manually (models.json, aliases.json)

API changes

  • New public methods/classes

Related issues

Screenshots

Here's what the test did.

Input
"put this in a ring" (attached image: ruby)

Output
[image]

Second input
"change the background to blue"

Second output
[image]

@tpaulshippy (Contributor, Author)

Thinking I need to move to Content with attachments so the image gets sent properly on the next call.

@crmne (Owner) left a comment


I know it's a draft and you mentioned it in a comment, but you shouldn't add an `images` attribute to the Message object, since we have the Content object for a reason.
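
For illustration, the suggested shape is roughly the following; `ImageAttachment` is the class this PR introduces, and the constructor arguments here are assumptions:

```ruby
# The generated image rides along on the message's Content, next to the
# text, rather than in a separate Message#images attribute.
image = RubyLLM::ImageAttachment.new("generated_ring.png")
content = RubyLLM::Content.new("Here's your ring", [image])
message = RubyLLM::Message.new(role: :assistant, content: content)
```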

@crmne added the enhancement label Jul 16, 2025
@tpaulshippy marked this pull request as ready for review July 20, 2025 04:57
@tpaulshippy (Contributor, Author)

I realize this is a very different approach from the RubyLLM.paint method, as it involves generating images within a chat. I do think it has some value, however, as it allows for multimodal conversations.

This has similar value to #152, but there is a bit of a clash: this introduces an ImageAttachment that is provider-agnostic (although only used in Gemini so far), while that PR has an ImageAttachments class that is OpenAI-specific.

I'm also not sure exactly how/where to document this in the guides.

@crmne Looking forward to your feedback/thoughts.

@tpaulshippy (Contributor, Author)

This document describes the two approaches pretty well, I think. I could see an implementation of Imagen in RubyLLM that looks more like the #152 approach.

It looks like OpenAI supports conversational image generation through the responses API and a built in tool called "image_generation" - see here.

@tpaulshippy (Contributor, Author) commented Jul 20, 2025

I like how OpenAI allows you to reference the previous images via IDs. We really need to get support for these built-in tools via the responses API into RubyLLM. We are already doing it in a fork to get web_search_preview (see diff here), but it's pretty messy.
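
For reference, a rough sketch of that request shape against the Responses API (raw HTTP via Faraday, not RubyLLM; field names follow the OpenAI docs linked above, and the model name and IDs are placeholders):

```ruby
require "faraday"
require "json"

conn = Faraday.new(url: "https://api.openai.com")

response = conn.post("/v1/responses") do |req|
  req.headers["Authorization"] = "Bearer #{ENV['OPENAI_API_KEY']}"
  req.headers["Content-Type"]  = "application/json"
  req.body = {
    model: "gpt-4.1-mini",
    # Built-in tool: the model decides when to generate an image
    tools: [{ type: "image_generation" }],
    # Chaining onto a previous response lets the model reference
    # its earlier images by ID
    previous_response_id: "resp_abc123",
    input: "change the background to blue"
  }.to_json
end
```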

@tpaulshippy requested a review from crmne July 29, 2025 01:32
crmne and others added 13 commits August 2, 2025 22:01
- Modified to_llm to accept optional context parameter
- Updated with_context to pass context to to_llm
- Added tests to verify custom contexts work without global configuration
- Users can now use custom contexts even when the global RubyLLM config is missing (usage sketched below)
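
A usage sketch of what the bullets above describe, assuming an `acts_as_chat` model named `Chat` (model ID illustrative):

```ruby
# Build an isolated context instead of relying on RubyLLM.configure
context = RubyLLM.context do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
end

# with_context now threads the context through to_llm under the hood
Chat.create!(model_id: "gpt-4o-mini").with_context(context).ask("Hello!")
```
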
…ON) (crmne#302)

## What this does

When migrating from
[ruby-openai](https://github.com/alexrudall/ruby-openai), I had some
issues getting the same responses in my Anthropic test suite.

After some digging, I observed that the Anthropic requests send the
`system` content as serialized JSON instead of a plain string, as
described in the [API
reference](https://docs.anthropic.com/en/api/messages#body-system):

```ruby
{
  :system => "{type:\n        \"text\", text: \"You must include the exact phrase \\\"XKCD7392\\\" somewhere\n        in your response.\"}",
  [...]
}
```

instead of:

```ruby
{
  :system => "You must include the exact phrase \"XKCD7392\" somewhere in your response.",
  [...]
}
```

It works quite well (the model still understands it) but it uses more
tokens than needed. It could also mislead the model in interpreting the
system prompt.

This PR fixes it. I also took the initiative to make the temperature an
optional parameter ([just like with
OpenAI](https://github.com/crmne/ruby_llm/blob/main/lib/ruby_llm/providers/openai/chat.rb#L21-L22)).
I hope it's not too much for a single PR, but since I was already
re-recording the cassettes, I figured it would be easier.
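
Conceptually, the fix presumably amounts to something like this in the Anthropic payload rendering (method and helper names are assumptions for illustration, not the actual diff):

```ruby
def render_payload(messages, model:, temperature: nil, stream: false)
  system_messages, chat_messages = messages.partition { |m| m.role == :system }

  payload = {
    model: model,
    messages: chat_messages.map { |m| format_message(m) }, # helper assumed
    # Send the system prompt as a plain string, not a serialized hash
    system: system_messages.map { |m| m.content.to_s }.join("\n\n"),
    stream: stream
  }
  # As with the OpenAI provider, only include temperature when explicitly set
  payload[:temperature] = temperature unless temperature.nil?
  payload
end
```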

I'm sorry but I don't have any API key for Bedrock/OpenRouter. I only
recorded the main Anthropic cassettes.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [ ] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->

---------

Co-authored-by: Carmine Paolino <[email protected]>
## What this does

<!-- Clear description of what this PR does and why -->
Give callers access to the Faraday response via a property on the Message
called `raw`.

## Type of change

- [x] New feature

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [x] New public methods/classes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->
Resolves crmne#301

---------

Co-authored-by: Mike Robbins <[email protected]>
## What this does

This PR adds a new callback hook to `Chat` that sends information when a
tool call is initiated by the model. This is useful when building a
coding agent to show the user progress of interactions inline with
streaming responses.
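
Usage presumably looks something like this (`Weather` is a hypothetical tool class, and the fields on the yielded tool call are assumptions):

```ruby
chat = RubyLLM.chat

chat.on_tool_call do |tool_call|
  # Surface progress inline with the streamed response
  puts "→ calling #{tool_call.name} with #{tool_call.arguments}"
end

chat.with_tool(Weather).ask("What's the weather in Berlin?") do |chunk|
  print chunk.content
end
```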

## Type of change

- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case
- this is beneficial to all users who want to show tool call indications
to the user

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [x] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

N/A

---------

Co-authored-by: Carmine Paolino <[email protected]>
…y V1 and V2 (crmne#273)

## What this does

When used within our app, streaming error responses were throwing an
error and not being handled properly:

```
worker      | D, [2025-07-03T18:49:52.221013 #81269] DEBUG -- RubyLLM: Received chunk: event: error
worker      | data: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}               }
worker      | 
worker      | 
worker      | 2025-07-03 18:49:52.233610 E [81269:sidekiq.default/processor chat_agent.rb:42] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Error in ChatAgent#send_with_streaming: NoMethodError - undefined method `merge' for nil:NilClass
worker      | 
worker      |       error_response = env.merge(body: JSON.parse(error_data), status: status)
worker      |                           ^^^^^^
worker      | 2025-07-03 18:49:52.233852 E [81269:sidekiq.default/processor chat_agent.rb:43] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Backtrace: /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:91:in `handle_error_chunk'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:62:in `process_stream_chunk'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:70:in `block in legacy_stream_processor'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:113:in `block in perform_request'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:535:in `call_block'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:526:in `<<'
worker      | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb
```

It looks like the [introduction of support for Faraday V1](crmne#173)
introduced this error, as the error handling relies on an `env` that is
no longer passed. This should provide a fix for both V1 and V2.

One thing to note: I had to manually construct the VCR cassettes; I'm
not sure of a better way to test an intermittent error response.

I have also only written the tests against
`anthropic/claude-3-5-haiku-20241022`; it's possible other models with
a different error format may still not be handled properly, but even in
that case it won't error for the reasons fixed here.
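
Conceptually the guard looks something like this; a sketch only, with helper names assumed rather than taken from the actual patch:

```ruby
require "json"

def handle_error_chunk(error_data, env, status)
  parsed_body = JSON.parse(error_data)

  # Under Faraday V1 no env is passed into this path, so it can be nil;
  # the old code called env.merge unconditionally and blew up.
  error_response = if env
                     env.merge(body: parsed_body, status: status)
                   else
                     Struct.new(:body, :status).new(parsed_body, status)
                   end

  raise_error_for(error_response) # downstream error handling, name assumed
end
```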

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

---------

Co-authored-by: Carmine Paolino <[email protected]>
## Summary
- Added documentation for handling ActionCable message ordering issues
- Includes a Stimulus controller solution for client-side reordering
- Mentions async stack and AnyCable as alternatives

## Context
This PR addresses the message ordering issues discussed in crmne#282. The
documentation includes:

1. A Stimulus controller that reorders messages based on timestamps
2. Explanation of ActionCable's ordering limitations
3. Alternative approaches (async stack, AnyCable)

## Request for Review
@ioquatix @palkan - I'd appreciate your review on the technical accuracy
of this documentation, particularly:

- Is my description of ActionCable's ordering behavior accurate?
- Are the suggested solutions appropriate?
- Any other approaches you'd recommend documenting?

## Test Plan
- [x] Documentation builds correctly
- [x] Code examples are syntactically correct
- [ ] Technical accuracy verified by domain experts
Corrects ActionCable to Action Cable throughout the documentation to match Rails naming conventions.
- Add structured output with JSON schemas example
- Include async support and model registry features
- Expand document analysis to include CSV, JSON, XML, Markdown, and code files
- Add smart configuration and automatic retry features
- Show the proper RubyLLM::Schema subclassing pattern for structured output (sketched below)
- Ensure feature parity between README.md and docs/index.md
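
For reference, the subclassing pattern those docs describe looks roughly like this (model left at the default; field names illustrative):

```ruby
require "ruby_llm"
require "ruby_llm/schema"

class PersonSchema < RubyLLM::Schema
  string :name
  integer :age, description: "Age in years"
end

chat = RubyLLM.chat
response = chat.with_schema(PersonSchema).ask("Generate a random person")
response.content # => structured data matching the schema
```
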
crmne and others added 24 commits August 2, 2025 22:01
- Implement Perplexity as OpenAI-compatible provider
- Add 5 Perplexity models with pricing information
- Handle HTML error responses from Perplexity API
- Add test coverage with proper skips for unsupported features
- Update documentation and configuration
Promoted the available models documentation from the guides subfolder to
top-level navigation, after the ecosystem section, for better visibility.
Implements Mistral AI as an OpenAI-compatible provider with minimal customizations:

- Extends OpenAI provider for core functionality
- Custom Chat module to handle system role mapping (uses 'system' instead of 'developer'; see the sketch after this commit list)
- Custom render_payload to remove unsupported stream_options parameter
- Custom Embeddings module that ignores dimensions parameter (not supported by Mistral)
- Implements capabilities detection for vision (pixtral), embeddings, and chat models
- Adds ministral-3b-latest for chat tests (cheapest option)
- Adds pixtral-12b-latest for vision tests
- Adds mistral-embed for embedding tests
- Fetches and includes 63 Mistral models in models.json
- Adds appropriate test skips for known model limitations
- Fix mistral models capabilities format (Hash -> Array)
- Fix imagen models output modality (text -> image)
- Add models_schema.json for validation

Fixes crmne#315
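
A minimal sketch of the kind of override the Mistral commit describes; the module layout and method names are assumed from the description, not copied from the diff:

```ruby
module RubyLLM
  module Providers
    module Mistral
      module Chat
        # OpenAI's provider maps the system role to 'developer';
        # Mistral expects plain 'system', so keep the role as-is.
        def format_role(role)
          role.to_s
        end

        # Mistral rejects stream_options, so strip it from the payload.
        def render_payload(...)
          super.tap { |payload| payload.delete(:stream_options) }
        end
      end
    end
  end
end
```
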
…#318)

## What this does

An `on_tool_call` callback was added in `1.4.0` via
crmne#299, but it doesn't work with a
model using the Rails integration via `acts_as_chat`.

This PR wires up the missing method so it works with the integration.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing
Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [ ] ~I ran `overcommit --install` and all hooks pass~
- When I tried to commit, the hooks generated a bunch of changes to
`models.json` and `aliases.json` and broke a bunch of the specs, so I
removed the hooks and ran the specs and RuboCop manually
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
  - No need
- [x] I didn't modify auto-generated files manually (`models.json`,
`aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->