Image to image with gemini-2.0-flash-preview-image-generation #248
base: main
Conversation
I need to move to content with attachments so the image gets sent properly on the next call.
I know it's a draft and you mentioned it in a comment, but you shouldn't add an `images` attribute to the `Message` object, since we have the `Content` object for a reason.
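For illustration, a rough sketch of what routing the generated image through `Content` (instead of a new `Message#images` attribute) might look like. This is not this PR's actual code; the constructor arguments and the base64 payload are assumptions.

```ruby
require "base64"
require "stringio"

# Hypothetical sketch: attach the generated image to the assistant message
# via Content/attachments rather than a new Message#images attribute.
base64_png = "iVBORw0KGgo..." # placeholder: base64 image data from Gemini's inlineData
png_bytes = Base64.decode64(base64_png)

# Constructor arguments below are assumptions about the Attachment/Content API.
attachment = RubyLLM::Attachment.new(StringIO.new(png_bytes), filename: "generated.png")
content = RubyLLM::Content.new("Here's the edited image", [attachment])

RubyLLM::Message.new(role: :assistant, content: content)
```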
I realize this is a very different approach than #152. This has similar value as #152, but there is a bit of a clash in what this introduces. I also am not sure exactly how/where to document this in the guides. @crmne Looking forward to your feedback/thoughts.
This document describes the two approaches pretty well, I think. I could see an implementation of Imagen in RubyLLM that looks more like the #152 approach. It looks like OpenAI supports conversational image generation through the responses API and a built-in tool called "image_generation" - see here.
I like how OpenAI allows you to reference the previous images via IDs. We really need to get support for these built-in tools via the responses API into RubyLLM. We are already doing it in a fork to get `web_search_preview` (see diff here), but it's pretty messy.
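For reference, a minimal raw-HTTP sketch of the Responses API shape being discussed, using Faraday directly rather than RubyLLM's abstractions. The request/response structure follows OpenAI's published docs; treat the model name and field access as illustrative.

```ruby
require "faraday"

# Raw call to OpenAI's Responses API with the built-in image_generation tool.
conn = Faraday.new(url: "https://api.openai.com") do |f|
  f.request :json   # encode the Hash body as JSON
  f.response :json  # parse the JSON response into a Hash
end

resp = conn.post("/v1/responses") do |req|
  req.headers["Authorization"] = "Bearer #{ENV['OPENAI_API_KEY']}"
  req.body = {
    model: "gpt-4.1-mini",
    input: "Put this product photo in a ring",
    tools: [{ type: "image_generation" }]
  }
end

# Generated images come back base64-encoded in image_generation_call output items.
image_b64 = resp.body["output"]
                .find { |item| item["type"] == "image_generation_call" }
                &.fetch("result")
```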
- Modified `to_llm` to accept optional context parameter
- Updated `with_context` to pass context to `to_llm`
- Added tests to verify custom contexts work without global configuration
- Users can now use custom contexts even when global RubyLLM config is missing
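A usage sketch of what this enables, per the commit notes above. The `Chat` Active Record model and the chainable return value of `with_context` are assumptions.

```ruby
# A context carries its own configuration; no global RubyLLM.configure required.
context = RubyLLM.context do |config|
  config.openai_api_key = ENV["OPENAI_API_KEY"]
end

context.chat(model: "gpt-4o-mini").ask("Hello")

# With the Rails integration, with_context forwards the context to to_llm.
chat_record = Chat.create!(model_id: "gpt-4o-mini") # assumes an acts_as_chat model
chat_record.with_context(context).ask("Hello again")
```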
…ON) (crmne#302)

## What this does

When migrating from [ruby-openai](https://github.com/alexrudall/ruby-openai), I had some issues getting the same responses in my Anthropic test suite. After some digging, I observed that the Anthropic requests send the `system context` as serialized JSON instead of a plain string as described in the [API reference](https://docs.anthropic.com/en/api/messages#body-system):

```ruby
{
  :system => "{type:\n \"text\", text: \"You must include the exact phrase \\\"XKCD7392\\\" somewhere\n in your response.\"}",
  [...]
}
```

instead of:

```ruby
{
  :system => "You must include the exact phrase \"XKCD7392\" somewhere in your response.",
  [...]
}
```

It works quite well (the model still understands it), but it uses more tokens than needed. It could also mislead the model in interpreting the system prompt. This PR fixes it.

I also took the initiative to make the temperature an optional parameter ([just like with OpenAI](https://github.com/crmne/ruby_llm/blob/main/lib/ruby_llm/providers/openai/chat.rb#L21-L22)). I hope it's not too much for a single PR, but since I was already re-recording the cassettes, I figured it would be easier. I'm sorry, but I don't have any API key for Bedrock/OpenRouter; I only recorded the main Anthropic cassettes.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [ ] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->

---------

Co-authored-by: Carmine Paolino <[email protected]>
## What this does

<!-- Clear description of what this PR does and why -->

Give callers access to the Faraday response on a property of the Message called "raw".

## Type of change

- [x] New feature

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [x] New public methods/classes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->

Resolves crmne#301

---------

Co-authored-by: Mike Robbins <[email protected]>
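A quick sketch of how this would be used, assuming `raw` returns the `Faraday::Response` as described:

```ruby
chat = RubyLLM.chat(model: "gpt-4o-mini")
message = chat.ask("Ping")

# The raw Faraday::Response gives access to transport-level details.
message.raw.status                   # HTTP status, e.g. 200
message.raw.headers["x-request-id"]  # provider request id, handy for support tickets
message.raw.body                     # the unwrapped provider payload
```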
## What this does

This PR adds a new callback hook to `Chat` that sends information when a tool call is initiated by the model. This is useful when building a coding agent to show the user progress of interactions inline with streaming responses.

## Type of change

- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case - this is beneficial to all users who want to show tool call indications to the user

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [x] New public methods/classes
- [ ] Changed method signatures
- [ ] No API changes

## Related issues

N/A

---------

Co-authored-by: Carmine Paolino <[email protected]>
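For example, something along these lines. The `Weather` tool is a made-up stand-in, and chaining assumes the callback returns `self` like the other `on_*` hooks:

```ruby
# A stand-in tool so the model has something to call.
class Weather < RubyLLM::Tool
  description "Gets current weather for a city"
  param :city, desc: "City name"

  def execute(city:)
    "15°C and cloudy in #{city}"
  end
end

chat = RubyLLM.chat(model: "gpt-4o-mini").with_tool(Weather)

# Surface tool invocations inline with the streamed response.
chat.on_tool_call { |tool_call| puts "\n[tool] #{tool_call.name}(#{tool_call.arguments})" }

chat.ask("What's the weather in Berlin?") do |chunk|
  print chunk.content
end
```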
…y V1 and V2 (crmne#273)

## What this does

When used within our app, streaming error responses were throwing an error and not being properly handled:

```
worker | D, [2025-07-03T18:49:52.221013 #81269] DEBUG -- RubyLLM: Received chunk: event: error
worker | data: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"} }
worker |
worker |
worker | 2025-07-03 18:49:52.233610 E [81269:sidekiq.default/processor chat_agent.rb:42] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Error in ChatAgent#send_with_streaming: NoMethodError - undefined method `merge' for nil:NilClass
worker |
worker | error_response = env.merge(body: JSON.parse(error_data), status: status)
worker |                  ^^^^^^
worker | 2025-07-03 18:49:52.233852 E [81269:sidekiq.default/processor chat_agent.rb:43] {jid: 7382519287f08cfa7cd1e4e4, queue: default} Rails -- Backtrace: /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:91:in `handle_error_chunk'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:62:in `process_stream_chunk'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/ruby_llm-1.3.1/lib/ruby_llm/streaming.rb:70:in `block in legacy_stream_processor'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/faraday-net_http-1.0.1/lib/faraday/adapter/net_http.rb:113:in `block in perform_request'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:535:in `call_block'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb:526:in `<<'
worker | /Users/dansingerman/.rbenv/versions/3.1.6/lib/ruby/gems/3.1.0/gems/net-protocol-0.2.2/lib/net/protocol.rb
```

It looks like the [introduction of support for Faraday V1](crmne#173) introduced this error, as the error handling relies on an `env` that is no longer passed. This should provide a fix for both V1 and V2.

One thing to note: I had to manually construct the VCR cassettes; I'm not sure of a better way to test an intermittent error response. I have also only written the tests against `anthropic/claude-3-5-haiku-20241022` - it's possible other models with a different error format may still not be properly handled, but even in that case it won't error for the reasons fixed here.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [x] I ran `overcommit --install` and all hooks pass
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

---------

Co-authored-by: Carmine Paolino <[email protected]>
## Summary

- Added documentation for handling ActionCable message ordering issues
- Includes a Stimulus controller solution for client-side reordering
- Mentions async stack and AnyCable as alternatives

## Context

This PR addresses the message ordering issues discussed in crmne#282. The documentation includes:

1. A Stimulus controller that reorders messages based on timestamps
2. Explanation of ActionCable's ordering limitations
3. Alternative approaches (async stack, AnyCable)

## Request for Review

@ioquatix @palkan - I'd appreciate your review on the technical accuracy of this documentation, particularly:

- Is my description of ActionCable's ordering behavior accurate?
- Are the suggested solutions appropriate?
- Any other approaches you'd recommend documenting?

## Test Plan

- [x] Documentation builds correctly
- [x] Code examples are syntactically correct
- [ ] Technical accuracy verified by domain experts
Corrects ActionCable to Action Cable throughout the documentation to match Rails naming conventions.
- Add structured output with JSON schemas example
- Include async support and model registry features
- Expand document analysis to include CSV, JSON, XML, Markdown, and code files
- Add smart configuration and automatic retry features
- Show proper RubyLLM::Schema subclassing pattern for structured output
- Ensure feature parity between README.md and docs/index.md
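The `RubyLLM::Schema` subclassing pattern referred to above looks roughly like this. A sketch assuming the `ruby_llm-schema` gem and `with_schema` support; the schema fields are illustrative.

```ruby
require "ruby_llm/schema"

# Declare the expected JSON shape as a schema subclass.
class PersonSchema < RubyLLM::Schema
  string :name
  integer :age, description: "Age in years"
end

chat = RubyLLM.chat
response = chat.with_schema(PersonSchema).ask("Generate a fictional person")
response.content # structured data matching the schema, e.g. {"name" => "Ada", "age" => 36}
```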
- Implement Perplexity as OpenAI-compatible provider
- Add 5 Perplexity models with pricing information
- Handle HTML error responses from Perplexity API
- Add test coverage with proper skips for unsupported features
- Update documentation and configuration
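Configuration presumably follows the existing per-provider pattern; a sketch in which the `perplexity_api_key` setting and the model name are assumptions:

```ruby
RubyLLM.configure do |config|
  config.perplexity_api_key = ENV["PERPLEXITY_API_KEY"]
end

# Perplexity is OpenAI-compatible under the hood, so chat works as usual.
RubyLLM.chat(model: "sonar-pro", provider: :perplexity).ask("What's new in Ruby?")
```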
Promoted the available models documentation from guides subfolder to top-level navigation after ecosystem section for better visibility.
Implements Mistral AI as an OpenAI-compatible provider with minimal customizations:

- Extends OpenAI provider for core functionality
- Custom Chat module to handle system role mapping (uses 'system' instead of 'developer')
- Custom render_payload to remove unsupported stream_options parameter
- Custom Embeddings module that ignores dimensions parameter (not supported by Mistral)
- Implements capabilities detection for vision (pixtral), embeddings, and chat models
- Adds ministral-3b-latest for chat tests (cheapest option)
- Adds pixtral-12b-latest for vision tests
- Adds mistral-embed for embedding tests
- Fetches and includes 63 Mistral models in models.json
- Adds appropriate test skips for known model limitations
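Based on the models listed above, usage would look something like this. A sketch: the model ids come from the commit, while the attachment path is a placeholder.

```ruby
# Chat with the cheapest Mistral model mentioned above.
RubyLLM.chat(model: "ministral-3b-latest").ask("Summarize RubyLLM in one sentence")

# Embeddings via mistral-embed (the dimensions parameter is ignored by Mistral).
RubyLLM.embed("fast, unified Ruby LLM client", model: "mistral-embed")

# Vision via pixtral; `with:` attaches a local image to the prompt.
RubyLLM.chat(model: "pixtral-12b-latest").ask("Describe this image", with: "photo.jpg")
```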
…ersations and custom dimensions
- Fix mistral models capabilities format (Hash -> Array)
- Fix imagen models output modality (text -> image)
- Add models_schema.json for validation

Fixes crmne#315
…#318)

## What this does

An `on_tool_call` callback was added in `1.4.0` via crmne#299, but it doesn't work with a model using the Rails integration via `acts_as_chat`. This PR wires up the missing method so it works with the integration.

## Type of change

- [x] Bug fix
- [ ] New feature
- [ ] Breaking change
- [ ] Documentation
- [ ] Performance improvement

## Scope check

- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)
- [x] This aligns with RubyLLM's focus on **LLM communication**
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

## Quality check

- [ ] ~I ran `overcommit --install` and all hooks pass~ - When I tried to commit, the hooks generated a bunch of changes to `models.json` and `aliases.json` and broke a bunch of the specs, so I removed the hooks and ran specs and rubocop manually
- [x] I tested my changes thoroughly
- [x] I updated documentation if needed - No need
- [x] I didn't modify auto-generated files manually (`models.json`, `aliases.json`)

## API changes

- [ ] Breaking change
- [ ] New public methods/classes
- [ ] Changed method signatures
- [x] No API changes

## Related issues

<!-- Link issues: "Fixes crmne#123" or "Related to crmne#123" -->
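In practice, the wired-up integration would be used like this. A sketch assuming a standard `acts_as_chat` model named `Chat`:

```ruby
class Chat < ApplicationRecord
  acts_as_chat
end

chat = Chat.create!(model_id: "gpt-4o-mini")

# The callback only fires when the model actually invokes a tool.
chat.on_tool_call { |tool_call| Rails.logger.info("Tool call: #{tool_call.name}") }
chat.ask("What's the weather in Berlin?")
```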
What this does
Enable image-to-image generation with gemini-2.0-flash-preview-image-generation
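Based on the screenshots below, the flow under test looks roughly like this. A hypothetical sketch of the draft API: the model id comes from the PR title, `with:` is RubyLLM's existing attachment option, and the file name is a placeholder.

```ruby
chat = RubyLLM.chat(model: "gemini-2.0-flash-preview-image-generation")

# First turn: edit a provided image.
first = chat.ask("put this in a ring", with: "input.png")

# Second turn: the generated image is carried in the conversation,
# so a follow-up edit can reference it without re-attaching anything.
second = chat.ask("change the background to blue")
```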
Type of change
Scope check
Quality check
- I ran `overcommit --install` and all hooks pass
- I didn't modify auto-generated files manually (`models.json`, `aliases.json`)
API changes
Related issues
Screenshots
Here's what the test did.
Input

put this in a ring
Output

Second input
'change the background to blue'
Second output
