Image to image with gemini-2.0-flash-preview-image-generation #248

Open · wants to merge 46 commits into base: main

Commits (46)

f3136e3
Support modalities for gemini-2.0-flash-preview-image-generation
tpaulshippy Jun 14, 2025
6e3128c
Extract images from chat response
tpaulshippy Jun 14, 2025
28cf942
Rubocop
tpaulshippy Jun 14, 2025
357ea8a
Set modalities from capabilities
tpaulshippy Jun 14, 2025
0fed244
Merge branch 'main' into image-to-image
tpaulshippy Jul 20, 2025
9a1eeb8
Attach output image to message content
tpaulshippy Jul 20, 2025
1f60caa
Update comment
tpaulshippy Jul 20, 2025
98097d4
Refine image in conversation
tpaulshippy Jul 20, 2025
1b58d43
Merge branch 'main' into image-to-image
tpaulshippy Jul 24, 2025
d14d1e9
Remove duplicate SEO tags from docs
crmne Jul 28, 2025
460108c
Updated models
crmne Jul 28, 2025
730a8c8
fix: add missing blank lines for improved readability in generator an…
crmne Jul 28, 2025
1c50176
fix: Rails integration with_context now works without global config
crmne Jul 30, 2025
2afc6b2
Anthropic: Fix system prompt (use plain text instead of serialized JS…
MichaelHoste Jul 30, 2025
af0ead4
Provide access to raw response object from Faraday (#304)
tpaulshippy Jul 30, 2025
98dabdb
Add Chat#on_tool_call callback (#299)
bryan-ash Jul 30, 2025
cbb4276
Added proper handling of streaming error responses across both Farada…
dansingerman Jul 30, 2025
8626a77
Add message ordering guidance to Rails docs (#288)
crmne Jul 30, 2025
b6095a5
Bump version to 1.4.0 and update VCR cassettes
crmne Jul 30, 2025
8b0809b
Update model pricing and capabilities in JSON configuration
crmne Jul 30, 2025
598b584
Fix Action Cable capitalization in Rails guide
crmne Jul 31, 2025
20ae7a5
Update README and docs with comprehensive feature list
crmne Jul 31, 2025
7595a4e
Update Rails guide with instant message display pattern
crmne Jul 31, 2025
8f0ba07
Add Perplexity provider support
crmne Jul 31, 2025
7842a0b
Move available models guide to top-level navigation
crmne Jul 31, 2025
fe9d9d9
Fix broken links to available-models guide after relocation
crmne Jul 31, 2025
b6f9c13
Add Mistral AI provider support
crmne Jul 31, 2025
81f0a8c
Update specs to disable additional RuboCop checks for multi-turn conv…
crmne Jul 31, 2025
c5e059a
docs: add mistral provider
crmne Jul 31, 2025
9ae1018
reorder providers alphabetically
crmne Jul 31, 2025
2336483
Bust cache of gem version badge in README
crmne Jul 31, 2025
bf8c096
Fix Rails generator migration order and PostgreSQL detection
crmne Jul 31, 2025
2ff42aa
Removed unnecessary rubocop disable comments after last commit
crmne Jul 31, 2025
5a19a41
Fix Mistral models created_at timestamps
crmne Jul 31, 2025
a405e9e
Version bump to 1.5.0
crmne Jul 31, 2025
07444e5
Fix model capabilities format and imagen output modality
crmne Aug 1, 2025
a6dcd40
Automatically generate appraisal gemfiles
crmne Aug 1, 2025
b3b4684
Update JRuby version in CI matrix to jruby-10.0.1.0
crmne Aug 1, 2025
98f0cd1
Bump version to 1.5.1
crmne Aug 1, 2025
db1d563
Bust cache for gem badge in README
crmne Aug 1, 2025
43afe2f
Bust cache again for gem badge
crmne Aug 1, 2025
4f7a163
Wire up on_tool_call when using acts_as_chat rails integration (#318)
agarcher Aug 1, 2025
f291744
Resolve rubocop offenses
tpaulshippy Aug 3, 2025
76c7714
Merge branch 'main' into image-to-image
tpaulshippy Aug 3, 2025
84a939f
Update guides
tpaulshippy Aug 3, 2025
5c2c5d2
Merge branch 'main' into image-to-image
tpaulshippy Aug 7, 2025

Files changed

30 changes: 29 additions & 1 deletion docs/guides/chat.md
@@ -125,7 +125,7 @@ Modern AI models can often process more than just text. RubyLLM provides a unifi

### Working with Images

Provide image paths or URLs to vision-capable models (like `gpt-4o`, `claude-3-opus`, `gemini-1.5-pro`).
Provide image paths or URLs to vision-capable models (like `gpt-4o`, `claude-3-opus`, `gemini-1.5-pro`) for analysis and understanding. Some specialized models can also generate and edit images.

```ruby
# Ensure you select a vision-capable model
@@ -146,6 +146,34 @@ puts response.content

RubyLLM handles converting the image source into the format required by the specific provider API.

### Image Generation with Chat

While most vision models analyze images, some specialized models can generate and edit images through the chat interface. This approach is ideal for image editing workflows and iterative refinement:

```ruby
# Use a model capable of image generation
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')

# Edit an existing image
response = chat.ask('make this look more futuristic', with: 'current_design.png')

# Access generated images from attachments
if response.content.attachments.any?
generated_image = response.content.attachments.first.image
puts "Generated image: #{generated_image.mime_type}"

# Save the generated image
generated_image.save('futuristic_design.png')
end

# Continue refining in the same conversation
response = chat.ask('add some neon lighting effects')
refined_image = response.content.attachments.first.image
refined_image.save('futuristic_with_neon.png')
```

For simple text-to-image generation without existing images, see the [Image Generation Guide]({% link guides/image-generation.md %}).

### Working with Audio

Provide audio file paths to audio-capable models (like `gpt-4o-audio-preview`).
84 changes: 80 additions & 4 deletions docs/guides/image-generation.md
@@ -24,6 +24,8 @@ Turn your wildest imagination into reality! 🎨 Create professional artwork, pr
After reading this guide, you will know:

* How to generate images from text prompts.
* How to edit and modify existing images.
* How to refine images through multi-turn conversations.
* How to select different image generation models.
* How to specify image sizes (for supported models).
* How to access and save generated image data (URL or Base64).
@@ -98,6 +100,75 @@ end

Refer to the [Working with Models Guide]({% link guides/models.md %}) and the [Available Models Guide]({% link available-models.md %}) to find image models.

## Image Editing & Modification

Beyond generating images from text prompts, you can also edit and modify existing images using capable models like `gemini-2.0-flash-preview-image-generation`. This approach uses the chat interface rather than the `paint` method.

### Basic Image Editing

Use the chat interface with image generation models to edit existing images:

```ruby
# Start a chat with an image generation model
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')

# Edit an existing image
response = chat.ask('put this in a ring', with: 'path/to/ruby.png')

# Access the generated image from the response
image = response.content.attachments.first.image

# Check image properties
puts "Generated image: #{image.mime_type}"
puts "Base64 encoded: #{image.base64?}"
puts "Data size: ~#{image.data.length} bytes" if image.base64?

# Save the edited image
saved_path = image.save('ruby_with_ring.png')
puts "Saved to: #{saved_path}"
```

### Multi-turn Image Refinement

One of the powerful features of using the chat interface is the ability to refine generated images through conversation:

```ruby
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')

# First edit - add a ring to the ruby image
chat.ask('put this in a ring', with: 'path/to/ruby.png')

# Refine the result in the same conversation
response = chat.ask('change the background to blue')

# The model will modify the previously generated image
refined_image = response.content.attachments.first.image
refined_image.save('ruby_ring_blue_background.png')

# Continue refining
response = chat.ask('make the ring more ornate and golden')
final_image = response.content.attachments.first.image
final_image.save('ruby_ornate_golden_ring.png')
```

### Chat vs Paint Methods

RubyLLM provides two approaches for image generation, contrasted in the sketch after the lists below:

- **`RubyLLM.paint`**: Best for simple text-to-image generation from scratch
- **`RubyLLM.chat` with image models**: Best for image editing, refinement, and complex workflows

Use the chat interface for:
- Editing existing images
- Multi-turn image refinement and iteration
- Complex image generation workflows
- When you need conversation context and memory

Use the paint method for:
- Simple text-to-image generation
- One-off image creation
- When you don't need conversation context
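
A minimal sketch contrasting the two approaches; the prompts and file names below are illustrative:

```ruby
# One-off text-to-image: paint returns a RubyLLM::Image directly.
owl = RubyLLM.paint('A steampunk mechanical owl')
owl.save('owl.png')

# Editing and iterating: chat keeps conversation context between turns.
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')
chat.ask('put this in a ring', with: 'path/to/ruby.png')
response = chat.ask('now make the band look antique and hammered')

response.content.attachments.first.image.save('ruby_ring_antique.png')
```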

## Image Sizes

Some models, like DALL-E 3, allow you to specify the desired image dimensions via the `size:` argument.
@@ -124,7 +195,7 @@ Not all models support size customization. If a size is specified for a model th

## Working with Generated Images

The `RubyLLM::Image` object provides access to the generated image data and metadata.
The `RubyLLM::Image` object provides access to the generated image data and metadata, whether the image was created using `RubyLLM.paint` or retrieved from a chat response.

### Accessing Image Data

@@ -138,10 +209,15 @@ The `RubyLLM::Image` object provides access to the generated image data and meta
The `save` method works regardless of whether the image was delivered via URL or Base64. It fetches the data if necessary and writes it to the specified file path.

```ruby
# Generate an image (works for DALL-E or Imagen)
# Generate an image using paint method (works for DALL-E or Imagen)
image = RubyLLM.paint("A steampunk mechanical owl")

# Save the image to a local file
# Or get an image from a chat response
# chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')
# response = chat.ask("Create a steampunk mechanical owl")
# image = response.content.attachments.first.image

# Save the image to a local file (works the same for both methods)
begin
saved_path = image.save("steampunk_owl.png")
puts "Image saved to #{saved_path}"
@@ -280,6 +356,6 @@ Image generation can take several seconds (typically 5-20 seconds depending on t

## Next Steps

* [Chatting with AI Models]({% link guides/chat.md %}): Learn about conversational AI.
* [Chatting with AI Models]({% link guides/chat.md %}): Learn about conversational AI and using chat for advanced image workflows.
* [Embeddings]({% link guides/embeddings.md %}): Explore text vector representations.
* [Error Handling]({% link guides/error-handling.md %}): Master handling API errors.
5 changes: 5 additions & 0 deletions lib/ruby_llm/content.rb
@@ -19,6 +19,11 @@ def add_attachment(source, filename: nil)
self
end

def attach(attachment)
@attachments << attachment
self
end

def format
if @text && @attachments.empty?
@text
19 changes: 19 additions & 0 deletions lib/ruby_llm/image_attachment.rb
@@ -0,0 +1,19 @@
# frozen_string_literal: true

module RubyLLM
# A class representing a file attachment that is an image generated by an LLM.
class ImageAttachment < Attachment
attr_reader :image, :content

def initialize(data:, mime_type:, model_id:)
super(nil, filename: nil)
@image = Image.new(data:, mime_type:, model_id:)
@content = Base64.strict_decode64(data)
@mime_type = mime_type
end

def image?
true
end
end
end
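
For orientation, a hedged usage sketch tying this class to the `Content#attach` method added above; the image file path, text, and model id are placeholders, not taken from the PR:

```ruby
require 'base64'

# Build an attachment from Base64-encoded image bytes (placeholder file).
attachment = RubyLLM::ImageAttachment.new(
  data: Base64.strict_encode64(File.binread('edited.png')),
  mime_type: 'image/png',
  model_id: 'gemini-2.0-flash-preview-image-generation'
)

# Attach it to a Content object, as the Gemini streaming change below does.
content = RubyLLM::Content.new('Here is the edited image')
content.attach(attachment)

attachment.image?           # => true
attachment.image.mime_type  # => "image/png"
attachment.image.save('edited_copy.png')
```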
3 changes: 3 additions & 0 deletions lib/ruby_llm/providers/gemini/capabilities.rb
@@ -280,6 +280,9 @@ def modalities_for(model_id)
# Embedding output
modalities[:output] << 'embeddings' if model_id.match?(/embedding|gemini-embedding/)

# Image output
modalities[:output] << 'image' if model_id.match?(/image-generation/)

# Image output for imagen models
modalities[:output] = ['image'] if model_id.match?(/imagen/)

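A hedged illustration of the effect of this change; the module path is inferred from the file location, and the hash contents other than the `image` entries are assumptions:

```ruby
# Illustrative return values; everything except 'image' is an assumption.
caps = RubyLLM::Providers::Gemini::Capabilities

caps.modalities_for('gemini-2.0-flash-preview-image-generation')
# => { input: [...], output: ['text', 'image'] }

caps.modalities_for('imagen-3.0-generate-002') # example Imagen model id
# => { input: [...], output: ['image'] }
```
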
3 changes: 2 additions & 1 deletion lib/ruby_llm/providers/gemini/chat.rb
@@ -16,7 +16,8 @@ def render_payload(messages, tools:, temperature:, model:, stream: false, schema
payload = {
contents: format_messages(messages),
generationConfig: {
temperature: temperature
temperature: temperature,
responseModalities: capabilities.modalities_for(model)[:output]
}
}

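A hedged sketch of the request payload this produces for an image-capable model; the conversation content and temperature are placeholders, and only `responseModalities` mirrors the change above:

```ruby
# Illustrative payload; contents actually come from format_messages(messages).
payload = {
  contents: [
    { role: 'user', parts: [{ text: 'put this in a ring' }] } # image parts omitted
  ],
  generationConfig: {
    temperature: 0.7,                       # placeholder value
    responseModalities: ['text', 'image']   # capabilities.modalities_for(model)[:output]
  }
}
```
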
16 changes: 15 additions & 1 deletion lib/ruby_llm/providers/gemini/streaming.rb
@@ -34,7 +34,21 @@ def extract_content(data)
return nil unless parts

text_parts = parts.select { |p| p['text'] }
text_parts.map { |p| p['text'] }.join if text_parts.any?
image_parts = parts.select { |p| p['inlineData'] }

content = RubyLLM::Content.new(text_parts.map { |p| p['text'] }.join)

image_parts.map do |p|
content.attach(
ImageAttachment.new(
data: p['inlineData']['data'],
mime_type: p['inlineData']['mimeType'],
model_id: data['modelVersion']
)
)
end

content
end

def extract_input_tokens(data)
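
For reference, a hypothetical streaming chunk of the shape `extract_content` parses; the keys under `parts` and `modelVersion` come from the code above, while the outer `candidates`/`content` nesting follows the Gemini API response format and all values are placeholders:

```ruby
# Hypothetical Gemini streaming chunk (placeholder values).
data = {
  'modelVersion' => 'gemini-2.0-flash-preview-image-generation',
  'candidates' => [
    {
      'content' => {
        'parts' => [
          { 'text' => 'Here is your ruby set in a ring.' },
          { 'inlineData' => { 'mimeType' => 'image/png', 'data' => '<base64-encoded PNG>' } }
        ]
      }
    }
  ]
}
# extract_content(data) would return a RubyLLM::Content whose text is the joined
# text parts and whose attachments include a single ImageAttachment.
```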

Two large diffs are not rendered by default.

66 changes: 66 additions & 0 deletions spec/ruby_llm/image_to_image_spec.rb
@@ -0,0 +1,66 @@
# frozen_string_literal: true

require 'spec_helper'
require 'tempfile'

def save_and_verify_image(image)
# Create a temp file to save to
temp_file = Tempfile.new(['image', '.png'])
temp_path = temp_file.path
temp_file.close

begin
saved_path = image.save(temp_path)
expect(saved_path).to eq(temp_path)
expect(File.exist?(temp_path)).to be true

file_size = File.size(temp_path)
expect(file_size).to be > 1000 # Any real image should be larger than 1KB
ensure
# Clean up
File.delete(temp_path)
end
end

RSpec.describe RubyLLM::Image do
include_context 'with configured RubyLLM'

describe 'basic functionality' do
it 'gemini/gemini-2.0-flash-preview-image-generation can paint images' do
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')
response = chat.ask('put this in a ring', with: 'spec/fixtures/ruby.png')

expect(response.content.text).to include('ruby')

expect(response.content.attachments).to be_an(Array)
expect(response.content.attachments).not_to be_empty

image = response.content.attachments.first.image

expect(image.base64?).to be(true)
expect(image.data).to be_present
expect(image.mime_type).to include('image')

save_and_verify_image image
end

it 'gemini/gemini-2.0-flash-preview-image-generation can refine images in a conversation' do
chat = RubyLLM.chat(model: 'gemini-2.0-flash-preview-image-generation')
chat.ask('put this in a ring', with: 'spec/fixtures/ruby.png')
response = chat.ask('change the background to blue')

expect(response.content.text).to include('ruby')

expect(response.content.attachments).to be_an(Array)
expect(response.content.attachments).not_to be_empty

image = response.content.attachments.first.image

expect(image.base64?).to be(true)
expect(image.data).to be_present
expect(image.mime_type).to include('image')

save_and_verify_image image
end
end
end