feat: auto-detect ImageContext format for image-to-image generation #342

Merged
johnnygreco merged 6 commits into main from nm/image-generation-follow-up on Feb 20, 2026

@nabinchha nabinchha commented Feb 19, 2026

Closes #341

📋 Summary

Enables chaining image generation columns so that a generated image from one ImageColumnConfig can be passed as ImageContext to a downstream column — without requiring users to specify data_type or image_format. Previously, users had to manually set data_type=ModalityDataType.BASE64 and image_format=ImageFormat.PNG, and it still broke in create mode because file paths couldn't be resolved.

🔄 Changes

✨ Added

  • Auto-detection in ImageContext.get_contexts(): when data_type is omitted, values are resolved as URLs (pass-through), file paths (loaded to base64 from disk), or raw base64 (format detected from magic bytes)
  • Shared _build_multi_modal_context() method on ColumnGeneratorWithModel base class, passing base_path so generated image file paths are resolved to base64 before being sent to model endpoints
  • ImageFormat enum moved to image_helpers.py as its canonical location (breaks circular import between models.py and image_helpers.py)
  • New tests for auto-detection (URL, base64, file path resolution) and shared context builder
  • Shared sample_png_bytes and minimal_png_base64 test fixtures in config package conftest
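
The magic-byte detection mentioned for raw base64 values can be illustrated with a minimal sketch. This is not the code from this PR: `sniff_image_format` and `to_data_uri` are hypothetical names, and the signature table covers only a few common formats as an assumption.

```python
import base64
from typing import Optional

# Leading byte signatures for a few common image formats (illustrative subset).
_MAGIC_PREFIXES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(raw: bytes) -> Optional[str]:
    """Return a format name based on the leading bytes, or None if unknown."""
    for prefix, name in _MAGIC_PREFIXES.items():
        if raw.startswith(prefix):
            return name
    # WebP is a RIFF container: "RIFF", four size bytes, then "WEBP".
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "webp"
    return None


def to_data_uri(b64_value: str) -> str:
    """Decode a base64 string, detect its format, and build a data URI."""
    raw = base64.b64decode(b64_value, validate=True)
    fmt = sniff_image_format(raw)
    if fmt is None:
        raise ValueError("unrecognized image format")
    return f"data:image/{fmt};base64,{b64_value}"
```

Detecting the format from the decoded bytes is what would let a caller omit `image_format`: the MIME type in the data URI comes from the data itself rather than from user configuration.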

🔧 Changed

  • ImageContext.data_type is now optional (None by default) — existing explicit usage continues to work unchanged
  • ImageContext.get_contexts() accepts a new base_path keyword argument for file path resolution
  • Simplified ImageCellGenerator and ColumnGeneratorWithModelChatCompletion to use shared _build_multi_modal_context()

📚 Docs

  • Rewrote tutorial 6 to demonstrate image-to-image editing by chaining ImageColumnConfig columns (text → image → edited image), replacing the old HuggingFace dataset loading approach
  • Simplified tutorial 4's ImageContext usage to use auto-detection instead of explicit parameters
  • Updated tutorial README descriptions
  • Regenerated Colab notebooks

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

  • models.py - Core auto-detection logic in _auto_resolve_context_value() and _format_base64_context()
  • base.py - Shared _build_multi_modal_context() method on the generator base class
  • image_helpers.py - ImageFormat enum moved here from models.py

@nabinchha nabinchha requested a review from a team as a code owner February 19, 2026 21:02

greptile-apps bot commented Feb 19, 2026

Greptile Summary

Enables seamless chaining of image generation columns by auto-detecting ImageContext format, eliminating the need for users to specify data_type and image_format parameters.

Key improvements:

  • ImageContext.data_type is now optional and defaults to auto-detection mode
  • Auto-detection resolves file paths (from generated images in create mode), URLs, and base64 data automatically
  • Shared _build_multi_modal_context() method on base class passes base_path for proper file path resolution
  • ImageFormat enum moved from models.py to image_helpers.py to break circular import
  • Tutorial 6 rewritten to demonstrate image-to-image chaining without external datasets
  • Tutorial 4 simplified by removing explicit ImageContext parameters

Testing:

  • Comprehensive test coverage for auto-detection (URL pass-through, base64 format detection, file path resolution)
  • Integration tests verify image-to-image workflow with file path resolution

Minor issue:

  • is_image_url() in image_helpers.py requires URLs to have file extensions, which may reject valid CDN or presigned URLs without extensions

Confidence Score: 4/5

  • This PR is safe to merge with one minor consideration about URL validation
  • The implementation is well-designed with comprehensive test coverage and backward compatibility. The auto-detection logic is sound with proper fallback behavior. The only concern is that is_image_url() may reject valid image URLs without file extensions (CDN URLs, presigned URLs), though this is a minor edge case that can be addressed in a follow-up if needed.
  • Pay attention to packages/data-designer-config/src/data_designer/config/utils/image_helpers.py:209 - URL detection may need relaxation for extension-less URLs

Important Files Changed

| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/models.py | Adds auto-detection logic for ImageContext with _auto_resolve_context_value() and _format_base64_context() methods, makes data_type optional |
| packages/data-designer-config/src/data_designer/config/utils/image_helpers.py | Moves ImageFormat enum here from models.py, breaking circular import dependency |
| packages/data-designer-engine/src/data_designer/engine/column_generators/generators/base.py | Adds shared _build_multi_modal_context() method that passes base_path to context resolution for file path support |
| packages/data-designer-config/tests/config/test_models.py | Adds comprehensive tests for auto-detection: URL pass-through, base64 format detection, file path resolution with base_path |
| packages/data-designer-engine/tests/engine/column_generators/generators/test_image.py | Adds tests for shared context builder and auto-detection in image-to-image generation workflow |
| packages/data-designer-config/src/data_designer/config/__init__.py | Updates ImageFormat import path from models.py to utils.image_helpers |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Start[ImageContext.get_contexts called with record] --> CheckDataType{data_type set?}
    
    CheckDataType -->|Yes - Explicit mode| ExplicitURL{data_type == URL?}
    CheckDataType -->|No - Auto-detect| AutoDetect[_auto_resolve_context_value]
    
    ExplicitURL -->|Yes| ReturnURL[Return URL as-is]
    ExplicitURL -->|No - BASE64| FormatBase64[Format with image_format]
    
    AutoDetect --> CheckPath{base_path set AND is_image_path?}
    CheckPath -->|Yes| TryLoad[load_image_path_to_base64]
    CheckPath -->|No| CheckURL{is_image_url?}
    
    TryLoad --> LoadSuccess{File loaded?}
    LoadSuccess -->|Yes| DetectFormat[_format_base64_context]
    LoadSuccess -->|No| CheckURL
    
    CheckURL -->|Yes| ReturnURL
    CheckURL -->|No| AssumeBase64[Assume base64 data]
    
    AssumeBase64 --> DetectFormat
    DetectFormat --> DecodeBytes[decode_base64_image]
    DecodeBytes --> DetectImageFormat[detect_image_format from bytes]
    DetectImageFormat --> BuildDataURI[Build data URI with detected format]
    
    FormatBase64 --> BuildDataURI
    ReturnURL --> End[Return context list]
    BuildDataURI --> End
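
The auto-detect branch of the flowchart can be read as a three-step fallback: file path first (only when base_path is set), then URL, then base64. The sketch below mirrors that order under stated assumptions. The helper names (`looks_like_image_path`, `looks_like_image_url`, `resolve_context_value`) and the extension list are hypothetical stand-ins, not the project's `is_image_path`, `is_image_url`, or `_auto_resolve_context_value`.

```python
import base64
from pathlib import Path
from typing import Optional, Tuple

# Assumed extension list, for illustration only.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".webp")
URL_SCHEMES = ("http://", "https://")


def looks_like_image_path(value: str) -> bool:
    """Non-URL string ending in a known image extension."""
    return not value.startswith(URL_SCHEMES) and value.lower().endswith(IMAGE_SUFFIXES)


def looks_like_image_url(value: str) -> bool:
    """http(s) URL ending in a known image extension."""
    return value.startswith(URL_SCHEMES) and value.lower().endswith(IMAGE_SUFFIXES)


def resolve_context_value(value: str, base_path: Optional[str] = None) -> Tuple[str, str]:
    """Return (kind, payload), where kind is 'url' or 'base64'."""
    # 1. File path: attempted only when a base_path is available.
    if base_path is not None and looks_like_image_path(value):
        candidate = Path(base_path) / value
        if candidate.is_file():
            encoded = base64.b64encode(candidate.read_bytes()).decode("ascii")
            return "base64", encoded
        # File could not be loaded: fall through to the URL check.
    # 2. URL: passed through unchanged.
    if looks_like_image_url(value):
        return "url", value
    # 3. Otherwise, assume the value is already base64 data.
    return "base64", value
```

In the flowchart, a base64 payload then goes through format detection before the data URI is built; that step is left out here to keep the ordering of the checks visible.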

Last reviewed commit: 179252e

@greptile-apps greptile-apps bot left a comment

19 files reviewed, 1 comment

greptile-apps bot commented Feb 20, 2026

Additional Comments (1)

packages/data-designer-config/src/data_designer/config/utils/image_helpers.py
is_image_url() requires URLs to contain image file extensions, but many valid image URLs don't have extensions (e.g., CDN URLs, presigned S3 URLs with query params, or URLs that return images dynamically). When auto-detection encounters such URLs, they'll fall through to base64 decoding and fail.

Consider relaxing this check to accept any http:// or https:// URL, or add a fallback that attempts to validate as a URL before trying base64 decode.
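
One way the suggested relaxation might look, as a hedged sketch rather than the project's actual fix: accept any well-formed http(s) URL regardless of extension. `is_http_url` is a hypothetical name, not a function in image_helpers.py.

```python
from urllib.parse import urlparse


def is_http_url(value: str) -> bool:
    """True for any http(s) URL with a host, regardless of file extension."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Extension-less URLs are accepted; base64 data and relative paths are not.
print(is_http_url("https://bucket.s3.amazonaws.com/key?X-Amz-Signature=abc"))
print(is_http_url("iVBORw0KGgoAAAANSUhEUg=="))
```

Because the base64 alphabet has no `:` character, a raw base64 string never parses with an `http` or `https` scheme, so this check would not misclassify base64 data as a URL. The trade-off is that non-image URLs would also pass through and only fail later at the model endpoint.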

@johnnygreco johnnygreco merged commit 8f7a720 into main Feb 20, 2026
48 checks passed

Development

Successfully merging this pull request may close these issues.

Support using generated images as ImageContext for downstream columns

2 participants