diff --git a/documentation/OPTIONS.es.md b/documentation/OPTIONS.es.md index 6ce17f00d..804a649ef 100644 --- a/documentation/OPTIONS.es.md +++ b/documentation/OPTIONS.es.md @@ -646,6 +646,79 @@ Muchas configuraciones se establecen a través del [dataloader config](DATALOADE - **Qué**: Desactiva el cálculo de pérdida de evaluación durante la validación. - **Por qué**: Cuando se configura un dataset de eval, la pérdida se calcula automáticamente. Si la evaluación CLIP también está habilitada, ambas se ejecutarán. Este flag te permite desactivar selectivamente la pérdida de eval manteniendo la evaluación CLIP habilitada. +### `--validation_using_datasets` + +- **Qué**: Usa imágenes de datasets de entrenamiento para validación en lugar de generación pura de texto a imagen. +- **Por qué**: Habilita el modo de validación imagen-a-imagen (img2img) donde el modelo elimina parcialmente el ruido de las imágenes de entrenamiento en lugar de generar desde ruido puro. Útil para: + - Probar modelos de edición/inpainting que requieren imágenes de entrada + - Evaluar qué tan bien el modelo preserva la estructura de imagen + - Modelos que soportan flujos duales texto-a-imagen E imagen-a-imagen (ej., Flux2, LTXVideo2) +- **Notas**: + - Requiere que el modelo tenga un pipeline `IMG2IMG` registrado + - Puede combinarse con `--eval_dataset_id` para obtener imágenes de un dataset específico + - La fuerza de des-ruido se controla con los ajustes normales de timestep de validación + +### `--eval_dataset_id` + +- **Qué**: ID de dataset específico a usar para obtener imágenes de evaluación/validación.
+- **Por qué**: Al usar `--validation_using_datasets` o validación basada en conditioning, controla qué dataset provee las imágenes de entrada: + - Sin esta opción, las imágenes se seleccionan aleatoriamente de todos los datasets de entrenamiento + - Con esta opción, solo se usa el dataset especificado para entradas de validación +- **Notas**: + - El ID de dataset debe coincidir con un dataset configurado en tu config de dataloader + - Útil para mantener evaluación consistente usando un dataset de eval dedicado + - Para modelos de conditioning, los datos de conditioning del dataset (si existen) también se usarán + +--- + +## Entendiendo Modos de Conditioning y Validación + +SimpleTuner soporta tres paradigmas principales para modelos que usan entradas de conditioning (imágenes de referencia, señales de control, etc.): + +### 1. Modelos que REQUIEREN Conditioning + +Algunos modelos no pueden funcionar sin entradas de conditioning: + +- **Flux Kontext**: Siempre necesita imágenes de referencia para entrenamiento estilo edición +- **Entrenamiento ControlNet**: Requiere imágenes de señal de control + +Para estos modelos, un dataset de conditioning es obligatorio. La WebUI mostrará opciones de conditioning como requeridas, y el entrenamiento fallará sin ellas. + +### 2. Modelos que SOPORTAN Conditioning Opcional + +Algunos modelos pueden operar en modos texto-a-imagen E imagen-a-imagen: + +- **Flux2**: Soporta entrenamiento dual T2I/I2I con imágenes de referencia opcionales +- **LTXVideo2**: Soporta T2V e I2V (imagen-a-video) con conditioning de primer frame opcional +- **LongCat-Video**: Soporta conditioning de frames opcional + +Para estos modelos, PUEDES agregar datasets de conditioning pero no es obligatorio. La WebUI mostrará opciones de conditioning como opcionales. + +### 3. 
Modos de Validación + +| Modo | Flag | Comportamiento | +|------|------|----------------| +| **Texto-a-Imagen** | (por defecto) | Genera solo desde prompts de texto | +| **Basado en Dataset** | `--validation_using_datasets` | Des-ruido parcial de imágenes de datasets (img2img) | +| **Basado en Conditioning** | (auto cuando se configura conditioning) | Usa entradas de conditioning durante validación | + +**Combinando modos**: Cuando un modelo soporta conditioning Y `--validation_using_datasets` está habilitado: +- El sistema de validación obtiene imágenes de datasets +- Si esos datasets tienen datos de conditioning, se usan automáticamente +- Usa `--eval_dataset_id` para controlar qué dataset provee entradas + +### Tipos de Datos de Conditioning + +Diferentes modelos esperan diferentes datos de conditioning: + +| Tipo | Modelos | Configuración de Dataset | +|------|---------|-------------------------| +| `conditioning` | ControlNet, Control | `type: conditioning` en config de dataset | +| `image` | Flux Kontext | `type: image` (dataset de imagen estándar) | +| `latents` | Flux, Flux2 | Conditioning se codifica con VAE automáticamente | + +--- + ### `--caption_strategy` - **Qué**: Estrategia para derivar captions de imagen. 
**Opciones**: `textfile`, `filename`, `parquet`, `instanceprompt` diff --git a/documentation/OPTIONS.hi.md b/documentation/OPTIONS.hi.md index dc7e17efb..a88d7568e 100644 --- a/documentation/OPTIONS.hi.md +++ b/documentation/OPTIONS.hi.md @@ -644,6 +644,79 @@ Alternative attention mechanisms समर्थित हैं, जिनक - **What**: validation के दौरान evaluation loss गणना disable करें। - **Why**: जब eval dataset कॉन्फ़िगर हो, loss स्वतः गणना होता है। यदि CLIP evaluation सक्षम है, तो दोनों चलेंगे। यह flag eval loss को disable करने देता है जबकि CLIP evaluation चालू रहता है। +### `--validation_using_datasets` + +- **What**: pure text-to-image generation के बजाय training datasets से images validation के लिए use करें। +- **Why**: image-to-image (img2img) validation mode enable करता है जहाँ model pure noise से generate करने के बजाय training images को partially denoise करता है। उपयोगी है: + - Edit/inpainting models test करने के लिए जिन्हें input images चाहिए + - Model image structure को कितना preserve करता है evaluate करने के लिए + - Dual text-to-image AND image-to-image workflows support करने वाले models के लिए (जैसे, Flux2, LTXVideo2) +- **Notes**: + - Model में `IMG2IMG` pipeline registered होना चाहिए + - `--eval_dataset_id` के साथ combine कर सकते हैं specific dataset से images लेने के लिए + - Denoising strength normal validation timestep settings से control होती है + +### `--eval_dataset_id` + +- **What**: Evaluation/validation image sourcing के लिए specific dataset ID। +- **Why**: `--validation_using_datasets` या conditioning-based validation use करते समय, यह control करता है कौन सा dataset input images provide करे: + - इस option के बिना, images सभी training datasets से randomly select होती हैं + - इस option के साथ, केवल specified dataset validation inputs के लिए use होता है +- **Notes**: + - Dataset ID आपके dataloader config में configured dataset से match होना चाहिए + - Dedicated eval dataset use करके consistent evaluation maintain करने के लिए useful + - Conditioning models के 
लिए, dataset का conditioning data (यदि हो) भी use होगा + +--- + +## Conditioning और Validation Modes को समझना + +SimpleTuner conditioning inputs (reference images, control signals, आदि) use करने वाले models के लिए तीन मुख्य paradigms support करता है: + +### 1. Models जो Conditioning REQUIRE करते हैं + +कुछ models conditioning inputs के बिना function नहीं कर सकते: + +- **Flux Kontext**: Edit-style training के लिए हमेशा reference images चाहिए +- **ControlNet training**: Control signal images require करता है + +इन models के लिए, conditioning dataset mandatory है। WebUI conditioning options को required दिखाएगी, और training इनके बिना fail होगी। + +### 2. Models जो Optional Conditioning SUPPORT करते हैं + +कुछ models text-to-image AND image-to-image दोनों modes में operate कर सकते हैं: + +- **Flux2**: Optional reference images के साथ dual T2I/I2I training support करता है +- **LTXVideo2**: Optional first-frame conditioning के साथ T2V और I2V (image-to-video) दोनों support करता है +- **LongCat-Video**: Optional frame conditioning support करता है + +इन models के लिए, आप conditioning datasets ADD कर सकते हैं पर जरूरी नहीं। WebUI conditioning options को optional दिखाएगी। + +### 3. 
Validation Modes + +| Mode | Flag | Behavior | +|------|------|----------| +| **Text-to-Image** | (default) | केवल text prompts से generate | +| **Dataset-based** | `--validation_using_datasets` | Datasets से images partially denoise (img2img) | +| **Conditioning-based** | (auto जब conditioning configured हो) | Validation के दौरान conditioning inputs use | + +**Modes combine करना**: जब model conditioning support करता है AND `--validation_using_datasets` enabled है: +- Validation system datasets से images लेता है +- यदि उन datasets में conditioning data है, तो automatically use होता है +- `--eval_dataset_id` use करें control करने के लिए कौन सा dataset inputs provide करे + +### Conditioning Data Types + +Different models different conditioning data expect करते हैं: + +| Type | Models | Dataset Setting | +|------|--------|-----------------| +| `conditioning` | ControlNet, Control | Dataset config में `type: conditioning` | +| `image` | Flux Kontext | `type: image` (standard image dataset) | +| `latents` | Flux, Flux2 | Conditioning automatically VAE-encoded होता है | + +--- + ### `--caption_strategy` - **What**: image captions derive करने की रणनीति। **Choices**: `textfile`, `filename`, `parquet`, `instanceprompt` diff --git a/documentation/OPTIONS.ja.md b/documentation/OPTIONS.ja.md index e30f8169e..37a832ee7 100644 --- a/documentation/OPTIONS.ja.md +++ b/documentation/OPTIONS.ja.md @@ -646,6 +646,79 @@ Accelerate の既定値を使いたい項目は省略してください(例: - **内容**: 検証中の評価損失計算を無効化します。 - **理由**: 評価用データセットを設定すると損失は自動計算されます。CLIP 評価も有効な場合は両方実行されます。このフラグで CLIP を残したまま評価損失だけ無効化できます。 +### `--validation_using_datasets` + +- **内容**: 純粋なテキストから画像生成の代わりに、学習データセットから画像を検証に使用します。 +- **理由**: 画像から画像 (img2img) 検証モードを有効化し、モデルが純粋なノイズから生成するのではなく学習画像を部分的にデノイズします。以下の場合に便利です: + - 入力画像が必要な編集/インペインティングモデルのテスト + - モデルが画像構造をどの程度保持するかの評価 + - テキストから画像と画像から画像の両方のワークフローをサポートするモデル(例:Flux2、LTXVideo2) +- **注意**: + - モデルに `IMG2IMG` パイプラインが登録されている必要があります + - `--eval_dataset_id` と組み合わせて特定のデータセットから画像を取得できます + - 
デノイズ強度は通常の検証タイムステップ設定で制御されます + +### `--eval_dataset_id` + +- **内容**: 評価/検証画像ソーシング用の特定のデータセットID。 +- **理由**: `--validation_using_datasets` またはコンディショニングベースの検証を使用する場合、どのデータセットが入力画像を提供するかを制御します: + - このオプションなしでは、すべての学習データセットからランダムに画像が選択されます + - このオプションありでは、指定されたデータセットのみが検証入力に使用されます +- **注意**: + - データセットIDはデータローダー設定の設定済みデータセットと一致する必要があります + - 専用の評価データセットを使用して一貫した評価を維持するのに便利です + - コンディショニングモデルの場合、データセットのコンディショニングデータ(存在する場合)も使用されます + +--- + +## コンディショニングと検証モードの理解 + +SimpleTunerは、コンディショニング入力(参照画像、制御信号など)を使用するモデル向けに3つの主要なパラダイムをサポートしています: + +### 1. コンディショニングを必要とするモデル + +一部のモデルはコンディショニング入力なしでは機能しません: + +- **Flux Kontext**: 編集スタイルの学習には常に参照画像が必要 +- **ControlNet学習**: 制御信号画像が必要 + +これらのモデルではコンディショニングデータセットが必須です。WebUIはコンディショニングオプションを必須として表示し、なければ学習は失敗します。 + +### 2. オプションのコンディショニングをサポートするモデル + +一部のモデルはテキストから画像と画像から画像の両方のモードで動作できます: + +- **Flux2**: オプションの参照画像でデュアルT2I/I2I学習をサポート +- **LTXVideo2**: オプションの最初のフレームコンディショニングでT2VとI2V(画像から動画)の両方をサポート +- **LongCat-Video**: オプションのフレームコンディショニングをサポート + +これらのモデルでは、コンディショニングデータセットを追加できますが必須ではありません。WebUIはコンディショニングオプションをオプションとして表示します。 + +### 3. 
検証モード + +| モード | フラグ | 動作 | +|--------|--------|------| +| **テキストから画像** | (デフォルト) | テキストプロンプトのみから生成 | +| **データセットベース** | `--validation_using_datasets` | データセットから画像を部分的にデノイズ (img2img) | +| **コンディショニングベース** | (コンディショニング設定時に自動) | 検証中にコンディショニング入力を使用 | + +**モードの組み合わせ**: モデルがコンディショニングをサポートし、かつ `--validation_using_datasets` が有効な場合: +- 検証システムはデータセットから画像を取得します +- それらのデータセットにコンディショニングデータがあれば、自動的に使用されます +- `--eval_dataset_id` を使用してどのデータセットが入力を提供するかを制御できます + +### コンディショニングデータタイプ + +異なるモデルは異なるコンディショニングデータを期待します: + +| タイプ | モデル | データセット設定 | +|--------|--------|-----------------| +| `conditioning` | ControlNet, Control | データセット設定で `type: conditioning` | +| `image` | Flux Kontext | `type: image` (標準画像データセット) | +| `latents` | Flux, Flux2 | コンディショニングは自動的にVAEエンコードされます | + +--- + ### `--caption_strategy` - **内容**: 画像キャプションを導出する戦略。**選択肢**: `textfile`, `filename`, `parquet`, `instanceprompt` diff --git a/documentation/OPTIONS.md b/documentation/OPTIONS.md index dfbceef28..4e2a0e05b 100644 --- a/documentation/OPTIONS.md +++ b/documentation/OPTIONS.md @@ -644,6 +644,79 @@ A lot of settings are instead set through the [dataloader config](DATALOADER.md) - **What**: Disable evaluation loss calculation during validation. - **Why**: When an eval dataset is configured, loss will automatically be calculated. If CLIP evaluation is also enabled, they will both run. This flag will allow you to selectively disable eval loss while keeping CLIP evaluation enabled. +### `--validation_using_datasets` + +- **What**: Use images from training datasets for validation instead of pure text-to-image generation. +- **Why**: Enables image-to-image (img2img) validation mode where the model partially denoises training images rather than generating from pure noise. 
This is useful for: + - Testing edit/inpainting models that require input images + - Evaluating how well the model preserves image structure + - Models that support dual text-to-image AND image-to-image workflows (e.g., Flux2, LTXVideo2) +- **Notes**: + - Requires the model to have an `IMG2IMG` pipeline registered (most dual-mode models use the same pipeline for both) + - Can be combined with `--eval_dataset_id` to source images from a specific dataset + - The denoising strength is controlled by the normal validation timestep settings + +### `--eval_dataset_id` + +- **What**: Specific dataset ID to use for evaluation/validation image sourcing. +- **Why**: When using `--validation_using_datasets` or conditioning-based validation, this controls which dataset provides the input images: + - Without this option, images are randomly selected from all training datasets + - With this option, only the specified dataset is used for validation inputs +- **Notes**: + - The dataset ID must match a configured dataset in your dataloader config + - Useful for keeping evaluation consistent by using a dedicated eval dataset + - For conditioning models, the dataset's conditioning data (if any) will also be used + +--- + +## Understanding Conditioning and Validation Modes + +SimpleTuner supports three main paradigms for models that use conditioning inputs (reference images, control signals, etc.): + +### 1. Models that REQUIRE Conditioning + +Some models cannot function without conditioning inputs: + +- **Flux Kontext**: Always needs reference images for edit-style training +- **ControlNet training**: Requires control signal images + +For these models, a conditioning dataset is mandatory. The WebUI will show conditioning options as required, and training will fail without them. + +### 2. 
Models that SUPPORT Optional Conditioning + +Some models can operate in both text-to-image AND image-to-image modes: + +- **Flux2**: Supports dual T2I/I2I training with optional reference images +- **LTXVideo2**: Supports both T2V and I2V (image-to-video) with optional first-frame conditioning +- **LongCat-Video**: Supports optional frame conditioning + +For these models, you CAN add conditioning datasets but don't have to. The WebUI will show conditioning options as optional. + +### 3. Validation Modes + +| Mode | Flag | Behavior | +|------|------|----------| +| **Text-to-Image** | (default) | Generate from text prompts only | +| **Dataset-based** | `--validation_using_datasets` | Partially denoise images from datasets (img2img) | +| **Conditioning-based** | (auto when conditioning configured) | Use conditioning inputs during validation | + +**Combining modes**: When a model supports conditioning AND `--validation_using_datasets` is enabled: +- The validation system sources images from datasets +- If those datasets have conditioning data, it's used automatically +- Use `--eval_dataset_id` to control which dataset provides inputs + +### Conditioning Data Types + +Different models expect different conditioning data: + +| Type | Models | Dataset Setting | +|------|--------|-----------------| +| `conditioning` | ControlNet, Control | `type: conditioning` in dataset config | +| `image` | Flux Kontext | `type: image` (standard image dataset) | +| `latents` | Flux, Flux2 | Conditioning is VAE-encoded automatically | + +--- + ### `--caption_strategy` - **What**: Strategy for deriving image captions. 
**Choices**: `textfile`, `filename`, `parquet`, `instanceprompt` diff --git a/documentation/OPTIONS.pt-BR.md b/documentation/OPTIONS.pt-BR.md index 4593a81ea..c784f89ba 100644 --- a/documentation/OPTIONS.pt-BR.md +++ b/documentation/OPTIONS.pt-BR.md @@ -642,6 +642,79 @@ Muitas configuracoes sao definidas no [dataloader config](DATALOADER.md), mas es - **O que**: Desabilita o calculo de eval loss durante validacao. - **Por que**: Quando um dataset de avaliacao e configurado, a loss sera calculada automaticamente. Se a avaliacao CLIP estiver habilitada, ambos rodam. Este flag permite desabilitar a eval loss mantendo CLIP. +### `--validation_using_datasets` + +- **O que**: Usa imagens dos datasets de treinamento para validacao ao inves de geracao pura texto-para-imagem. +- **Por que**: Habilita modo de validacao imagem-para-imagem (img2img) onde o modelo faz denoise parcial das imagens de treinamento ao inves de gerar de ruido puro. Util para: + - Testar modelos de edicao/inpainting que requerem imagens de entrada + - Avaliar quao bem o modelo preserva a estrutura da imagem + - Modelos que suportam workflows duais texto-para-imagem E imagem-para-imagem (ex., Flux2, LTXVideo2) +- **Notas**: + - Requer que o modelo tenha um pipeline `IMG2IMG` registrado + - Pode ser combinado com `--eval_dataset_id` para obter imagens de um dataset especifico + - A intensidade do denoise e controlada pelas configuracoes normais de timestep de validacao + +### `--eval_dataset_id` + +- **O que**: ID especifico do dataset para usar no sourcing de imagens de avaliacao/validacao. 
+- **Por que**: Ao usar `--validation_using_datasets` ou validacao baseada em conditioning, controla qual dataset fornece as imagens de entrada: + - Sem esta opcao, imagens sao selecionadas aleatoriamente de todos os datasets de treinamento + - Com esta opcao, apenas o dataset especificado e usado para entradas de validacao +- **Notas**: + - O ID do dataset deve corresponder a um dataset configurado no seu config do dataloader + - Util para manter avaliacao consistente usando um dataset de eval dedicado + - Para modelos de conditioning, os dados de conditioning do dataset (se houver) tambem serao usados + +--- + +## Entendendo Modos de Conditioning e Validacao + +SimpleTuner suporta tres paradigmas principais para modelos que usam entradas de conditioning (imagens de referencia, sinais de controle, etc.): + +### 1. Modelos que REQUEREM Conditioning + +Alguns modelos nao funcionam sem entradas de conditioning: + +- **Flux Kontext**: Sempre precisa de imagens de referencia para treinamento estilo edicao +- **Treinamento ControlNet**: Requer imagens de sinal de controle + +Para estes modelos, um dataset de conditioning e obrigatorio. A WebUI mostrara opcoes de conditioning como obrigatorias, e o treinamento falhara sem elas. + +### 2. Modelos que SUPORTAM Conditioning Opcional + +Alguns modelos podem operar em modos texto-para-imagem E imagem-para-imagem: + +- **Flux2**: Suporta treinamento dual T2I/I2I com imagens de referencia opcionais +- **LTXVideo2**: Suporta T2V e I2V (imagem-para-video) com conditioning de primeiro frame opcional +- **LongCat-Video**: Suporta conditioning de frames opcional + +Para estes modelos, voce PODE adicionar datasets de conditioning mas nao e obrigatorio. A WebUI mostrara opcoes de conditioning como opcionais. + +### 3. 
Modos de Validacao + +| Modo | Flag | Comportamento | +|------|------|---------------| +| **Texto-para-Imagem** | (padrao) | Gera apenas de prompts de texto | +| **Baseado em Dataset** | `--validation_using_datasets` | Denoise parcial de imagens de datasets (img2img) | +| **Baseado em Conditioning** | (auto quando conditioning configurado) | Usa entradas de conditioning durante validacao | + +**Combinando modos**: Quando um modelo suporta conditioning E `--validation_using_datasets` esta habilitado: +- O sistema de validacao obtem imagens de datasets +- Se esses datasets tem dados de conditioning, sao usados automaticamente +- Use `--eval_dataset_id` para controlar qual dataset fornece entradas + +### Tipos de Dados de Conditioning + +Diferentes modelos esperam diferentes dados de conditioning: + +| Tipo | Modelos | Configuracao do Dataset | +|------|---------|------------------------| +| `conditioning` | ControlNet, Control | `type: conditioning` no config do dataset | +| `image` | Flux Kontext | `type: image` (dataset de imagem padrao) | +| `latents` | Flux, Flux2 | Conditioning e VAE-encoded automaticamente | + +--- + ### `--caption_strategy` - **O que**: Estrategia para derivar captions. 
**Opcoes**: `textfile`, `filename`, `parquet`, `instanceprompt` diff --git a/documentation/OPTIONS.zh.md b/documentation/OPTIONS.zh.md index 167502eda..92a7e9cbf 100644 --- a/documentation/OPTIONS.zh.md +++ b/documentation/OPTIONS.zh.md @@ -648,6 +648,79 @@ TRAINING_DYNAMO_BACKEND=inductor - **内容**:在验证期间禁用评估损失计算。 - **原因**:配置评估数据集后会自动计算损失;若启用 CLIP 评估,两者都会运行。此标志可在保留 CLIP 的同时关闭评估损失。 +### `--validation_using_datasets` + +- **内容**:使用训练数据集中的图像进行验证,而非纯文本到图像生成。 +- **原因**:启用图像到图像(img2img)验证模式,模型部分去噪训练图像而非从纯噪声生成。适用于: + - 测试需要输入图像的编辑/修复模型 + - 评估模型对图像结构的保留程度 + - 支持双重文本到图像和图像到图像工作流的模型(如 Flux2、LTXVideo2) +- **注意**: + - 需要模型注册 `IMG2IMG` 管线 + - 可与 `--eval_dataset_id` 结合从特定数据集获取图像 + - 去噪强度由正常的验证时间步设置控制 + +### `--eval_dataset_id` + +- **内容**:用于评估/验证图像来源的特定数据集 ID。 +- **原因**:使用 `--validation_using_datasets` 或基于条件的验证时,控制哪个数据集提供输入图像: + - 无此选项时,从所有训练数据集随机选择图像 + - 有此选项时,仅使用指定数据集进行验证输入 +- **注意**: + - 数据集 ID 必须与数据加载器配置中的已配置数据集匹配 + - 适用于使用专用评估数据集保持评估一致性 + - 对于条件模型,数据集的条件数据(如有)也会被使用 + +--- + +## 理解条件化和验证模式 + +SimpleTuner 为使用条件输入(参考图像、控制信号等)的模型支持三种主要范式: + +### 1. 需要条件化的模型 + +部分模型没有条件输入无法运行: + +- **Flux Kontext**:编辑式训练始终需要参考图像 +- **ControlNet 训练**:需要控制信号图像 + +对于这些模型,条件数据集是强制性的。WebUI 将条件选项显示为必需,没有它们训练将失败。 + +### 2. 支持可选条件化的模型 + +部分模型可在文本到图像和图像到图像两种模式下运行: + +- **Flux2**:支持可选参考图像的双重 T2I/I2I 训练 +- **LTXVideo2**:支持可选首帧条件的 T2V 和 I2V(图像到视频) +- **LongCat-Video**:支持可选帧条件 + +对于这些模型,你可以添加条件数据集但不是必需的。WebUI 将条件选项显示为可选。 + +### 3. 
验证模式 + +| 模式 | 标志 | 行为 | +|------|------|------| +| **文本到图像** | (默认) | 仅从文本提示生成 | +| **基于数据集** | `--validation_using_datasets` | 对数据集中的图像部分去噪(img2img) | +| **基于条件** | (配置条件时自动) | 验证时使用条件输入 | + +**组合模式**:当模型支持条件化且 `--validation_using_datasets` 已启用时: +- 验证系统从数据集获取图像 +- 如果这些数据集有条件数据,会自动使用 +- 使用 `--eval_dataset_id` 控制哪个数据集提供输入 + +### 条件数据类型 + +不同模型期望不同的条件数据: + +| 类型 | 模型 | 数据集设置 | +|------|------|-----------| +| `conditioning` | ControlNet, Control | 数据集配置中 `type: conditioning` | +| `image` | Flux Kontext | `type: image`(标准图像数据集) | +| `latents` | Flux, Flux2 | 条件自动 VAE 编码 | + +--- + ### `--caption_strategy` - **内容**:派生图像字幕的策略。**选项**:`textfile`, `filename`, `parquet`, `instanceprompt` diff --git a/simpletuner/helpers/models/common.py b/simpletuner/helpers/models/common.py index 74f447cd4..d97ef4a69 100644 --- a/simpletuner/helpers/models/common.py +++ b/simpletuner/helpers/models/common.py @@ -993,11 +993,55 @@ def model_predict(self, prepared_batch, custom_timesteps: list = None): """ raise NotImplementedError("model_predict must be implemented in the child class.") + # ------------------------------------------------------------------------- + # Conditioning Capability Methods + # ------------------------------------------------------------------------- + # These methods define how a model handles conditioning inputs (reference + # images, control signals, etc.). The WebUI and training pipeline use these + # to determine what UI elements to show and how to process data. + # + # There are two categories: + # - "requires_*" methods: Model CANNOT function without this capability. + # Training will fail if the requirement is not met. + # - "supports_*" methods: Model CAN use this capability but doesn't require + # it. The WebUI will show the option, but it's optional. 
+ # + # Example model patterns: + # - Flux Kontext: requires_conditioning_dataset() = True (always needs refs) + # - Flux2: supports_conditioning_dataset() = True (optional dual T2I/I2I) + # - LTXVideo2: supports_conditioning_dataset() = True (optional I2V) + # - SD with ControlNet: requires_conditioning_dataset() = True (via config) + # ------------------------------------------------------------------------- + def requires_conditioning_dataset(self) -> bool: + """ + Returns True when the model REQUIRES a conditioning dataset to train. + + Override this to return True when: + - The model architecture inherently needs conditioning inputs (e.g., edit models) + - A config option like controlnet/control is enabled + + When True, the dataloader will fail if no conditioning dataset is configured. + The WebUI will mark conditioning as mandatory. + """ if self.config.controlnet or self.config.control: return True return False + def supports_conditioning_dataset(self) -> bool: + """ + Returns True when the model SUPPORTS optional conditioning datasets. + + Override this to return True when: + - The model can operate in both text-to-image AND image-to-image modes + - Conditioning is useful but not mandatory (e.g., Flux2, LTXVideo2) + + When True and requires_conditioning_dataset() is False, the WebUI will + show conditioning options without making them mandatory. This enables + dual T2I/I2I training workflows. + """ + return False + def text_embed_cache_key(self) -> TextEmbedCacheKey: """ Controls how prompt embeddings are keyed inside the cache. Most models can @@ -1013,9 +1057,27 @@ def requires_text_embed_image_context(self) -> bool: return False def requires_conditioning_latents(self) -> bool: + """ + Returns True when conditioning inputs should be VAE-encoded latents + instead of raw pixel values. 
+ + Override to True when: + - The model processes conditioning through the latent space (e.g., Flux, Flux2) + - ControlNet-style conditioning uses latent inputs + + When True, collate.py will collect VAE-encoded latents for conditioning + instead of pixel tensors. + """ return False def requires_conditioning_image_embeds(self) -> bool: + """ + Returns True when conditioning requires pre-computed image embeddings + (e.g., from CLIP or similar vision encoder). + + Override to True for models that use image embeddings as conditioning + signals rather than raw pixels or latents. + """ return False def supports_audio_inputs(self) -> bool: @@ -1042,9 +1104,29 @@ def requires_validation_edit_captions(self) -> bool: return False def requires_conditioning_validation_inputs(self) -> bool: + """ + Returns True when validation requires conditioning inputs (images/latents). + + Override to True when: + - The model needs reference images to generate meaningful validation outputs + - Validation without conditioning would produce unusable results + + When True, the validation system will load images from configured + validation datasets or use eval_dataset_id to source inputs. + """ return False - def conditioning_validation_dataset_type(self) -> bool: + def conditioning_validation_dataset_type(self) -> str: + """ + Returns the dataset type to use for conditioning during validation. + + Common values: + - "conditioning": Use datasets marked as conditioning type (default) + - "image": Use standard image datasets (e.g., for edit models like Kontext) + + Override this when the model expects a specific dataset type for its + conditioning inputs during validation. 
+ """ return "conditioning" def validation_image_input_edge_length(self): diff --git a/simpletuner/helpers/models/flux2/model.py b/simpletuner/helpers/models/flux2/model.py index 524142d62..c5b934891 100644 --- a/simpletuner/helpers/models/flux2/model.py +++ b/simpletuner/helpers/models/flux2/model.py @@ -687,6 +687,17 @@ def requires_conditioning_latents(self) -> bool: # If controlnet is configured, base class already returns True. return True + def supports_conditioning_dataset(self) -> bool: + """ + FLUX.2 optionally supports reference image conditioning for dual T2I/I2I training. + + Unlike Flux Kontext which *requires* conditioning inputs, FLUX.2 can operate in + either text-to-image mode (no conditioning) or image-to-image mode (with reference + images). This allows the WebUI to show conditioning dataset options without + making them mandatory. + """ + return True + def prepare_batch_conditions(self, batch: dict, state: dict): """ Prepare conditioning inputs for FLUX.2 reference image conditioning. 
diff --git a/simpletuner/helpers/models/model_metadata.json b/simpletuner/helpers/models/model_metadata.json index a714de05c..7f8d5261d 100644 --- a/simpletuner/helpers/models/model_metadata.json +++ b/simpletuner/helpers/models/model_metadata.json @@ -4,7 +4,19 @@ "module_path": "simpletuner.helpers.models.flux2.model", "name": "Flux.2", "flavour_choices": [ - "dev" + "dev", + "klein-4b", + "klein-9b" + ] + }, + "ltxvideo2": { + "class_name": "LTXVideo2", + "module_path": "simpletuner.helpers.models.ltxvideo2.model", + "name": "LTXVideo2", + "flavour_choices": [ + "dev", + "dev-fp4", + "dev-fp8" ] }, "sana": { @@ -47,6 +59,7 @@ "name": "Qwen-Image", "flavour_choices": [ "v1.0", + "v2.0", "edit-v1", "edit-v2", "edit-v2+", @@ -156,14 +169,6 @@ "0.9.0" ] }, - "ltxvideo2": { - "class_name": "LTXVideo2", - "module_path": "simpletuner.helpers.models.ltxvideo2.model", - "name": "LTX-2", - "flavour_choices": [ - "2.0" - ] - }, "longcat_video": { "class_name": "LongCatVideo", "module_path": "simpletuner.helpers.models.longcat_video.model", diff --git a/simpletuner/templates/trainer_htmx.html b/simpletuner/templates/trainer_htmx.html index c92266342..08aeab4d5 100644 --- a/simpletuner/templates/trainer_htmx.html +++ b/simpletuner/templates/trainer_htmx.html @@ -1319,21 +1319,24 @@ return normalized; }, computeModelContext(info, config, details) { + // Compute conservative defaults from config flags. + // Model-specific capabilities (conditioning support, dataset types, etc.) + // are provided by the backend via evaluateModelRequirements() and will + // override these defaults. This avoids hardcoding model/flavour checks + // in the frontend. 
const normalized = this.buildNormalizedConfig(config); const controlnetEnabled = this.normalizeBoolean(info.controlnet || normalized.controlnet); const controlEnabled = this.normalizeBoolean(info.control || normalized.control); - const modelFamily = info.modelFamily || ''; const modelFlavour = info.modelFlavour || ''; - const fluxKontext = modelFamily === 'flux' && modelFlavour === 'kontext'; - const requiresDataset = Boolean(controlnetEnabled || controlEnabled || fluxKontext); - const requiresLatents = Boolean(controlnetEnabled || controlEnabled || fluxKontext); - const requiresValidationInputs = Boolean(controlnetEnabled || fluxKontext); - const requiresValidationEditCaptions = Boolean(controlnetEnabled || fluxKontext); - const conditioningDatasetType = fluxKontext ? 'image' : 'conditioning'; + // Only controlnet/control are known from config; other conditioning + // capabilities come from backend model class methods. + const requiresDataset = Boolean(controlnetEnabled || controlEnabled); + const requiresLatents = Boolean(controlnetEnabled || controlEnabled); + const requiresValidationInputs = Boolean(controlnetEnabled || controlEnabled); const hasControlnetPipeline = Boolean(details?.capabilities?.has_controlnet_pipeline); const supportsConditioningGenerators = Boolean( - hasControlnetPipeline || controlnetEnabled || controlEnabled || fluxKontext, + hasControlnetPipeline || controlnetEnabled || controlEnabled, ); return { @@ -1342,8 +1345,8 @@ requiresConditioningLatents: requiresLatents, requiresConditioningImageEmbeds: false, requiresConditioningValidationInputs: requiresValidationInputs, - requiresValidationEditCaptions, - conditioningDatasetType, + requiresValidationEditCaptions: false, + conditioningDatasetType: 'conditioning', supportsConditioningGenerators, hasControlnetPipeline, modelFlavour: modelFlavour || null,
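
The required/supported/neither pattern introduced in the `common.py` hunk above can be sketched in isolation. This is a standalone toy, not the real SimpleTuner classes: `ToyConfig`, `ToyModel`, `ToyFlux2`, and `conditioning_ui_state` are illustrative names, and only the method names and return semantics mirror the diff.

```python
# Toy sketch of the conditioning-capability methods added in
# simpletuner/helpers/models/common.py. Class and config names here
# are hypothetical; the real base class has many more members.
from dataclasses import dataclass


@dataclass
class ToyConfig:
    controlnet: bool = False
    control: bool = False


class ToyModel:
    """Mirrors the base-class defaults described in the diff."""

    def __init__(self, config: ToyConfig):
        self.config = config

    def requires_conditioning_dataset(self) -> bool:
        # Conditioning becomes mandatory when controlnet/control training
        # is enabled via config (the base-class behavior in the diff).
        return bool(self.config.controlnet or self.config.control)

    def supports_conditioning_dataset(self) -> bool:
        # Base default: no optional conditioning support.
        return False

    def conditioning_validation_dataset_type(self) -> str:
        # Note the diff fixes this return annotation from bool to str.
        return "conditioning"


class ToyFlux2(ToyModel):
    """Dual T2I/I2I model: conditioning is supported but never mandatory."""

    def supports_conditioning_dataset(self) -> bool:
        return True


def conditioning_ui_state(model: ToyModel) -> str:
    # How a UI layer (like the WebUI) would branch on the two flags:
    # "requires" wins over "supports"; neither means the option is hidden.
    if model.requires_conditioning_dataset():
        return "required"
    if model.supports_conditioning_dataset():
        return "optional"
    return "hidden"


print(conditioning_ui_state(ToyModel(ToyConfig())))                 # hidden
print(conditioning_ui_state(ToyModel(ToyConfig(controlnet=True))))  # required
print(conditioning_ui_state(ToyFlux2(ToyConfig())))                 # optional
```

The separation matters for the template change above: the frontend now computes only the config-derived `requires_*` defaults (controlnet/control), while model-specific `supports_*` answers come from the backend method overrides, as in the `ToyFlux2` case.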