diff --git a/documentation/OPTIONS.es.md b/documentation/OPTIONS.es.md index 6ce17f00d..a53ccbc48 100644 --- a/documentation/OPTIONS.es.md +++ b/documentation/OPTIONS.es.md @@ -964,6 +964,61 @@ CREPA es una técnica de regularización para fine-tuning de modelos de difusió - **Por qué**: Los modelos DINOv2 funcionan mejor a su resolución de entrenamiento. El modelo gigante usa 518x518. - **Predeterminado**: `518` +### `--crepa_scheduler` + +- **Qué**: Programa para el decaimiento del coeficiente CREPA durante el entrenamiento. +- **Por qué**: Permite reducir la fuerza de regularización CREPA a medida que avanza el entrenamiento, previniendo el sobreajuste en características profundas del encoder. +- **Opciones**: `constant`, `linear`, `cosine`, `polynomial` +- **Predeterminado**: `constant` + +### `--crepa_warmup_steps` + +- **Qué**: Número de pasos para incrementar linealmente el peso CREPA desde 0 hasta `crepa_lambda`. +- **Por qué**: Un calentamiento gradual puede ayudar a estabilizar el entrenamiento temprano antes de que la regularización CREPA entre en efecto. +- **Predeterminado**: `0` + +### `--crepa_decay_steps` + +- **Qué**: Pasos totales para el decaimiento (después del calentamiento). Establece a 0 para decaer durante todo el entrenamiento. +- **Por qué**: Controla la duración de la fase de decaimiento. El decaimiento comienza después de que se completa el calentamiento. +- **Predeterminado**: `0` (usa `max_train_steps`) + +### `--crepa_lambda_end` + +- **Qué**: Peso CREPA final después de que se completa el decaimiento. +- **Por qué**: Establecerlo a 0 desactiva efectivamente CREPA al final del entrenamiento, útil para text2video donde CREPA puede causar artefactos. +- **Predeterminado**: `0.0` + +### `--crepa_power` + +- **Qué**: Factor de potencia para el decaimiento polinomial. 1.0 = lineal, 2.0 = cuadrático, etc. +- **Por qué**: Valores más altos causan un decaimiento inicial más rápido que se ralentiza hacia el final. +- **Predeterminado**: `1.0` + +### `--crepa_cutoff_step` + +- **Qué**: Paso de corte duro después del cual CREPA se desactiva. +- **Por qué**: Útil para desactivar CREPA después de que el modelo ha convergido en el alineamiento temporal. +- **Predeterminado**: `0` (sin corte basado en pasos) + +### `--crepa_similarity_threshold` + +- **Qué**: Umbral de EMA de similitud en el cual se activa el corte de CREPA. +- **Por qué**: Cuando el promedio móvil exponencial de similitud alcanza este valor, CREPA se desactiva para prevenir el sobreajuste en características profundas del encoder. Esto es particularmente útil para entrenamiento text2video. +- **Predeterminado**: None (desactivado) + +### `--crepa_similarity_ema_decay` + +- **Qué**: Factor de decaimiento del promedio móvil exponencial para el seguimiento de similitud. +- **Por qué**: Valores más altos proporcionan un seguimiento más suave (0.99 ≈ ventana de 100 pasos), valores más bajos reaccionan más rápido a los cambios. +- **Predeterminado**: `0.99` + +### `--crepa_threshold_mode` + +- **Qué**: Comportamiento cuando se alcanza el umbral de similitud. 
+- **Opciones**: `permanent` (CREPA permanece desactivado una vez que se alcanza el umbral), `recoverable` (CREPA se reactiva si la similitud cae) +- **Predeterminado**: `permanent` + ### Ejemplo de configuración ```toml @@ -981,6 +1036,15 @@ crepa_encoder_frames_batch_size = -1 crepa_use_backbone_features = false # crepa_teacher_block_index = 16 crepa_encoder_image_size = 518 + +# Programación CREPA (opcional) +# crepa_scheduler = "cosine" # Tipo de decaimiento: constant, linear, cosine, polynomial +# crepa_warmup_steps = 100 # Calentamiento antes de que CREPA entre en efecto +# crepa_decay_steps = 1000 # Pasos para el decaimiento (0 = todo el entrenamiento) +# crepa_lambda_end = 0.0 # Peso final después del decaimiento +# crepa_cutoff_step = 5000 # Paso de corte duro (0 = desactivado) +# crepa_similarity_threshold = 0.9 # Corte basado en similitud +# crepa_threshold_mode = "permanent" # permanent o recoverable ``` --- diff --git a/documentation/OPTIONS.hi.md b/documentation/OPTIONS.hi.md index dc7e17efb..134ceb0cc 100644 --- a/documentation/OPTIONS.hi.md +++ b/documentation/OPTIONS.hi.md @@ -962,6 +962,61 @@ CREPA एक regularization तकनीक है जो video diffusion models - **Why**: DINOv2 models अपने training resolution पर बेहतर काम करते हैं। giant model 518x518 उपयोग करता है। - **Default**: `518` +### `--crepa_scheduler` + +- **What**: training के दौरान CREPA coefficient decay का schedule। +- **Why**: जैसे-जैसे training आगे बढ़े, CREPA regularization strength को कम करने देता है, deep encoder features पर overfitting रोकता है। +- **Options**: `constant`, `linear`, `cosine`, `polynomial` +- **Default**: `constant` + +### `--crepa_warmup_steps` + +- **What**: CREPA weight को 0 से `crepa_lambda` तक linearly ramp करने के लिए steps की संख्या। +- **Why**: gradual warmup CREPA regularization शुरू होने से पहले early training को stabilize करने में मदद कर सकता है। +- **Default**: `0` + +### `--crepa_decay_steps` + +- **What**: decay के लिए कुल steps (warmup के बाद)। 0 सेट करने पर पूरी training run पर decay होगा। +- **Why**: decay phase की duration नियंत्रित करता है। warmup पूरा होने के बाद decay शुरू होता है। +- **Default**: `0` (`max_train_steps` उपयोग होगा) + +### `--crepa_lambda_end` + +- **What**: decay पूरा होने के बाद final CREPA weight। +- **Why**: 0 सेट करने पर training के अंत में CREPA प्रभावी रूप से disable हो जाता है, text2video के लिए उपयोगी जहाँ CREPA artifacts पैदा कर सकता है। +- **Default**: `0.0` + +### `--crepa_power` + +- **What**: polynomial decay के लिए power factor। 1.0 = linear, 2.0 = quadratic, आदि। +- **Why**: higher values शुरुआत में तेज decay करते हैं जो अंत की ओर धीमा हो जाता है। +- **Default**: `1.0` + +### `--crepa_cutoff_step` + +- **What**: hard cutoff step जिसके बाद CREPA disable हो जाता है। +- **Why**: model temporal alignment पर converge होने के बाद CREPA disable करने के लिए उपयोगी। +- **Default**: `0` (कोई step-based cutoff नहीं) + +### `--crepa_similarity_threshold` + +- **What**: similarity EMA threshold जिस पर CREPA cutoff trigger होता है। +- **Why**: जब similarity का exponential moving average इस मान तक पहुँचता है, तो deep encoder features पर overfitting रोकने के लिए CREPA disable हो जाता है। text2video training के लिए विशेष रूप से उपयोगी। +- **Default**: None (disabled) + +### `--crepa_similarity_ema_decay` + +- **What**: similarity tracking के लिए exponential moving average decay factor। +- **Why**: higher values smoother tracking देते हैं (0.99 ≈ 100-step window), lower values changes पर तेज react करते हैं। +- **Default**: `0.99` + +### `--crepa_threshold_mode` 
+ +- **What**: similarity threshold पहुँचने पर व्यवहार। +- **Options**: `permanent` (threshold hit होने पर CREPA permanently off रहता है), `recoverable` (similarity गिरने पर CREPA फिर से enable होता है) +- **Default**: `permanent` + ### Example Configuration ```toml @@ -979,6 +1034,15 @@ crepa_encoder_frames_batch_size = -1 crepa_use_backbone_features = false # crepa_teacher_block_index = 16 crepa_encoder_image_size = 518 + +# CREPA Scheduling (optional) +# crepa_scheduler = "cosine" # Decay type: constant, linear, cosine, polynomial +# crepa_warmup_steps = 100 # Warmup before CREPA kicks in +# crepa_decay_steps = 1000 # Steps for decay (0 = entire training) +# crepa_lambda_end = 0.0 # Final weight after decay +# crepa_cutoff_step = 5000 # Hard cutoff step (0 = disabled) +# crepa_similarity_threshold = 0.9 # Similarity-based cutoff +# crepa_threshold_mode = "permanent" # permanent or recoverable ``` --- diff --git a/documentation/OPTIONS.ja.md b/documentation/OPTIONS.ja.md index e30f8169e..4a6d44da7 100644 --- a/documentation/OPTIONS.ja.md +++ b/documentation/OPTIONS.ja.md @@ -966,12 +966,67 @@ CREPA は動画拡散モデルのファインチューニング向け正則化 - **理由**: DINOv2 は学習時の解像度で最も良く動作します。巨大モデルは 518x518 を使用します。 - **既定**: `518` +### `--crepa_scheduler` + +- **内容**: 学習中の CREPA 係数減衰スケジュール。 +- **理由**: 学習が進むにつれて CREPA 正則化の強度を下げることで、深層エンコーダ特徴への過学習を防ぎます。 +- **選択肢**: `constant`、`linear`、`cosine`、`polynomial` +- **既定**: `constant` + +### `--crepa_warmup_steps` + +- **内容**: CREPA 重みを 0 から `crepa_lambda` まで線形に上昇させるステップ数。 +- **理由**: 段階的なウォームアップにより、CREPA 正則化が有効になる前の初期学習を安定させます。 +- **既定**: `0` + +### `--crepa_decay_steps` + +- **内容**: 減衰の総ステップ数(ウォームアップ後)。0 に設定すると学習全体で減衰します。 +- **理由**: 減衰フェーズの期間を制御します。減衰はウォームアップ完了後に開始されます。 +- **既定**: `0`(`max_train_steps` を使用) + +### `--crepa_lambda_end` + +- **内容**: 減衰完了後の最終 CREPA 重み。 +- **理由**: 0 に設定すると学習終了時に CREPA を実質的に無効化できます。text2video で CREPA がアーティファクトを引き起こす場合に有用です。 +- **既定**: `0.0` + +### `--crepa_power` + +- **内容**: 多項式減衰のべき乗係数。1.0 = 線形、2.0 = 二次など。 +- **理由**: 値が大きいほど初期の減衰が速く、終盤に向けて緩やかになります。 +- **既定**: `1.0` + +### `--crepa_cutoff_step` + +- **内容**: CREPA を無効化するハードカットオフステップ。 +- **理由**: モデルが時間的整合に収束した後に CREPA を無効化するのに有用です。 +- **既定**: `0`(ステップベースのカットオフなし) + +### `--crepa_similarity_threshold` + +- **内容**: CREPA カットオフをトリガーする類似度 EMA 閾値。 +- **理由**: 類似度の指数移動平均がこの値に達すると、深層エンコーダ特徴への過学習を防ぐために CREPA が無効化されます。text2video 学習に特に有用です。 +- **既定**: なし(無効) + +### `--crepa_similarity_ema_decay` + +- **内容**: 類似度追跡の指数移動平均減衰係数。 +- **理由**: 値が大きいほど滑らかな追跡(0.99 ≈ 100 ステップウィンドウ)、値が小さいほど変化に素早く反応します。 +- **既定**: `0.99` + +### `--crepa_threshold_mode` + +- **内容**: 類似度閾値に達した際の動作。 +- **選択肢**: `permanent`(閾値に達すると CREPA はオフのまま)、`recoverable`(類似度が下がると CREPA が再有効化) +- **既定**: `permanent` + ### 設定例 ```toml -# Enable CREPA for video fine-tuning +# 動画ファインチューニング用 CREPA を有効化 crepa_enabled = true -crepa_block_index = 8 # Adjust based on your model +crepa_block_index = 8 # モデルに応じて調整 crepa_lambda = 0.5 crepa_adjacent_distance = 1 crepa_adjacent_tau = 1.0 @@ -983,6 +1038,15 @@ crepa_encoder_frames_batch_size = -1 crepa_use_backbone_features = false # crepa_teacher_block_index = 16 crepa_encoder_image_size = 518 + +# CREPA スケジューリング(オプション) +# crepa_scheduler = "cosine" # 減衰タイプ: constant, linear, cosine, polynomial +# crepa_warmup_steps = 100 # CREPA 有効化前のウォームアップ +# crepa_decay_steps = 1000 # 減衰ステップ数(0 = 学習全体) +# crepa_lambda_end = 0.0 # 減衰後の最終重み +# crepa_cutoff_step = 5000 # ハードカットオフステップ(0 = 無効) +# crepa_similarity_threshold = 0.9 # 類似度ベースのカットオフ +# crepa_threshold_mode = "permanent" # permanent または recoverable ``` --- diff --git 
a/documentation/OPTIONS.md b/documentation/OPTIONS.md index dfbceef28..c7dc94ca7 100644 --- a/documentation/OPTIONS.md +++ b/documentation/OPTIONS.md @@ -962,6 +962,61 @@ CREPA is a regularization technique for fine-tuning video diffusion models that - **Why**: DINOv2 models work best at their training resolution. The giant model uses 518x518. - **Default**: `518` +### `--crepa_scheduler` + +- **What**: Schedule for CREPA coefficient decay over training. +- **Why**: Allows reducing CREPA regularization strength as training progresses, preventing overfitting on deep encoder features. +- **Options**: `constant`, `linear`, `cosine`, `polynomial` +- **Default**: `constant` + +### `--crepa_warmup_steps` + +- **What**: Number of steps to linearly ramp CREPA weight from 0 to `crepa_lambda`. +- **Why**: Gradual warmup can help stabilize early training before CREPA regularization kicks in. +- **Default**: `0` + +### `--crepa_decay_steps` + +- **What**: Total steps for decay (after warmup). Set to 0 to decay over entire training run. +- **Why**: Controls the duration of the decay phase. Decay starts after warmup completes. +- **Default**: `0` (uses `max_train_steps`) + +### `--crepa_lambda_end` + +- **What**: Final CREPA weight after decay completes. +- **Why**: Setting to 0 effectively disables CREPA at end of training, useful for text2video where CREPA may cause artifacts. +- **Default**: `0.0` + +### `--crepa_power` + +- **What**: Power factor for polynomial decay. 1.0 = linear, 2.0 = quadratic, etc. +- **Why**: Higher values cause faster initial decay that slows down towards the end. +- **Default**: `1.0` + +### `--crepa_cutoff_step` + +- **What**: Hard cutoff step after which CREPA is disabled. +- **Why**: Useful for disabling CREPA after model has converged on temporal alignment. +- **Default**: `0` (no step-based cutoff) + +### `--crepa_similarity_threshold` + +- **What**: Similarity EMA threshold at which CREPA cutoff triggers. +- **Why**: When the exponential moving average of similarity reaches this value, CREPA is disabled to prevent overfitting on deep encoder features. This is particularly useful for text2video training. +- **Default**: None (disabled) + +### `--crepa_similarity_ema_decay` + +- **What**: Exponential moving average decay factor for similarity tracking. +- **Why**: Higher values provide smoother tracking (0.99 ≈ 100-step window), lower values react faster to changes. +- **Default**: `0.99` + +### `--crepa_threshold_mode` + +- **What**: Behavior when similarity threshold is reached. 
+- **Options**: `permanent` (CREPA stays off once threshold is hit), `recoverable` (CREPA re-enables if similarity drops) +- **Default**: `permanent` + ### Example Configuration ```toml @@ -979,6 +1034,15 @@ crepa_encoder_frames_batch_size = -1 crepa_use_backbone_features = false # crepa_teacher_block_index = 16 crepa_encoder_image_size = 518 + +# CREPA Scheduling (optional) +# crepa_scheduler = "cosine" # Decay type: constant, linear, cosine, polynomial +# crepa_warmup_steps = 100 # Warmup before CREPA kicks in +# crepa_decay_steps = 1000 # Steps for decay (0 = entire training) +# crepa_lambda_end = 0.0 # Final weight after decay +# crepa_cutoff_step = 5000 # Hard cutoff step (0 = disabled) +# crepa_similarity_threshold = 0.9 # Similarity-based cutoff +# crepa_threshold_mode = "permanent" # permanent or recoverable ``` --- diff --git a/documentation/OPTIONS.pt-BR.md b/documentation/OPTIONS.pt-BR.md index 4593a81ea..801492df9 100644 --- a/documentation/OPTIONS.pt-BR.md +++ b/documentation/OPTIONS.pt-BR.md @@ -960,6 +960,61 @@ CREPA e uma tecnica de regularizacao para fine-tuning de modelos de difusao de v - **Por que**: Modelos DINOv2 funcionam melhor na resolucao de treino. O modelo giant usa 518x518. - **Padrao**: `518` +### `--crepa_scheduler` + +- **O que**: Agendamento para decaimento do coeficiente CREPA durante o treinamento. +- **Por que**: Permite reduzir a forca da regularizacao CREPA conforme o treinamento progride, prevenindo overfitting nas features profundas do encoder. +- **Opcoes**: `constant`, `linear`, `cosine`, `polynomial` +- **Padrao**: `constant` + +### `--crepa_warmup_steps` + +- **O que**: Numero de passos para aumentar linearmente o peso CREPA de 0 ate `crepa_lambda`. +- **Por que**: Aquecimento gradual pode ajudar a estabilizar o treinamento inicial antes da regularizacao CREPA entrar em acao. +- **Padrao**: `0` + +### `--crepa_decay_steps` + +- **O que**: Total de passos para decaimento (apos warmup). Defina como 0 para decair durante todo o treinamento. +- **Por que**: Controla a duracao da fase de decaimento. O decaimento comeca apos o warmup completar. +- **Padrao**: `0` (usa `max_train_steps`) + +### `--crepa_lambda_end` + +- **O que**: Peso CREPA final apos o decaimento completar. +- **Por que**: Definir como 0 efetivamente desabilita o CREPA no final do treinamento, util para text2video onde CREPA pode causar artefatos. +- **Padrao**: `0.0` + +### `--crepa_power` + +- **O que**: Fator de potencia para decaimento polinomial. 1.0 = linear, 2.0 = quadratico, etc. +- **Por que**: Valores maiores causam decaimento inicial mais rapido que desacelera no final. +- **Padrao**: `1.0` + +### `--crepa_cutoff_step` + +- **O que**: Passo de corte rigido apos o qual o CREPA e desabilitado. +- **Por que**: Util para desabilitar o CREPA apos o modelo convergir no alinhamento temporal. +- **Padrao**: `0` (sem corte baseado em passo) + +### `--crepa_similarity_threshold` + +- **O que**: Limiar de EMA de similaridade no qual o corte CREPA e acionado. +- **Por que**: Quando a media movel exponencial da similaridade atinge este valor, o CREPA e desabilitado para prevenir overfitting nas features profundas do encoder. Isto e particularmente util para treinamento text2video. +- **Padrao**: None (desabilitado) + +### `--crepa_similarity_ema_decay` + +- **O que**: Fator de decaimento da media movel exponencial para rastreamento de similaridade. 
+- **Por que**: Valores maiores fornecem rastreamento mais suave (0.99 ≈ janela de 100 passos), valores menores reagem mais rapido a mudancas. +- **Padrao**: `0.99` + +### `--crepa_threshold_mode` + +- **O que**: Comportamento quando o limiar de similaridade e atingido. +- **Opcoes**: `permanent` (CREPA permanece desligado apos atingir o limiar), `recoverable` (CREPA reabilita se a similaridade cair) +- **Padrao**: `permanent` + ### Exemplo de configuracao ```toml @@ -977,6 +1032,15 @@ crepa_encoder_frames_batch_size = -1 crepa_use_backbone_features = false # crepa_teacher_block_index = 16 crepa_encoder_image_size = 518 + +# Agendamento CREPA (opcional) +# crepa_scheduler = "cosine" # Tipo de decaimento: constant, linear, cosine, polynomial +# crepa_warmup_steps = 100 # Warmup antes do CREPA entrar em acao +# crepa_decay_steps = 1000 # Passos para decaimento (0 = treinamento inteiro) +# crepa_lambda_end = 0.0 # Peso final apos decaimento +# crepa_cutoff_step = 5000 # Passo de corte rigido (0 = desabilitado) +# crepa_similarity_threshold = 0.9 # Corte baseado em similaridade +# crepa_threshold_mode = "permanent" # permanent ou recoverable ``` --- diff --git a/documentation/OPTIONS.zh.md b/documentation/OPTIONS.zh.md index 167502eda..2a61104b8 100644 --- a/documentation/OPTIONS.zh.md +++ b/documentation/OPTIONS.zh.md @@ -968,12 +968,67 @@ CREPA 是一种用于视频扩散模型微调的正则化技术,通过将隐 - **原因**:DINOv2 在训练分辨率上效果最好。巨型模型使用 518x518。 - **默认**:`518` +### `--crepa_scheduler` + +- **内容**:训练过程中 CREPA 系数的衰减调度方式。 +- **原因**:允许在训练进行时逐渐降低 CREPA 正则化强度,防止对深层编码器特征过拟合。 +- **选项**:`constant`、`linear`、`cosine`、`polynomial` +- **默认**:`constant` + +### `--crepa_warmup_steps` + +- **内容**:将 CREPA 权重从 0 线性升至 `crepa_lambda` 的步数。 +- **原因**:渐进预热有助于在 CREPA 正则化生效前稳定早期训练。 +- **默认**:`0` + +### `--crepa_decay_steps` + +- **内容**:衰减总步数(预热后)。设为 0 则在整个训练过程中衰减。 +- **原因**:灵活控制衰减的时间跨度。 +- **默认**:`0`(使用 `max_train_steps`) + +### `--crepa_lambda_end` + +- **内容**:衰减完成后的最终 CREPA 权重。 +- **原因**:设为 0 可在训练末期有效禁用 CREPA,适用于 text2video 等可能产生伪影的场景。 +- **默认**:`0.0` + +### `--crepa_power` + +- **内容**:多项式衰减的幂因子。1.0 = 线性,2.0 = 二次,以此类推。 +- **原因**:控制衰减曲线形状。 +- **默认**:`1.0` + +### `--crepa_cutoff_step` + +- **内容**:硬截止步数,超过此步后禁用 CREPA。 +- **原因**:适用于模型时序对齐收敛后禁用 CREPA 的场景。 +- **默认**:`0`(无基于步数的截止) + +### `--crepa_similarity_threshold` + +- **内容**:触发 CREPA 截止的相似度 EMA 阈值。 +- **原因**:当相似度的指数移动平均达到此值时,CREPA 被禁用以防止对深层编码器特征过拟合。对于 text2video 训练尤其有用。 +- **默认**:None(禁用) + +### `--crepa_similarity_ema_decay` + +- **内容**:相似度跟踪的指数移动平均衰减因子。 +- **原因**:控制相似度指标的平滑程度。 +- **默认**:`0.99` + +### `--crepa_threshold_mode` + +- **内容**:达到相似度阈值后的行为。 +- **选项**:`permanent`(一旦达到阈值,CREPA 保持关闭)、`recoverable`(若相似度下降,CREPA 重新启用) +- **默认**:`permanent` + ### 配置示例 ```toml -# Enable CREPA for video fine-tuning +# 启用 CREPA 用于视频微调 crepa_enabled = true -crepa_block_index = 8 # Adjust based on your model +crepa_block_index = 8 # 根据模型调整 crepa_lambda = 0.5 crepa_adjacent_distance = 1 crepa_adjacent_tau = 1.0 @@ -985,6 +1040,15 @@ crepa_encoder_frames_batch_size = -1 crepa_use_backbone_features = false # crepa_teacher_block_index = 16 crepa_encoder_image_size = 518 + +# CREPA 调度(可选) +# crepa_scheduler = "cosine" # 衰减类型:constant、linear、cosine、polynomial +# crepa_warmup_steps = 100 # CREPA 生效前的预热步数 +# crepa_decay_steps = 1000 # 衰减步数(0 = 整个训练过程) +# crepa_lambda_end = 0.0 # 衰减后的最终权重 +# crepa_cutoff_step = 5000 # 硬截止步数(0 = 禁用) +# crepa_similarity_threshold = 0.9 # 基于相似度的截止 +# crepa_threshold_mode = "permanent" # permanent 或 recoverable ``` --- diff --git a/documentation/experimental/VIDEO_CREPA.es.md 
b/documentation/experimental/VIDEO_CREPA.es.md index bc32d0aa4..147d4c1bc 100644 --- a/documentation/experimental/VIDEO_CREPA.es.md +++ b/documentation/experimental/VIDEO_CREPA.es.md @@ -48,6 +48,96 @@ Agrega lo siguiente a tu `config.json` o args de CLI: - `crepa_use_backbone_features=true`: omite el encoder externo y alinea con un bloque transformer más profundo; configura `crepa_teacher_block_index` para elegir el maestro. - Tamaño de encoder: baja a `dinov2_vits14` + `224` si la VRAM es ajustada; mantén `dinov2_vitg14` + `518` para la mejor calidad. +## Programación del coeficiente + +CREPA soporta programar el coeficiente (`crepa_lambda`) durante el entrenamiento con calentamiento, decaimiento y corte automático basado en umbral de similitud. Esto es particularmente útil para entrenamiento text2video donde CREPA puede causar franjas horizontales/verticales o una apariencia deslavada si se aplica demasiado fuerte durante demasiado tiempo. + +### Programación básica + +```json +{ + "crepa_enabled": true, + "crepa_lambda": 0.5, + "crepa_scheduler": "cosine", + "crepa_warmup_steps": 100, + "crepa_decay_steps": 5000, + "crepa_lambda_end": 0.0 +} +``` + +Esta configuración: +1. Aumenta el peso de CREPA de 0 a 0.5 durante los primeros 100 pasos +2. Decae de 0.5 a 0.0 usando un programa coseno durante 5000 pasos +3. Después del paso 5100, CREPA está efectivamente desactivado + +### Tipos de programador + +- `constant`: Sin decaimiento, el peso se mantiene en `crepa_lambda` (por defecto) +- `linear`: Interpolación lineal de `crepa_lambda` a `crepa_lambda_end` +- `cosine`: Annealing coseno suave (recomendado para la mayoría de casos) +- `polynomial`: Decaimiento polinomial con potencia configurable mediante `crepa_power` + +### Corte basado en pasos + +Para un corte duro después de un paso específico: + +```json +{ + "crepa_cutoff_step": 3000 +} +``` + +CREPA se desactiva completamente después del paso 3000. + +### Corte basado en similitud + +Este es el enfoque más flexible: CREPA se desactiva automáticamente cuando la métrica de similitud se estabiliza, indicando que el modelo ha aprendido suficiente alineación temporal: + +```json +{ + "crepa_similarity_threshold": 0.9, + "crepa_similarity_ema_decay": 0.99, + "crepa_threshold_mode": "permanent" +} +``` + +- `crepa_similarity_threshold`: Cuando la media móvil exponencial de similitud alcanza este valor, CREPA se corta +- `crepa_similarity_ema_decay`: Factor de suavizado (0.99 ≈ ventana de 100 pasos) +- `crepa_threshold_mode`: `permanent` (permanece apagado) o `recoverable` (puede reactivarse si la similitud baja) + +### Configuraciones recomendadas + +**Para image2video (i2v)**: +```json +{ + "crepa_scheduler": "constant", + "crepa_lambda": 0.5 +} +``` +CREPA estándar funciona bien para i2v ya que el frame de referencia ancla la consistencia. + +**Para text2video (t2v)**: +```json +{ + "crepa_scheduler": "cosine", + "crepa_lambda": 0.5, + "crepa_warmup_steps": 100, + "crepa_decay_steps": 0, + "crepa_lambda_end": 0.1, + "crepa_similarity_threshold": 0.85, + "crepa_threshold_mode": "permanent" +} +``` +Decae CREPA durante el entrenamiento y corta cuando la similitud se satura para prevenir artefactos. + +**Para fondos sólidos (t2v)**: +```json +{ + "crepa_cutoff_step": 2000 +} +``` +El corte temprano previene artefactos de franjas en fondos uniformes. +
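+### Ejemplo práctico: valores del peso programado
+
+Como ilustración concreta de la configuración básica de arriba, el boceto mínimo siguiente instancia el helper `CrepaScheduler` que acompaña a esta funcionalidad (la misma interfaz que ejercitan las pruebas unitarias) e imprime el peso efectivo en algunos pasos. Ten en cuenta que `crepa_decay_steps` se cuenta desde el paso 0 (calentamiento incluido), así que con 100 pasos de calentamiento el decaimiento coseno abarca los 4900 pasos restantes y el peso alcanza `crepa_lambda_end` en el paso 5000. Es solo una ilustración, no forma parte del punto de entrada de entrenamiento.
+
+```python
+from types import SimpleNamespace
+
+from simpletuner.helpers.training.crepa import CrepaScheduler
+
+config = SimpleNamespace(
+    crepa_scheduler="cosine",
+    crepa_lambda=0.5,
+    crepa_warmup_steps=100,
+    crepa_decay_steps=5000,
+    crepa_lambda_end=0.0,
+    crepa_cutoff_step=0,
+    crepa_similarity_threshold=None,
+    crepa_similarity_ema_decay=0.99,
+    crepa_threshold_mode="permanent",
+    crepa_power=1.0,
+)
+# max_train_steps solo importa cuando crepa_decay_steps == 0
+scheduler = CrepaScheduler(config, max_train_steps=10000)
+
+for step in (0, 50, 100, 2550, 5000, 6000):
+    # el calentamiento sube de 0 a 0.5 hasta el paso 100; el coseno llega a 0.0 en el paso 5000
+    print(step, round(scheduler.get_weight(step), 3))
+# esperado: 0 0.0, 50 0.25, 100 0.5, 2550 0.25, 5000 0.0, 6000 0.0
+```
+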
Cómo funciona (practicante) diff --git a/documentation/experimental/VIDEO_CREPA.hi.md b/documentation/experimental/VIDEO_CREPA.hi.md index d2ebbbf45..a5550196a 100644 --- a/documentation/experimental/VIDEO_CREPA.hi.md +++ b/documentation/experimental/VIDEO_CREPA.hi.md @@ -48,6 +48,96 @@ Cross-frame Representation Alignment (CREPA) वीडियो मॉडल् - `crepa_use_backbone_features=true`: external encoder छोड़कर deeper transformer block के साथ align करें; teacher चुनने के लिए `crepa_teacher_block_index` सेट करें। - Encoder size: VRAM कम हो तो `dinov2_vits14` + `224` पर जाएँ; सर्वोत्तम गुणवत्ता के लिए `dinov2_vitg14` + `518` रखें। +## Coefficient scheduling + +CREPA ट्रेनिंग के दौरान coefficient (`crepa_lambda`) को warmup, decay, और similarity threshold आधारित automatic cutoff के साथ schedule करने का समर्थन करता है। यह विशेष रूप से text2video ट्रेनिंग के लिए उपयोगी है जहाँ CREPA बहुत अधिक या बहुत लंबे समय तक लागू करने पर horizontal/vertical stripes या washed-out feel पैदा कर सकता है। + +### बेसिक scheduling + +```json +{ + "crepa_enabled": true, + "crepa_lambda": 0.5, + "crepa_scheduler": "cosine", + "crepa_warmup_steps": 100, + "crepa_decay_steps": 5000, + "crepa_lambda_end": 0.0 +} +``` + +यह कॉन्फ़िगरेशन: +1. पहले 100 steps में CREPA weight को 0 से 0.5 तक ramp करता है +2. 5000 steps में cosine schedule से 0.5 से 0.0 तक decay करता है +3. Step 5100 के बाद, CREPA प्रभावी रूप से disable हो जाता है + +### Scheduler types + +- `constant`: कोई decay नहीं, weight `crepa_lambda` पर रहता है (डिफ़ॉल्ट) +- `linear`: `crepa_lambda` से `crepa_lambda_end` तक linear interpolation +- `cosine`: Smooth cosine annealing (ज़्यादातर मामलों के लिए अनुशंसित) +- `polynomial`: `crepa_power` के ज़रिए configurable power के साथ polynomial decay + +### Step-based cutoff + +किसी विशेष step के बाद hard cutoff के लिए: + +```json +{ + "crepa_cutoff_step": 3000 +} +``` + +Step 3000 के बाद CREPA पूरी तरह disable हो जाता है। + +### Similarity-based cutoff + +यह सबसे flexible approach है—जब similarity metric plateau हो जाता है, तो CREPA स्वचालित रूप से disable हो जाता है, जो दर्शाता है कि मॉडल ने पर्याप्त temporal alignment सीख लिया है: + +```json +{ + "crepa_similarity_threshold": 0.9, + "crepa_similarity_ema_decay": 0.99, + "crepa_threshold_mode": "permanent" +} +``` + +- `crepa_similarity_threshold`: जब similarity का exponential moving average इस value तक पहुँचता है, CREPA cut off हो जाता है +- `crepa_similarity_ema_decay`: Smoothing factor (0.99 ≈ 100-step window) +- `crepa_threshold_mode`: `permanent` (बंद रहता है) या `recoverable` (similarity गिरे तो फिर से enable हो सकता है) + +### अनुशंसित कॉन्फ़िगरेशन + +**image2video (i2v) के लिए**: +```json +{ + "crepa_scheduler": "constant", + "crepa_lambda": 0.5 +} +``` +मानक CREPA i2v के लिए अच्छा काम करता है क्योंकि reference frame consistency को anchor करता है। + +**text2video (t2v) के लिए**: +```json +{ + "crepa_scheduler": "cosine", + "crepa_lambda": 0.5, + "crepa_warmup_steps": 100, + "crepa_decay_steps": 0, + "crepa_lambda_end": 0.1, + "crepa_similarity_threshold": 0.85, + "crepa_threshold_mode": "permanent" +} +``` +ट्रेनिंग के दौरान CREPA को decay करता है और artifacts रोकने के लिए similarity saturate होने पर cut off करता है। + +**solid backgrounds (t2v) के लिए**: +```json +{ + "crepa_cutoff_step": 2000 +} +``` +Early cutoff uniform backgrounds पर stripe artifacts को रोकता है। +
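+### Worked example: scheduled weight values
+
+ऊपर दी गई basic configuration के लिए, नीचे का minimal sketch इस feature के साथ आने वाले `CrepaScheduler` helper (वही interface जो unit tests उपयोग करते हैं) को instantiate करता है और कुछ steps पर effective weight print करता है। ध्यान दें कि `crepa_decay_steps` step 0 से गिना जाता है (warmup सहित), इसलिए 100-step warmup के साथ cosine decay बाकी 4900 steps में होता है और weight step 5000 पर `crepa_lambda_end` तक पहुँचता है। यह केवल illustration है, training entrypoint का हिस्सा नहीं।
+
+```python
+from types import SimpleNamespace
+
+from simpletuner.helpers.training.crepa import CrepaScheduler
+
+config = SimpleNamespace(
+    crepa_scheduler="cosine",
+    crepa_lambda=0.5,
+    crepa_warmup_steps=100,
+    crepa_decay_steps=5000,
+    crepa_lambda_end=0.0,
+    crepa_cutoff_step=0,
+    crepa_similarity_threshold=None,
+    crepa_similarity_ema_decay=0.99,
+    crepa_threshold_mode="permanent",
+    crepa_power=1.0,
+)
+# max_train_steps only matters when crepa_decay_steps == 0
+scheduler = CrepaScheduler(config, max_train_steps=10000)
+
+for step in (0, 50, 100, 2550, 5000, 6000):
+    # warmup ramps 0 -> 0.5 by step 100; cosine decay reaches 0.0 at step 5000
+    print(step, round(scheduler.get_weight(step), 3))
+# expected: 0 0.0, 50 0.25, 100 0.5, 2550 0.25, 5000 0.0, 6000 0.0
+```
+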
कैसे काम करता है (प्रैक्टिशनर) diff --git a/documentation/experimental/VIDEO_CREPA.ja.md b/documentation/experimental/VIDEO_CREPA.ja.md index d623d286b..a2f940f3b 100644 --- a/documentation/experimental/VIDEO_CREPA.ja.md +++ b/documentation/experimental/VIDEO_CREPA.ja.md @@ -48,6 +48,96 @@ Cross-frame Representation Alignment(CREPA)は動画モデル向けの軽量 - `crepa_use_backbone_features=true`: 外部エンコーダを使わず、より深い Transformer ブロックへの整合に切り替え。教師は `crepa_teacher_block_index` で指定。 - エンコーダサイズ: VRAM が厳しければ `dinov2_vits14` + `224` にダウン。品質重視なら `dinov2_vitg14` + `518`。 +## 係数スケジューリング + +CREPA は訓練中に係数(`crepa_lambda`)をスケジュールする機能をサポートしています。ウォームアップ、減衰、類似度閾値に基づく自動カットオフが利用可能です。これは特に text2video 訓練で有用で、CREPA を強く長く適用しすぎると水平/垂直の縞模様や色褪せた感じが出る場合があります。 + +### 基本スケジューリング + +```json +{ + "crepa_enabled": true, + "crepa_lambda": 0.5, + "crepa_scheduler": "cosine", + "crepa_warmup_steps": 100, + "crepa_decay_steps": 5000, + "crepa_lambda_end": 0.0 +} +``` + +この設定では: +1. 最初の 100 ステップで CREPA 重みを 0 から 0.5 にウォームアップ +2. 5000 ステップかけてコサインスケジュールで 0.5 から 0.0 に減衰 +3. ステップ 5100 以降、CREPA は実質的に無効 + +### スケジューラタイプ + +- `constant`: 減衰なし、重みは `crepa_lambda` のまま(既定) +- `linear`: `crepa_lambda` から `crepa_lambda_end` への線形補間 +- `cosine`: 滑らかなコサインアニーリング(ほとんどのケースで推奨) +- `polynomial`: `crepa_power` で設定可能な累乗による多項式減衰 + +### ステップベースのカットオフ + +特定のステップ以降でハードカットオフする場合: + +```json +{ + "crepa_cutoff_step": 3000 +} +``` + +ステップ 3000 以降、CREPA は完全に無効化されます。 + +### 類似度ベースのカットオフ + +最も柔軟なアプローチです。類似度メトリクスがプラトーに達したときに自動的に CREPA を無効化し、モデルが十分な時間的整合を学習したことを示します: + +```json +{ + "crepa_similarity_threshold": 0.9, + "crepa_similarity_ema_decay": 0.99, + "crepa_threshold_mode": "permanent" +} +``` + +- `crepa_similarity_threshold`: 類似度の指数移動平均がこの値に達するとカットオフ +- `crepa_similarity_ema_decay`: 平滑化係数(0.99 ≈ 100 ステップのウィンドウ) +- `crepa_threshold_mode`: `permanent`(オフのまま)または `recoverable`(類似度が下がると再有効化) + +### 推奨設定 + +**image2video (i2v) 向け**: +```json +{ + "crepa_scheduler": "constant", + "crepa_lambda": 0.5 +} +``` +参照フレームが一貫性のアンカーになるため、i2v では標準の CREPA でうまく機能します。 + +**text2video (t2v) 向け**: +```json +{ + "crepa_scheduler": "cosine", + "crepa_lambda": 0.5, + "crepa_warmup_steps": 100, + "crepa_decay_steps": 0, + "crepa_lambda_end": 0.1, + "crepa_similarity_threshold": 0.85, + "crepa_threshold_mode": "permanent" +} +``` +訓練中に CREPA を減衰させ、類似度が飽和するとカットオフしてアーティファクトを防ぎます。 + +**単色背景向け (t2v)**: +```json +{ + "crepa_cutoff_step": 2000 +} +``` +早期カットオフにより均一な背景での縞模様アーティファクトを防ぎます。 +
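+### 具体例: スケジュールされた重みの推移
+
+上記の基本設定の具体例として、以下の最小限のスケッチでは、この機能に付属する `CrepaScheduler` ヘルパー(ユニットテストが使うのと同じインターフェース)をインスタンス化し、いくつかのステップでの実効重みを表示します。`crepa_decay_steps` はステップ 0 から(ウォームアップを含めて)数えられるため、100 ステップのウォームアップでは残りの 4900 ステップでコサイン減衰が進み、ステップ 5000 で重みが `crepa_lambda_end` に達します。これはあくまで例示であり、学習エントリポイントの一部ではありません。
+
+```python
+from types import SimpleNamespace
+
+from simpletuner.helpers.training.crepa import CrepaScheduler
+
+config = SimpleNamespace(
+    crepa_scheduler="cosine",
+    crepa_lambda=0.5,
+    crepa_warmup_steps=100,
+    crepa_decay_steps=5000,
+    crepa_lambda_end=0.0,
+    crepa_cutoff_step=0,
+    crepa_similarity_threshold=None,
+    crepa_similarity_ema_decay=0.99,
+    crepa_threshold_mode="permanent",
+    crepa_power=1.0,
+)
+# max_train_steps は crepa_decay_steps == 0 のときのみ使われます
+scheduler = CrepaScheduler(config, max_train_steps=10000)
+
+for step in (0, 50, 100, 2550, 5000, 6000):
+    # ウォームアップでステップ 100 までに 0 から 0.5 へ、コサイン減衰でステップ 5000 に 0.0 へ
+    print(step, round(scheduler.get_weight(step), 3))
+# 期待値: 0 0.0, 50 0.25, 100 0.5, 2550 0.25, 5000 0.0, 6000 0.0
+```
+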
仕組み(実務者向け) diff --git a/documentation/experimental/VIDEO_CREPA.md b/documentation/experimental/VIDEO_CREPA.md index 3a9d90624..a36807a55 100644 --- a/documentation/experimental/VIDEO_CREPA.md +++ b/documentation/experimental/VIDEO_CREPA.md @@ -48,6 +48,96 @@ Add the following to your `config.json` or CLI args: - `crepa_use_backbone_features=true`: skip the external encoder and align to a deeper transformer block; set `crepa_teacher_block_index` to choose the teacher. - Encoder size: downshift to `dinov2_vits14` + `224` if VRAM is tight; keep `dinov2_vitg14` + `518` for best quality. +## Coefficient scheduling + +CREPA supports scheduling the coefficient (`crepa_lambda`) over training with warmup, decay, and automatic cutoff based on similarity threshold. This is particularly useful for text2video training where CREPA may cause horizontal/vertical stripes or a washed-out feel if applied too strongly for too long. + +### Basic scheduling + +```json +{ + "crepa_enabled": true, + "crepa_lambda": 0.5, + "crepa_scheduler": "cosine", + "crepa_warmup_steps": 100, + "crepa_decay_steps": 5000, + "crepa_lambda_end": 0.0 +} +``` + +This configuration: +1. Ramps CREPA weight from 0 to 0.5 over the first 100 steps +2. Decays from 0.5 to 0.0 using a cosine schedule over 5000 steps +3. After step 5100, CREPA is effectively disabled + +### Scheduler types + +- `constant`: No decay, weight stays at `crepa_lambda` (default) +- `linear`: Linear interpolation from `crepa_lambda` to `crepa_lambda_end` +- `cosine`: Smooth cosine annealing (recommended for most cases) +- `polynomial`: Polynomial decay with configurable power via `crepa_power` + +### Step-based cutoff + +For a hard cutoff after a specific step: + +```json +{ + "crepa_cutoff_step": 3000 +} +``` + +CREPA is completely disabled after step 3000. + +### Similarity-based cutoff + +This is the most flexible approach—CREPA automatically disables when the similarity metric plateaus, indicating the model has learned sufficient temporal alignment: + +```json +{ + "crepa_similarity_threshold": 0.9, + "crepa_similarity_ema_decay": 0.99, + "crepa_threshold_mode": "permanent" +} +``` + +- `crepa_similarity_threshold`: When the exponential moving average of similarity reaches this value, CREPA cuts off +- `crepa_similarity_ema_decay`: Smoothing factor (0.99 ≈ 100-step window) +- `crepa_threshold_mode`: `permanent` (stays off) or `recoverable` (can re-enable if similarity drops) + +### Recommended configurations + +**For image2video (i2v)**: +```json +{ + "crepa_scheduler": "constant", + "crepa_lambda": 0.5 +} +``` +Standard CREPA works well for i2v since the reference frame anchors consistency. + +**For text2video (t2v)**: +```json +{ + "crepa_scheduler": "cosine", + "crepa_lambda": 0.5, + "crepa_warmup_steps": 100, + "crepa_decay_steps": 0, + "crepa_lambda_end": 0.1, + "crepa_similarity_threshold": 0.85, + "crepa_threshold_mode": "permanent" +} +``` +Decays CREPA over training and cuts off when similarity saturates to prevent artifacts. + +**For solid backgrounds (t2v)**: +```json +{ + "crepa_cutoff_step": 2000 +} +``` +Early cutoff prevents stripe artifacts on uniform backgrounds. +
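+### Worked example: scheduled weight values
+
+As a concrete illustration of the basic configuration above, the minimal sketch below instantiates the `CrepaScheduler` helper that ships with this feature (the same interface the unit tests exercise) and prints the effective weight at a few steps. Note that `crepa_decay_steps` is counted from step 0 (warmup included), so with a 100-step warmup the cosine decay spans the remaining 4900 steps and the weight reaches `crepa_lambda_end` at step 5000. This is only an illustration, not part of the training entrypoint.
+
+```python
+from types import SimpleNamespace
+
+from simpletuner.helpers.training.crepa import CrepaScheduler
+
+config = SimpleNamespace(
+    crepa_scheduler="cosine",
+    crepa_lambda=0.5,
+    crepa_warmup_steps=100,
+    crepa_decay_steps=5000,
+    crepa_lambda_end=0.0,
+    crepa_cutoff_step=0,
+    crepa_similarity_threshold=None,
+    crepa_similarity_ema_decay=0.99,
+    crepa_threshold_mode="permanent",
+    crepa_power=1.0,
+)
+# max_train_steps only matters when crepa_decay_steps == 0
+scheduler = CrepaScheduler(config, max_train_steps=10000)
+
+for step in (0, 50, 100, 2550, 5000, 6000):
+    # warmup ramps 0 -> 0.5 by step 100; cosine decay reaches 0.0 at step 5000
+    print(step, round(scheduler.get_weight(step), 3))
+# expected: 0 0.0, 50 0.25, 100 0.5, 2550 0.25, 5000 0.0, 6000 0.0
+```
+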
How it works (practitioner) diff --git a/documentation/experimental/VIDEO_CREPA.pt-BR.md b/documentation/experimental/VIDEO_CREPA.pt-BR.md index ddc64368a..f2504c0d4 100644 --- a/documentation/experimental/VIDEO_CREPA.pt-BR.md +++ b/documentation/experimental/VIDEO_CREPA.pt-BR.md @@ -48,6 +48,96 @@ Adicione o seguinte ao seu `config.json` ou args da CLI: - `crepa_use_backbone_features=true`: pula o encoder externo e alinha com um bloco transformer mais profundo; defina `crepa_teacher_block_index` para escolher o teacher. - Tamanho do encoder: reduza para `dinov2_vits14` + `224` se VRAM estiver apertada; mantenha `dinov2_vitg14` + `518` para melhor qualidade. +## Agendamento de coeficiente + +O CREPA suporta agendamento do coeficiente (`crepa_lambda`) ao longo do treinamento com warmup, decaimento e corte automatico baseado em limiar de similaridade. Isso e particularmente util para treinamento text2video onde o CREPA pode causar listras horizontais/verticais ou uma aparencia desbotada se aplicado muito forte por muito tempo. + +### Agendamento basico + +```json +{ + "crepa_enabled": true, + "crepa_lambda": 0.5, + "crepa_scheduler": "cosine", + "crepa_warmup_steps": 100, + "crepa_decay_steps": 5000, + "crepa_lambda_end": 0.0 +} +``` + +Esta configuracao: +1. Aumenta o peso do CREPA de 0 para 0.5 nos primeiros 100 steps +2. Decai de 0.5 para 0.0 usando um agendamento cosseno em 5000 steps +3. Apos o step 5100, o CREPA esta efetivamente desabilitado + +### Tipos de agendamento + +- `constant`: Sem decaimento, o peso permanece em `crepa_lambda` (padrao) +- `linear`: Interpolacao linear de `crepa_lambda` ate `crepa_lambda_end` +- `cosine`: Anelamento cosseno suave (recomendado para a maioria dos casos) +- `polynomial`: Decaimento polinomial com potencia configuravel via `crepa_power` + +### Corte baseado em steps + +Para um corte rigido apos um step especifico: + +```json +{ + "crepa_cutoff_step": 3000 +} +``` + +O CREPA e completamente desabilitado apos o step 3000. + +### Corte baseado em similaridade + +Esta e a abordagem mais flexivel: o CREPA desabilita automaticamente quando a metrica de similaridade estabiliza, indicando que o modelo aprendeu alinhamento temporal suficiente: + +```json +{ + "crepa_similarity_threshold": 0.9, + "crepa_similarity_ema_decay": 0.99, + "crepa_threshold_mode": "permanent" +} +``` + +- `crepa_similarity_threshold`: Quando a media movel exponencial da similaridade atinge este valor, o CREPA e cortado +- `crepa_similarity_ema_decay`: Fator de suavizacao (0.99 ≈ janela de 100 steps) +- `crepa_threshold_mode`: `permanent` (permanece desligado) ou `recoverable` (pode reabilitar se a similaridade cair) + +### Configuracoes recomendadas + +**Para image2video (i2v)**: +```json +{ + "crepa_scheduler": "constant", + "crepa_lambda": 0.5 +} +``` +O CREPA padrao funciona bem para i2v ja que o frame de referencia ancora a consistencia. + +**Para text2video (t2v)**: +```json +{ + "crepa_scheduler": "cosine", + "crepa_lambda": 0.5, + "crepa_warmup_steps": 100, + "crepa_decay_steps": 0, + "crepa_lambda_end": 0.1, + "crepa_similarity_threshold": 0.85, + "crepa_threshold_mode": "permanent" +} +``` +Decai o CREPA ao longo do treinamento e corta quando a similaridade satura para prevenir artefatos. + +**Para fundos solidos (t2v)**: +```json +{ + "crepa_cutoff_step": 2000 +} +``` +Corte antecipado previne artefatos de listras em fundos uniformes. +
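+### Exemplo pratico: valores do peso agendado
+
+Como ilustracao concreta da configuracao basica acima, o esboco minimo abaixo instancia o helper `CrepaScheduler` que acompanha este recurso (a mesma interface exercitada pelos testes unitarios) e imprime o peso efetivo em alguns steps. Note que `crepa_decay_steps` e contado a partir do step 0 (warmup incluido), entao com 100 steps de warmup o decaimento cosseno cobre os 4900 steps restantes e o peso atinge `crepa_lambda_end` no step 5000. E apenas uma ilustracao, nao faz parte do ponto de entrada de treinamento.
+
+```python
+from types import SimpleNamespace
+
+from simpletuner.helpers.training.crepa import CrepaScheduler
+
+config = SimpleNamespace(
+    crepa_scheduler="cosine",
+    crepa_lambda=0.5,
+    crepa_warmup_steps=100,
+    crepa_decay_steps=5000,
+    crepa_lambda_end=0.0,
+    crepa_cutoff_step=0,
+    crepa_similarity_threshold=None,
+    crepa_similarity_ema_decay=0.99,
+    crepa_threshold_mode="permanent",
+    crepa_power=1.0,
+)
+# max_train_steps so importa quando crepa_decay_steps == 0
+scheduler = CrepaScheduler(config, max_train_steps=10000)
+
+for step in (0, 50, 100, 2550, 5000, 6000):
+    # o warmup sobe de 0 a 0.5 ate o step 100; o cosseno chega a 0.0 no step 5000
+    print(step, round(scheduler.get_weight(step), 3))
+# esperado: 0 0.0, 50 0.25, 100 0.5, 2550 0.25, 5000 0.0, 6000 0.0
+```
+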
Como funciona (pratico) diff --git a/documentation/experimental/VIDEO_CREPA.zh.md b/documentation/experimental/VIDEO_CREPA.zh.md index f5f21b4e9..5662f642f 100644 --- a/documentation/experimental/VIDEO_CREPA.zh.md +++ b/documentation/experimental/VIDEO_CREPA.zh.md @@ -48,6 +48,96 @@ Cross-frame Representation Alignment(CREPA)是视频模型的轻量正则项 - `crepa_use_backbone_features=true`:跳过外部编码器,改为对齐更深的 Transformer 块;通过 `crepa_teacher_block_index` 指定教师。 - 编码器大小:显存紧张可用 `dinov2_vits14` + `224`;追求质量建议 `dinov2_vitg14` + `518`。 +## 系数调度 + +CREPA 支持在训练过程中对系数(`crepa_lambda`)进行调度,包括预热、衰减以及基于相似度阈值的自动截止。这对于 text2video 训练尤其有用,因为如果 CREPA 应用过强或过久,可能会导致水平/垂直条纹或画面发灰。 + +### 基本调度 + +```json +{ + "crepa_enabled": true, + "crepa_lambda": 0.5, + "crepa_scheduler": "cosine", + "crepa_warmup_steps": 100, + "crepa_decay_steps": 5000, + "crepa_lambda_end": 0.0 +} +``` + +此配置: +1. 在前 100 步将 CREPA 权重从 0 预热到 0.5 +2. 使用余弦调度在 5000 步内从 0.5 衰减到 0.0 +3. 第 5100 步后,CREPA 实际上已被禁用 + +### 调度器类型 + +- `constant`:无衰减,权重保持在 `crepa_lambda`(默认) +- `linear`:从 `crepa_lambda` 到 `crepa_lambda_end` 的线性插值 +- `cosine`:平滑余弦退火(大多数情况推荐) +- `polynomial`:多项式衰减,可通过 `crepa_power` 配置幂次 + +### 基于步数的截止 + +若需在特定步数后硬截止: + +```json +{ + "crepa_cutoff_step": 3000 +} +``` + +第 3000 步后 CREPA 将完全禁用。 + +### 基于相似度的截止 + +这是最灵活的方式——当相似度指标趋于平稳(表明模型已学会足够的时间对齐)时,CREPA 自动禁用: + +```json +{ + "crepa_similarity_threshold": 0.9, + "crepa_similarity_ema_decay": 0.99, + "crepa_threshold_mode": "permanent" +} +``` + +- `crepa_similarity_threshold`:当相似度的指数移动平均达到此值时,CREPA 截止 +- `crepa_similarity_ema_decay`:平滑系数(0.99 ≈ 100 步窗口) +- `crepa_threshold_mode`:`permanent`(保持关闭)或 `recoverable`(相似度下降时可重新启用) + +### 推荐配置 + +**image2video (i2v)**: +```json +{ + "crepa_scheduler": "constant", + "crepa_lambda": 0.5 +} +``` +标准 CREPA 对 i2v 效果良好,因为参考帧可锚定一致性。 + +**text2video (t2v)**: +```json +{ + "crepa_scheduler": "cosine", + "crepa_lambda": 0.5, + "crepa_warmup_steps": 100, + "crepa_decay_steps": 0, + "crepa_lambda_end": 0.1, + "crepa_similarity_threshold": 0.85, + "crepa_threshold_mode": "permanent" +} +``` +在训练过程中衰减 CREPA,并在相似度饱和时截止以防止伪影。 + +**纯色背景 (t2v)**: +```json +{ + "crepa_cutoff_step": 2000 +} +``` +早期截止可防止均匀背景上的条纹伪影。 +
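+### 实例:调度权重的变化
+
+作为上文基本配置的具体示意,下面的最小示例实例化本功能附带的 `CrepaScheduler` 辅助类(与单元测试使用的接口相同),并打印若干步数下的有效权重。注意 `crepa_decay_steps` 从第 0 步开始计数(包含预热),因此在 100 步预热下,余弦衰减覆盖其余 4900 步,权重在第 5000 步达到 `crepa_lambda_end`。这只是示意,并不属于训练入口的一部分。
+
+```python
+from types import SimpleNamespace
+
+from simpletuner.helpers.training.crepa import CrepaScheduler
+
+config = SimpleNamespace(
+    crepa_scheduler="cosine",
+    crepa_lambda=0.5,
+    crepa_warmup_steps=100,
+    crepa_decay_steps=5000,
+    crepa_lambda_end=0.0,
+    crepa_cutoff_step=0,
+    crepa_similarity_threshold=None,
+    crepa_similarity_ema_decay=0.99,
+    crepa_threshold_mode="permanent",
+    crepa_power=1.0,
+)
+# max_train_steps 仅在 crepa_decay_steps == 0 时才会用到
+scheduler = CrepaScheduler(config, max_train_steps=10000)
+
+for step in (0, 50, 100, 2550, 5000, 6000):
+    # 预热在第 100 步前从 0 升到 0.5;余弦衰减在第 5000 步降到 0.0
+    print(step, round(scheduler.get_weight(step), 3))
+# 预期输出:0 0.0, 50 0.25, 100 0.5, 2550 0.25, 5000 0.0, 6000 0.0
+```
+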
工作原理(实践视角) diff --git a/simpletuner/helpers/models/common.py b/simpletuner/helpers/models/common.py index 74f447cd4..f451ccc7c 100644 --- a/simpletuner/helpers/models/common.py +++ b/simpletuner/helpers/models/common.py @@ -59,6 +59,7 @@ build_gguf_quantization_config, get_pipeline_quantization_builder, ) +from simpletuner.helpers.training.state_tracker import StateTracker from simpletuner.helpers.training.wrappers import unwrap_model from simpletuner.helpers.utils import ramtorch as ramtorch_utils from simpletuner.helpers.utils.hidden_state_buffer import HiddenStateBuffer @@ -4728,7 +4729,14 @@ def _init_crepa_regularizer(self): if hidden_size is None: raise ValueError("CREPA enabled but unable to infer transformer hidden size.") - self.crepa_regularizer = CrepaRegularizer(self.config, self.accelerator, hidden_size, model_foundation=self) + max_train_steps = int(getattr(self.config, "max_train_steps", 0) or 0) + self.crepa_regularizer = CrepaRegularizer( + self.config, + self.accelerator, + hidden_size, + model_foundation=self, + max_train_steps=max_train_steps, + ) model_component = self.get_trained_component(unwrap_model=False) if model_component is None: raise ValueError("CREPA requires an attached diffusion model to register its projector.") @@ -4793,6 +4801,7 @@ def auxiliary_loss(self, model_output, prepared_batch: dict, loss: torch.Tensor) latents=prepared_batch.get("latents"), vae=self.get_vae(), frame_features=crepa_frame_features, + step=StateTracker.get_global_step(), ) if crepa_loss is not None: loss = loss + crepa_loss diff --git a/simpletuner/helpers/training/crepa.py b/simpletuner/helpers/training/crepa.py index 04f766136..29702370b 100644 --- a/simpletuner/helpers/training/crepa.py +++ b/simpletuner/helpers/training/crepa.py @@ -12,6 +12,116 @@ logger = logging.getLogger(__name__) +class CrepaScheduler: + """Schedules the CREPA coefficient (crepa_lambda) over training with warmup, decay, and cutoff support.""" + + def __init__(self, config, max_train_steps: int): + self.scheduler_type = str(getattr(config, "crepa_scheduler", "constant") or "constant").lower() + self.base_weight = float(getattr(config, "crepa_lambda", 0.5) or 0.0) + self.warmup_steps = int(getattr(config, "crepa_warmup_steps", 0) or 0) + raw_decay_steps = getattr(config, "crepa_decay_steps", 0) or 0 + self.decay_steps = int(raw_decay_steps) if int(raw_decay_steps) > 0 else max_train_steps + self.lambda_end = float(getattr(config, "crepa_lambda_end", 0.0) or 0.0) + self.cutoff_step = int(getattr(config, "crepa_cutoff_step", 0) or 0) + self.similarity_threshold = getattr(config, "crepa_similarity_threshold", None) + if self.similarity_threshold is not None: + self.similarity_threshold = float(self.similarity_threshold) + raw_ema_decay = getattr(config, "crepa_similarity_ema_decay", None) + self.similarity_ema_decay = float(raw_ema_decay) if raw_ema_decay is not None else 0.99 + self.threshold_mode = str(getattr(config, "crepa_threshold_mode", "permanent") or "permanent").lower() + self.power = float(getattr(config, "crepa_power", 1.0) or 1.0) + + self._similarity_ema: Optional[float] = None + self._cutoff_triggered = False + + def _compute_scheduled_weight(self, step: int) -> float: + """Compute the scheduled weight at the given step (without cutoff logic).""" + # Warmup phase: linear ramp from 0 to base_weight (applies to all scheduler types) + if self.warmup_steps > 0 and step < self.warmup_steps: + return self.base_weight * (step / self.warmup_steps) + + # Constant scheduler: no decay after warmup + if 
self.scheduler_type == "constant": + return self.base_weight + + # Decay phase: step relative to end of warmup + decay_step = step - self.warmup_steps + total_decay_steps = max(self.decay_steps - self.warmup_steps, 1) + progress = min(decay_step / total_decay_steps, 1.0) + + if self.scheduler_type == "linear": + return self.base_weight + (self.lambda_end - self.base_weight) * progress + elif self.scheduler_type == "cosine": + return self.lambda_end + (self.base_weight - self.lambda_end) * (1 + math.cos(math.pi * progress)) / 2 + elif self.scheduler_type == "polynomial": + return (self.base_weight - self.lambda_end) * ((1 - progress) ** self.power) + self.lambda_end + else: + return self.base_weight + + def _update_similarity_ema(self, similarity: Optional[float]) -> None: + """Update the exponential moving average of similarity.""" + if similarity is None: + return + if self._similarity_ema is None: + self._similarity_ema = similarity + else: + self._similarity_ema = ( + self.similarity_ema_decay * self._similarity_ema + (1 - self.similarity_ema_decay) * similarity + ) + + def _check_cutoff(self, step: int) -> bool: + """Check if cutoff conditions are met.""" + # Step-based cutoff + if self.cutoff_step > 0 and step >= self.cutoff_step: + return True + + # Similarity threshold cutoff + if self.similarity_threshold is not None and self._similarity_ema is not None: + if self._similarity_ema >= self.similarity_threshold: + return True + + return False + + def get_weight(self, step: int, similarity: Optional[float] = None) -> float: + """ + Get the scheduled weight for the given step. + + Args: + step: Current training step (global step from trainer/accelerator). + similarity: Current similarity value for EMA tracking (optional). + + Returns: + Scheduled CREPA coefficient weight (0.0 if cutoff is active). + """ + # Update similarity EMA + self._update_similarity_ema(similarity) + + # Handle cutoff logic + if self._cutoff_triggered and self.threshold_mode == "permanent": + return 0.0 + + cutoff_active = self._check_cutoff(step) + + if cutoff_active: + if self.threshold_mode == "permanent": + self._cutoff_triggered = True + return 0.0 + + # Recoverable mode: reset trigger if cutoff is no longer active + if self.threshold_mode == "recoverable" and self._cutoff_triggered: + self._cutoff_triggered = False + + return self._compute_scheduled_weight(step) + + def is_cutoff(self) -> bool: + """Check if CREPA is currently cut off.""" + return self._cutoff_triggered + + def get_similarity_ema(self) -> Optional[float]: + """Get the current similarity EMA value.""" + return self._similarity_ema + + class CrepaRegularizer: """Implements Cross-frame Representation Alignment (CREPA) as defined in Eq. (6) of the paper.""" @@ -22,6 +132,7 @@ def __init__( hidden_size: int, *, model_foundation: Optional["ModelFoundation"] = None, + max_train_steps: int = 0, ): self.config = config self.device = accelerator.device @@ -38,7 +149,10 @@ def __init__( self.tau = 1.0 if raw_tau is None else float(raw_tau) if self.tau <= 0: raise ValueError("crepa_adjacent_tau must be greater than zero.") - self.weight = float(getattr(config, "crepa_lambda", 0.5) or 0.0) + self.base_weight = float(getattr(config, "crepa_lambda", 0.5) or 0.0) + + # Initialize scheduler for coefficient scheduling + self.scheduler = CrepaScheduler(config, max_train_steps) if self.enabled else None # Prefer explicit crepa_model, fall back to legacy crepa_encoder name. 
raw_encoder = getattr(config, "crepa_model", None) or getattr(config, "crepa_encoder", None) self.encoder_name = self._resolve_encoder_name(raw_encoder) @@ -107,6 +221,7 @@ def compute_loss( vae: Optional[nn.Module] = None, *, frame_features: Optional[torch.Tensor] = None, + step: int = 0, ) -> Tuple[Optional[torch.Tensor], Optional[dict]]: if not self.enabled: return None, None @@ -123,7 +238,7 @@ def compute_loss( raise ValueError("CREPA backbone feature mode requires frame_features from the model.") if self.projector is None: raise RuntimeError("CREPA projector was not initialised on the diffusion model.") - if self.weight == 0: + if self.base_weight == 0: return None, None if not self.use_backbone_features: @@ -176,12 +291,37 @@ def compute_loss( if self.normalize_by_frames: per_video_sum = per_video_sum / float(num_frames) - align_loss = -per_video_sum.mean() * self.weight + # Get current similarity for EMA tracking + current_similarity = total_sim.mean().detach().item() + + # Get scheduled weight (handles warmup, decay, and cutoff) + if self.scheduler is not None: + scheduled_weight = self.scheduler.get_weight(step, similarity=current_similarity) + else: + scheduled_weight = self.base_weight + + # Early exit if weight is zero (cutoff active or decayed to zero) + if scheduled_weight == 0: + log_data = { + "crepa_loss": 0.0, + "crepa_similarity": current_similarity, + "crepa_weight": 0.0, + "crepa_cutoff": True, + } + if self.scheduler is not None and self.scheduler.get_similarity_ema() is not None: + log_data["crepa_similarity_ema"] = self.scheduler.get_similarity_ema() + return None, log_data + + align_loss = -per_video_sum.mean() * scheduled_weight log_data = { "crepa_loss": align_loss.detach().item(), - "crepa_similarity": total_sim.mean().detach().item(), + "crepa_similarity": current_similarity, + "crepa_weight": scheduled_weight, + "crepa_cutoff": False, } + if self.scheduler is not None and self.scheduler.get_similarity_ema() is not None: + log_data["crepa_similarity_ema"] = self.scheduler.get_similarity_ema() return align_loss, log_data # --------------------------- private helpers --------------------------- diff --git a/simpletuner/simpletuner_sdk/server/services/field_registry/sections/loss.py b/simpletuner/simpletuner_sdk/server/services/field_registry/sections/loss.py index a5b41c24e..a97b53ea9 100644 --- a/simpletuner/simpletuner_sdk/server/services/field_registry/sections/loss.py +++ b/simpletuner/simpletuner_sdk/server/services/field_registry/sections/loss.py @@ -409,6 +409,200 @@ def register_loss_fields(registry: "FieldRegistry") -> None: ) ) + # CREPA Scheduling Options + registry._add_field( + ConfigField( + name="crepa_scheduler", + arg_name="--crepa_scheduler", + ui_label="CREPA Scheduler", + field_type=FieldType.SELECT, + tab="training", + section="loss_functions", + default_value="constant", + choices=[ + {"value": "constant", "label": "Constant"}, + {"value": "linear", "label": "Linear Decay"}, + {"value": "cosine", "label": "Cosine Decay"}, + {"value": "polynomial", "label": "Polynomial Decay"}, + ], + dependencies=[FieldDependency(field="crepa_enabled", operator="equals", value=True)], + help_text="Schedule for CREPA coefficient decay over training. 
Constant keeps the weight fixed.", + tooltip="Use decay schedules to reduce CREPA regularization strength as training progresses.", + importance=ImportanceLevel.EXPERIMENTAL, + order=22, + documentation="OPTIONS.md#--crepa_scheduler", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_warmup_steps", + arg_name="--crepa_warmup_steps", + ui_label="CREPA Warmup Steps", + field_type=FieldType.NUMBER, + tab="training", + section="loss_functions", + default_value=0, + validation_rules=[ValidationRule(ValidationRuleType.MIN, value=0, message="Must be non-negative")], + dependencies=[FieldDependency(field="crepa_enabled", operator="equals", value=True)], + help_text="Number of steps to linearly ramp CREPA weight from 0 to crepa_lambda.", + tooltip="Gradual warmup can help stabilize early training before CREPA regularization kicks in.", + importance=ImportanceLevel.EXPERIMENTAL, + order=23, + documentation="OPTIONS.md#--crepa_warmup_steps", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_decay_steps", + arg_name="--crepa_decay_steps", + ui_label="CREPA Decay Steps", + field_type=FieldType.NUMBER, + tab="training", + section="loss_functions", + default_value=0, + validation_rules=[ValidationRule(ValidationRuleType.MIN, value=0, message="Must be non-negative")], + dependencies=[ + FieldDependency(field="crepa_enabled", operator="equals", value=True), + FieldDependency(field="crepa_scheduler", operator="not_equals", value="constant"), + ], + help_text="Total steps for decay (after warmup). 0 means decay over entire training run.", + tooltip="Controls the duration of the decay phase. Decay starts after warmup completes.", + importance=ImportanceLevel.EXPERIMENTAL, + order=24, + documentation="OPTIONS.md#--crepa_decay_steps", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_lambda_end", + arg_name="--crepa_lambda_end", + ui_label="CREPA End Weight", + field_type=FieldType.NUMBER, + tab="training", + section="loss_functions", + default_value=0.0, + validation_rules=[ValidationRule(ValidationRuleType.MIN, value=0.0, message="Must be non-negative")], + dependencies=[ + FieldDependency(field="crepa_enabled", operator="equals", value=True), + FieldDependency(field="crepa_scheduler", operator="not_equals", value="constant"), + ], + help_text="Final CREPA weight after decay completes. 0 effectively disables CREPA at end of training.", + tooltip="The coefficient decays from crepa_lambda to this value over decay_steps.", + importance=ImportanceLevel.EXPERIMENTAL, + order=25, + documentation="OPTIONS.md#--crepa_lambda_end", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_power", + arg_name="--crepa_power", + ui_label="CREPA Polynomial Power", + field_type=FieldType.NUMBER, + tab="training", + section="loss_functions", + default_value=1.0, + validation_rules=[ValidationRule(ValidationRuleType.MIN, value=0.1, message="Must be > 0")], + dependencies=[ + FieldDependency(field="crepa_enabled", operator="equals", value=True), + FieldDependency(field="crepa_scheduler", operator="equals", value="polynomial"), + ], + help_text="Power factor for polynomial decay. 
1.0 = linear, 2.0 = quadratic, etc.", + tooltip="Higher values cause faster initial decay that slows down towards the end.", + importance=ImportanceLevel.EXPERIMENTAL, + order=26, + documentation="OPTIONS.md#--crepa_power", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_cutoff_step", + arg_name="--crepa_cutoff_step", + ui_label="CREPA Cutoff Step", + field_type=FieldType.NUMBER, + tab="training", + section="loss_functions", + default_value=0, + validation_rules=[ValidationRule(ValidationRuleType.MIN, value=0, message="Must be non-negative")], + dependencies=[FieldDependency(field="crepa_enabled", operator="equals", value=True)], + help_text="Hard cutoff step after which CREPA is disabled. 0 means no step-based cutoff.", + tooltip="Useful for disabling CREPA after model has converged on temporal alignment.", + importance=ImportanceLevel.EXPERIMENTAL, + order=27, + documentation="OPTIONS.md#--crepa_cutoff_step", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_similarity_threshold", + arg_name="--crepa_similarity_threshold", + ui_label="CREPA Similarity Threshold", + field_type=FieldType.NUMBER, + tab="training", + section="loss_functions", + validation_rules=[ + ValidationRule(ValidationRuleType.MIN, value=0.0, message="Must be between 0 and 1"), + ValidationRule(ValidationRuleType.MAX, value=1.0, message="Must be between 0 and 1"), + ], + dependencies=[FieldDependency(field="crepa_enabled", operator="equals", value=True)], + help_text="Similarity EMA threshold at which CREPA cutoff triggers. Leave empty to disable.", + tooltip="When the exponential moving average of similarity reaches this value, CREPA is disabled to prevent overfitting.", + importance=ImportanceLevel.EXPERIMENTAL, + order=28, + documentation="OPTIONS.md#--crepa_similarity_threshold", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_similarity_ema_decay", + arg_name="--crepa_similarity_ema_decay", + ui_label="CREPA Similarity EMA Decay", + field_type=FieldType.NUMBER, + tab="training", + section="loss_functions", + default_value=0.99, + validation_rules=[ + ValidationRule(ValidationRuleType.MIN, value=0.0, message="Must be between 0 and 1"), + ValidationRule(ValidationRuleType.MAX, value=1.0, message="Must be between 0 and 1"), + ], + dependencies=[FieldDependency(field="crepa_similarity_threshold", operator="is_set", value=True)], + help_text="Exponential moving average decay factor for similarity tracking. Higher = smoother.", + tooltip="0.99 provides a ~100 step smoothing window. Lower values react faster to changes.", + importance=ImportanceLevel.EXPERIMENTAL, + order=29, + documentation="OPTIONS.md#--crepa_similarity_ema_decay", + ) + ) + + registry._add_field( + ConfigField( + name="crepa_threshold_mode", + arg_name="--crepa_threshold_mode", + ui_label="CREPA Threshold Mode", + field_type=FieldType.SELECT, + tab="training", + section="loss_functions", + default_value="permanent", + choices=[ + {"value": "permanent", "label": "Permanent"}, + {"value": "recoverable", "label": "Recoverable"}, + ], + dependencies=[FieldDependency(field="crepa_similarity_threshold", operator="is_set", value=True)], + help_text="Behavior when similarity threshold is reached: permanent disables forever, recoverable allows re-enabling.", + tooltip="Permanent: once threshold is hit, CREPA stays off. 
Recoverable: CREPA re-enables if similarity drops.", + importance=ImportanceLevel.EXPERIMENTAL, + order=30, + documentation="OPTIONS.md#--crepa_threshold_mode", + ) + ) + registry._add_field( ConfigField( name="twinflow_enabled", diff --git a/tests/test_crepa.py b/tests/test_crepa.py index c9dcacdce..7da356a96 100644 --- a/tests/test_crepa.py +++ b/tests/test_crepa.py @@ -1,9 +1,10 @@ +import math import unittest from types import SimpleNamespace import torch -from simpletuner.helpers.training.crepa import CrepaRegularizer +from simpletuner.helpers.training.crepa import CrepaRegularizer, CrepaScheduler class _DummyVAE(torch.nn.Module): @@ -109,5 +110,217 @@ def test_hidden_projection_casts_to_projector_dtype(self): self.assertEqual(projected.shape, (1, 2, 3, 4)) +class CrepaSchedulerTests(unittest.TestCase): + def _make_config(self, **kwargs): + defaults = { + "crepa_scheduler": "constant", + "crepa_lambda": 0.5, + "crepa_warmup_steps": 0, + "crepa_decay_steps": 0, + "crepa_lambda_end": 0.0, + "crepa_cutoff_step": 0, + "crepa_similarity_threshold": None, + "crepa_similarity_ema_decay": 0.99, + "crepa_threshold_mode": "permanent", + "crepa_power": 1.0, + } + defaults.update(kwargs) + return SimpleNamespace(**defaults) + + def test_constant_scheduler_returns_base_weight(self): + config = self._make_config(crepa_scheduler="constant", crepa_lambda=0.5) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + self.assertAlmostEqual(scheduler.get_weight(0), 0.5) + self.assertAlmostEqual(scheduler.get_weight(500), 0.5) + self.assertAlmostEqual(scheduler.get_weight(1000), 0.5) + + def test_warmup_ramps_from_zero(self): + config = self._make_config( + crepa_scheduler="constant", + crepa_lambda=1.0, + crepa_warmup_steps=100, + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + self.assertAlmostEqual(scheduler.get_weight(0), 0.0) + self.assertAlmostEqual(scheduler.get_weight(50), 0.5) + self.assertAlmostEqual(scheduler.get_weight(100), 1.0) + self.assertAlmostEqual(scheduler.get_weight(200), 1.0) + + def test_linear_decay(self): + config = self._make_config( + crepa_scheduler="linear", + crepa_lambda=1.0, + crepa_lambda_end=0.0, + crepa_warmup_steps=0, + crepa_decay_steps=100, + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + self.assertAlmostEqual(scheduler.get_weight(0), 1.0) + self.assertAlmostEqual(scheduler.get_weight(50), 0.5) + self.assertAlmostEqual(scheduler.get_weight(100), 0.0) + + def test_cosine_decay(self): + config = self._make_config( + crepa_scheduler="cosine", + crepa_lambda=1.0, + crepa_lambda_end=0.0, + crepa_warmup_steps=0, + crepa_decay_steps=100, + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + self.assertAlmostEqual(scheduler.get_weight(0), 1.0) + expected_midpoint = 0.5 # Cosine decay at 50% should give 0.5 + self.assertAlmostEqual(scheduler.get_weight(50), expected_midpoint, places=2) + self.assertAlmostEqual(scheduler.get_weight(100), 0.0, places=5) + + def test_polynomial_decay_with_power(self): + config = self._make_config( + crepa_scheduler="polynomial", + crepa_lambda=1.0, + crepa_lambda_end=0.0, + crepa_warmup_steps=0, + crepa_decay_steps=100, + crepa_power=2.0, + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + self.assertAlmostEqual(scheduler.get_weight(0), 1.0) + # At 50%: (1 - 0.5)^2 = 0.25 + expected = (1.0 - 0.0) * ((1 - 0.5) ** 2.0) + 0.0 + self.assertAlmostEqual(scheduler.get_weight(50), expected, places=5) + self.assertAlmostEqual(scheduler.get_weight(100), 0.0, places=5) + + def 
test_step_cutoff(self): + config = self._make_config( + crepa_scheduler="constant", + crepa_lambda=1.0, + crepa_cutoff_step=50, + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + self.assertAlmostEqual(scheduler.get_weight(0), 1.0) + self.assertAlmostEqual(scheduler.get_weight(49), 1.0) + self.assertAlmostEqual(scheduler.get_weight(50), 0.0) + self.assertAlmostEqual(scheduler.get_weight(100), 0.0) + + def test_similarity_threshold_permanent_cutoff(self): + config = self._make_config( + crepa_scheduler="constant", + crepa_lambda=1.0, + crepa_similarity_threshold=0.9, + crepa_similarity_ema_decay=0.0, # No smoothing for predictable tests + crepa_threshold_mode="permanent", + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + # Below threshold - should return weight + self.assertAlmostEqual(scheduler.get_weight(0, similarity=0.5), 1.0) + self.assertAlmostEqual(scheduler.get_weight(1, similarity=0.8), 1.0) + + # At threshold - should trigger cutoff + self.assertAlmostEqual(scheduler.get_weight(2, similarity=0.95), 0.0) + + # Permanent: even if similarity drops, stays cut off + self.assertAlmostEqual(scheduler.get_weight(3, similarity=0.5), 0.0) + self.assertTrue(scheduler.is_cutoff()) + + def test_similarity_threshold_recoverable_cutoff(self): + config = self._make_config( + crepa_scheduler="constant", + crepa_lambda=1.0, + crepa_similarity_threshold=0.9, + crepa_similarity_ema_decay=0.0, # No smoothing for predictable tests + crepa_threshold_mode="recoverable", + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + # Below threshold + self.assertAlmostEqual(scheduler.get_weight(0, similarity=0.5), 1.0) + + # At threshold - should return 0 + self.assertAlmostEqual(scheduler.get_weight(1, similarity=0.95), 0.0) + + # Recoverable: if similarity drops, CREPA re-enables + self.assertAlmostEqual(scheduler.get_weight(2, similarity=0.5), 1.0) + + def test_similarity_ema_tracking(self): + config = self._make_config( + crepa_scheduler="constant", + crepa_lambda=1.0, + crepa_similarity_threshold=0.95, + crepa_similarity_ema_decay=0.5, # Fast decay for testing + crepa_threshold_mode="permanent", + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + # First call initializes EMA + scheduler.get_weight(0, similarity=0.5) + self.assertAlmostEqual(scheduler.get_similarity_ema(), 0.5) + + # Second call updates EMA: 0.5 * 0.5 + 0.5 * 0.9 = 0.7 + scheduler.get_weight(1, similarity=0.9) + self.assertAlmostEqual(scheduler.get_similarity_ema(), 0.7) + + def test_warmup_then_decay(self): + config = self._make_config( + crepa_scheduler="linear", + crepa_lambda=1.0, + crepa_lambda_end=0.0, + crepa_warmup_steps=100, + crepa_decay_steps=200, + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + # Warmup phase + self.assertAlmostEqual(scheduler.get_weight(0), 0.0) + self.assertAlmostEqual(scheduler.get_weight(50), 0.5) + self.assertAlmostEqual(scheduler.get_weight(100), 1.0) + + # Decay phase: starts at step 100, ends at step 200 + # At step 150: (150-100)/(200-100) = 0.5 progress + self.assertAlmostEqual(scheduler.get_weight(150), 0.5) + self.assertAlmostEqual(scheduler.get_weight(200), 0.0) + + def test_combined_warmup_decay_cutoff(self): + config = self._make_config( + crepa_scheduler="linear", + crepa_lambda=1.0, + crepa_lambda_end=0.2, + crepa_warmup_steps=50, + crepa_decay_steps=150, + crepa_cutoff_step=100, + ) + scheduler = CrepaScheduler(config, max_train_steps=1000) + + # Warmup + self.assertAlmostEqual(scheduler.get_weight(25), 0.5) + + # 
After warmup, before cutoff + weight_at_75 = scheduler.get_weight(75) + self.assertGreater(weight_at_75, 0.2) + self.assertLess(weight_at_75, 1.0) + + # After cutoff + self.assertAlmostEqual(scheduler.get_weight(100), 0.0) + + def test_decay_steps_zero_uses_max_train_steps(self): + config = self._make_config( + crepa_scheduler="linear", + crepa_lambda=1.0, + crepa_lambda_end=0.0, + crepa_warmup_steps=0, + crepa_decay_steps=0, # Should use max_train_steps + ) + scheduler = CrepaScheduler(config, max_train_steps=100) + + self.assertAlmostEqual(scheduler.get_weight(0), 1.0) + self.assertAlmostEqual(scheduler.get_weight(50), 0.5) + self.assertAlmostEqual(scheduler.get_weight(100), 0.0) + + if __name__ == "__main__": unittest.main()