support Wan2.2-Fun-5B-Control[-Camera] inference #930

lifeiteng · 2025-09-19T16:26:38Z

No description provided.

gemini-code-assist · 2025-09-19T16:26:53Z

Summary of Changes

Hello @lifeiteng, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds support for two new video generation models, 'Wan2.2-Fun-5B-Control' and 'Wan2.2-Fun-5B-Control-Camera'. It makes the adapter's downscale factor configurable and computes the spatial dimensions of mask and latent tensors from dynamic division factors instead of fixed values, so that models with different VAEs can share the same pipeline. It also adds example inference scripts for both models.

Highlights

New Model Support: Added configurations and support for two new video generation models: 'Wan2.2-Fun-5B-Control' and 'Wan2.2-Fun-5B-Control-Camera'.
Flexible Adapter Configuration: Enhanced the SimpleAdapter class to accept a configurable downscale_factor, making it more versatile for different model architectures.
Dynamic Spatial Dimension Calculation: Updated the video pipeline to use dynamic division factors for spatial dimensions in mask and latent tensor calculations, improving adaptability.
New Inference Examples: Included new example scripts demonstrating how to perform inference with the newly supported 'Wan2.2-Fun-5B-Control' and 'Wan2.2-Fun-5B-Control-Camera' models.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

lifeiteng · 2025-09-19T16:27:37Z

VAE encoding: 100%|██████████| 1/1 [00:00<00:00,  1.72it/s]
100%|██████████| 50/50 [00:24<00:00,  2.02it/s]
VAE decoding: 100%|██████████| 1/1 [00:02<00:00,  2.12s/it]
Saving video:   0%|          | 0/81 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Saving video: 100%|██████████| 81/81 [00:00<00:00, 136.69it/s]

VAE encoding: 100%|██████████| 1/1 [00:00<00:00,  1.72it/s]
VAE encoding: 100%|██████████| 1/1 [00:00<00:00,  1.72it/s]
100%|██████████| 50/50 [00:24<00:00,  2.02it/s]
VAE decoding: 100%|██████████| 1/1 [00:02<00:00,  2.09s/it]
Saving video:   0%|          | 0/81 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
        - Avoid using `tokenizers` before the fork if possible
        - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Saving video: 100%|██████████| 81/81 [00:00<00:00, 255.08it/s]
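
The tokenizers warning in the log above names its own remedy. A minimal sketch of the second option, assuming it is placed at the very top of the inference script, before any import that loads `tokenizers` (the placement is an assumption, not something this PR does):

```python
import os

# Set the variable before any library that uses `tokenizers` is imported,
# so the value is already in place when the process forks to save the video.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```

Setting it to "false" only silences the warning by disabling tokenizer parallelism up front; it should not change the generated video.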

gemini-code-assist

Code Review

This pull request adds support for Wan2.2-Fun-5B-Control and Wan2.2-Fun-5B-Control-Camera models. The changes include adding model configurations, making the model definitions more flexible, and refactoring the pipeline to accommodate different VAEs. New example scripts are also included. The changes are generally good, but I've identified a few minor issues related to code style, an outdated comment, and an opportunity to refactor duplicated code in one of the new examples for better maintainability.

gemini-code-assist · 2025-09-19T16:28:39Z

diffsynth/configs/model_config.py

    (None, "5b013604280dd715f8457c6ed6d6a626", ["wan_video_dit"], [WanModel], "civitai"),
    (None, "2267d489f0ceb9f21836532952852ee5", ["wan_video_dit"], [WanModel], "civitai"),
+    (None, "8cf5720f1d99f2d3d9f4d059c99f7e25", ["wan_video_dit"], [WanModel], "civitai"),
+    (None, "0e2ab7dec4711919374f3d7ffdea90be", ["wan_video_dit"], [WanModel], "civitai"),    


There is trailing whitespace on this line which should be removed to maintain code style consistency.

Suggested change

-    (None, "0e2ab7dec4711919374f3d7ffdea90be", ["wan_video_dit"], [WanModel], "civitai"),    
+    (None, "0e2ab7dec4711919374f3d7ffdea90be", ["wan_video_dit"], [WanModel], "civitai"),

gemini-code-assist · 2025-09-19T16:28:40Z

diffsynth/models/wan_video_camera_controller.py

+    def __init__(self, in_dim, out_dim, kernel_size, stride, downscale_factor=8, num_residual_blocks=1):
        super(SimpleAdapter, self).__init__()

        # Pixel Unshuffle: reduce spatial dimensions by a factor of 8


The comment is now outdated as the downscale_factor is a parameter. Please update the comment to reflect this change for better code clarity.

Suggested change

-        # Pixel Unshuffle: reduce spatial dimensions by a factor of 8
+        # Pixel Unshuffle: reduce spatial dimensions by a configurable factor
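
To illustrate why the comment matters once the factor is a parameter: pixel unshuffle with factor r turns a (C, H, W) input into (C·r², H/r, W/r), so any layer that follows it has to size its input channels from r rather than from a fixed 8. The sketch below only does the shape arithmetic; `pixel_unshuffle_shape` is a hypothetical helper, not code from this PR, and the channel count in the usage note is illustrative.

```python
def pixel_unshuffle_shape(
    channels: int, height: int, width: int, downscale_factor: int = 8
) -> tuple[int, int, int]:
    """Return the (C, H, W) shape produced by pixel unshuffle with the given factor."""
    r = downscale_factor
    # Pixel unshuffle moves each r x r spatial block into the channel axis,
    # so both spatial dimensions must be divisible by r.
    if height % r or width % r:
        raise ValueError("height and width must be divisible by downscale_factor")
    return channels * r * r, height // r, width // r
```

For a 6-channel 480x832 input, a factor of 8 gives (384, 60, 104) and a factor of 16 gives (1536, 30, 52), which is why a hard-coded "factor of 8" in the comment would mislead.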

gemini-code-assist · 2025-09-19T16:28:40Z

diffsynth/pipelines/wan_video_new.py

+            y = torch.zeros((1, y_dim, (num_frames - 1) // 4 + 1, height//height_division_factor, width//width_division_factor), dtype=pipe.torch_dtype, device=pipe.device)
        else:
            y = y[:, -y_dim:]
+


This blank line is unnecessary and can be removed to improve code conciseness.
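
For reference, the shape arithmetic in the `torch.zeros` call above can be checked without a GPU. This is a sketch only: `latent_shape` is a hypothetical helper, the `y_dim` value and the factors 8 and 16 in the usage note are assumed for illustration, and only the `(num_frames - 1) // 4 + 1` and floor-division expressions come from the diff.

```python
def latent_shape(
    num_frames: int,
    height: int,
    width: int,
    y_dim: int,
    height_division_factor: int,
    width_division_factor: int,
) -> tuple[int, int, int, int, int]:
    """Mirror the shape tuple passed to torch.zeros in the diff."""
    return (
        1,                                # batch size
        y_dim,                            # conditioning channels
        (num_frames - 1) // 4 + 1,        # temporal compression as written in the diff
        height // height_division_factor,
        width // width_division_factor,
    )
```

With the 81 frames seen in the log above and an assumed 480x832 input, factors of 8 give a 21x60x104 latent grid, while factors of 16 give 21x30x52. A pipeline that hard-codes one factor would allocate the wrong size for a VAE that uses the other.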

gemini-code-assist · 2025-09-19T16:28:40Z

examples/wanvideo/model_inference/Wan2.2-Fun-5B-Control-Camera.py

+dataset_snapshot_download(
+    dataset_id="DiffSynth-Studio/examples_in_diffsynth",
+    local_dir="./",
+    allow_file_pattern=f"data/examples/wan/input_image.jpg"


This f-string does not contain any expressions and can be converted to a regular string.

Suggested change

-    allow_file_pattern=f"data/examples/wan/input_image.jpg"
+    allow_file_pattern="data/examples/wan/input_image.jpg"
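
This kind of issue can be caught mechanically; linters typically report it as an f-string that is missing placeholders. A rough sketch of such a check using the standard `ast` module follows. The function name is hypothetical, and it is meant as an illustration of the idea rather than a replacement for a real linter.

```python
import ast


def find_placeholder_free_fstrings(source: str) -> list[int]:
    """Return the line numbers of f-strings that contain no {expression}."""
    tree = ast.parse(source)
    # A format spec such as ">10" in f"{x:>10}" is itself a JoinedStr node;
    # collect those so they are not reported as standalone f-strings.
    format_specs = {
        id(node.format_spec)
        for node in ast.walk(tree)
        if isinstance(node, ast.FormattedValue) and node.format_spec is not None
    }
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.JoinedStr)
        and id(node) not in format_specs
        and not any(isinstance(value, ast.FormattedValue) for value in node.values)
    )
```

Run over the example script's source, a check like this would flag the `allow_file_pattern` line, since its f-string has no replacement fields.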

gemini-code-assist · 2025-09-19T16:28:40Z

examples/wanvideo/model_inference/Wan2.2-Fun-5B-Control-Camera.py

+video = pipe(
+    prompt="一艘小船正勇敢地乘风破浪前行。蔚蓝的大海波涛汹涌，白色的浪花拍打着船身，但小船毫不畏惧，坚定地驶向远方。阳光洒在水面上，闪烁着金色的光芒，为这壮丽的场景增添了一抹温暖。镜头拉近，可以看到船上的旗帜迎风飘扬，象征着不屈的精神与冒险的勇气。这段画面充满力量，激励人心，展现了面对挑战时的无畏与执着。",
+    negative_prompt="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走",
+    seed=0, tiled=True,
+    input_image=input_image,
+    camera_control_direction="Left", camera_control_speed=0.01,
+)
+save_video(video, "video_left.mp4", fps=15, quality=5)
+
+video = pipe(
+    prompt="一艘小船正勇敢地乘风破浪前行。蔚蓝的大海波涛汹涌，白色的浪花拍打着船身，但小船毫不畏惧，坚定地驶向远方。阳光洒在水面上，闪烁着金色的光芒，为这壮丽的场景增添了一抹温暖。镜头拉近，可以看到船上的旗帜迎风飘扬，象征着不屈的精神与冒险的勇气。这段画面充满力量，激励人心，展现了面对挑战时的无畏与执着。",
+    negative_prompt="色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走",
+    seed=0, tiled=True,
+    input_image=input_image,
+    camera_control_direction="Up", camera_control_speed=0.01,
+)
+save_video(video, "video_up.mp4", fps=15, quality=5)


The two calls to pipe are almost identical, with only camera_control_direction and the output filename changing. This duplicated code can be refactored into a loop to improve readability and maintainability.

common_args = {
    "prompt": "一艘小船正勇敢地乘风破浪前行。蔚蓝的大海波涛汹涌，白色的浪花拍打着船身，但小船毫不畏惧，坚定地驶向远方。阳光洒在水面上，闪烁着金色的光芒，为这壮丽的场景增添了一抹温暖。镜头拉近，可以看到船上的旗帜迎风飘扬，象征着不屈的精神与冒险的勇气。这段画面充满力量，激励人心，展现了面对挑战时的无畏与执着。",
    "negative_prompt": "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走",
    "seed": 0,
    "tiled": True,
    "input_image": input_image,
    "camera_control_speed": 0.01,
}

for direction in ["Left", "Up"]:
    video = pipe(
        **common_args,
        camera_control_direction=direction,
    )
    save_video(video, f"video_{direction.lower()}.mp4", fps=15, quality=5)

support Wan2.2-Fun-5B-Control[-Camera] inference

cad5e92
