High-Performance DiT (Diffusion Transformer) Inference Framework for Video & Image Generation
kDiT is a high-performance inference framework built specifically for Diffusion Transformers (DiT), supporting video generation (T2V/I2V) and image generation (T2I) tasks. The framework provides a rich set of optimization techniques and flexible configuration options, enabling efficient execution of large-scale DiT models in single- and multi-GPU environments.
- 🚀 High-Performance Inference: FP8 quantization, QKV Fuse, Torch Compile, and various attention optimizations
- 🎯 Multiple Attention Backends: SLA Attention, Flash Attention, Sage Attention, Radial Sage Attention, Torch SDPA
- 🎬 Multi-Modal Generation: Text-to-Video (T2V), Image-to-Video (I2V), Video Controllable Editing (Vace), Text-to-Image (T2I)
- 💾 Smart Caching: Built-in caching strategies (DBCache, EasyCache, MagCache, TeaCache, CustomStepCache, HybridCache)
- 🔧 Flexible Configuration: LoRA support, multiple samplers (Euler, UniPC, DPM++), custom sigma scheduling
- 🌐 Distributed Support: Single-GPU, multi-GPU (torchrun), Ray distributed inference, Model Pool management
- 🔌 ComfyUI Integration: ComfyUI node support (standalone submodule) for visual workflow design
- 🖥️ Multi-Platform Support: GPU, NPU, XPU (WIP)
| Model | Type | Parameters | Tasks | Status |
|---|---|---|---|---|
| Turbo Diffusion | Image-to-Video | 14B | I2V | ✅ |
| Wan2.2-T2V | Text-to-Video | 5B/14B | T2V | ✅ |
| Wan2.2-I2V | Image-to-Video | 14B | I2V | ✅ |
| Wan2.1-Vace | Video Controllable Editing | 14B | Vace | ✅ |
| Model | Type | Parameters | Tasks | Status |
|---|---|---|---|---|
| Qwen-Image | Text-to-Image | 20B | T2I | ✅ |
| Qwen-Image Edit | Image Editing | 20B | Image Edit | ✅ |
We are actively working on Dockerfiles. Stay tuned!
- Python: >= 3.10, < 4.0
- PyTorch: >= 2.0
- GPU Environment:
- CUDA >= 12.8
- Recommended: NVIDIA GPUs
- NPU Environment:
- CANN >= 8.0
- torch_npu adapter
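To sanity-check the version requirements above, a dotted version string can be compared numerically rather than lexically (lexical comparison would rank "3.9" above "3.10"). The helper below is an illustrative sketch, not part of kDiT:

```python
# Illustrative version check for the requirements above.  The helper name
# `meets_requirement` is NOT a kDiT API -- it is just a sketch.
def meets_requirement(installed: str, minimum: str) -> bool:
    """Compare dotted version strings numerically (e.g. "3.10.12" >= "3.10")."""
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    a, b = to_tuple(installed), to_tuple(minimum)
    # Pad the shorter tuple with zeros so "3.10" compares like "3.10.0".
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)) >= b + (0,) * (n - len(b))

print(meets_requirement("12.9", "12.8"))   # e.g. checking CUDA >= 12.8
print(meets_requirement("3.9.7", "3.10"))  # Python too old
```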
```bash
# Clone the repository
git clone https://github.com/Tencent/KsanaDiT.git
cd KsanaDiT

# Run the installation script (automatically handles all dependencies)
bash scripts/install_public.sh
```

The installation script automatically detects your hardware environment and installs the appropriate dependencies.
kDiT provides multiple usage methods to meet different scenario requirements:
Run locally through the Python Pipeline API, suitable for scripted batch generation or integration into your own systems:
```python
from kdit import Pipeline

# Create inference pipeline
pipeline = Pipeline.from_models("path/to/model")

# Generate video/image
result = pipeline.generate(prompt, ...)
```

For detailed usage, refer to Quick Start and the examples directory.
kDiT supports usage as ComfyUI custom nodes, providing a visual workflow experience:
```bash
# 1. Clone the kDiT repository
git clone https://github.com/Tencent/KsanaDiT.git

# 2. Enter the kDiT directory and run the install script
cd KsanaDiT
./scripts/install_public.sh
```

During installation, the script interactively prompts you for the ComfyUI installation root directory. After installation, restart ComfyUI and the kDiT nodes will appear in the node list.
For detailed code examples, refer to examples.
**Text-to-Video (T2V)**

```python
import torch
from kdit import Pipeline
from kdit.config import (
    DistributedConfig,
    RuntimeConfig,
    SampleConfig,
)

# Create inference pipeline
pipeline = Pipeline.from_models(
    "path/to/Wan2.2-T2V-A14B",
    dist_config=DistributedConfig(num_gpus=1),
)

# Generate video
video = pipeline.generate(
    "Street photography, cool girl with headphones skateboarding, New York streets, graffiti wall background",
    sample_config=SampleConfig(steps=40),
    runtime_config=RuntimeConfig(
        seed=1234,
        size=(720, 480),
        frame_num=17,
        return_frames=True,
    ),
)
print(f"Generated video shape: {video.shape}")
```

**Image-to-Video (I2V)**

```python
from kdit import Pipeline
from kdit.config import RuntimeConfig, SampleConfig
from kdit.pipelines.context_builders.wan import WanI2VExtraInputs

pipeline = Pipeline.from_models("path/to/Wan2.2-I2V-A14B")

video = pipeline.generate(
    "Girl gently waves her fan, blows a breath of fairy air, lightning flies from her hand into the sky and thunder begins",
    extra_inputs=WanI2VExtraInputs(start_img_path="input.png"),
    sample_config=SampleConfig(steps=40),
    runtime_config=RuntimeConfig(
        seed=1234,
        size=(512, 512),
        frame_num=17,
    ),
)
```

**Text-to-Image (T2I)**

```python
import torch
from kdit import Pipeline
from kdit.config import (
    ModelConfig,
    RuntimeConfig,
    SampleConfig,
    SolverType,
)

pipeline = Pipeline.from_models(
    "path/to/Qwen-Image",
    model_config=ModelConfig(run_dtype=torch.bfloat16),
)

image = pipeline.generate(
    "A cute orange cat sitting on a windowsill, sunlight streaming through the window onto its fur",
    sample_config=SampleConfig(
        steps=20,
        cfg_scale=4.0,
        solver=SolverType.FLOWMATCH_EULER,
    ),
    runtime_config=RuntimeConfig(
        seed=42,
        size=(1024, 1024),
    ),
)
```

**FP8 Quantization and Attention Backends**

```python
import torch
from kdit import Pipeline
from kdit.config import (
    ModelConfig,
    KsanaAttentionConfig,
    KsanaAttentionBackend,
    KsanaLinearBackend,
)

model_config = ModelConfig(
    run_dtype=torch.float16,
    attention_config=KsanaAttentionConfig(backend=KsanaAttentionBackend.SAGE_ATTN),
    linear_backend=KsanaLinearBackend.FP8_GEMM,
)

pipeline = Pipeline.from_models(
    ("high_noise_fp8.safetensors", "low_noise_fp8.safetensors"),
    model_config=model_config,
)
```

**LoRA Acceleration**

```python
from kdit import Pipeline
from kdit.config import LoraConfig, SampleConfig

pipeline = Pipeline.from_models(
    "path/to/Wan2.2-T2V-A14B",
    lora_config=LoraConfig("path/to/Wan2.2-Lightning-4steps-lora"),
)

# Fast generation with 4 steps
video = pipeline.generate(
    prompt,
    sample_config=SampleConfig(
        steps=4,
        cfg_scale=1.0,
        sigmas=[1.0, 0.9375, 0.6333, 0.225, 0.0],
    ),
)
```

**Caching Strategies**

```python
from kdit.config.cache_config import (
    DCacheConfig,
    DBCacheConfig,
    HybridCacheConfig,
)

# Use hybrid caching strategy
cache_config = HybridCacheConfig(
    step_cache=DCacheConfig(fast_degree=50),
    block_cache=DBCacheConfig(),
)

video = pipeline.generate(
    prompt,
    cache_config=cache_config,
)
```

**Multi-GPU Inference**

```bash
# Method 1: Using CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0,1,2,3 python your_script.py

# Method 2: Using torchrun
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 your_script.py
```

```python
from kdit import Pipeline
from kdit.config import DistributedConfig

pipeline = Pipeline.from_models(
    model_path,
    dist_config=DistributedConfig(num_gpus=4),
)
```

| Technique | Description | Effect |
|---|---|---|
| FP8 GEMM | FP8 quantized matrix multiplication | Reduced memory, improved speed |
| Torchao FP8 Dynamic | Dynamic FP8 quantization | Adaptive precision, balanced quality and performance |
| QKV Fuse | QKV projection fusion | Reduced memory access, improved throughput |
| torch.compile | Graph compilation optimization | 10-30% end-to-end speedup |
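The idea behind dynamic FP8 quantization can be sketched in a few lines: rescale each tensor so its largest magnitude maps onto the FP8 representable range, then apply the inverse scale after the low-precision operation. The snippet below is a conceptual illustration of that scaling scheme using plain Python floats; it is not kDiT's implementation, and the function names are illustrative:

```python
# Conceptual sketch of dynamic FP8 (E4M3) scaling: map each tensor's largest
# magnitude to the E4M3 maximum (448), compute in the scaled domain, and
# dequantize with the inverse scale.  NOT kDiT's actual kernels.
E4M3_MAX = 448.0

def dynamic_fp8_scale(values):
    """Per-tensor scale chosen from the current data (hence 'dynamic')."""
    amax = max(abs(v) for v in values)
    return E4M3_MAX / amax if amax > 0 else 1.0

def fake_quantize(values):
    """Quantize-dequantize round trip; real kernels keep the scaled values."""
    scale = dynamic_fp8_scale(values)
    # Clamping stands in for the limited FP8 range; the rounding error from
    # the 3-bit mantissa is omitted to keep the sketch short.
    quantized = [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]
    return [q / scale for q in quantized], scale

weights = [0.02, -1.5, 0.75]
roundtrip, scale = fake_quantize(weights)
print(scale)  # 448 / 1.5, i.e. about 298.67
```

Because the scale is recomputed from each tensor's actual range, no calibration pass is needed, which is what makes this variant "adaptive" in the table above.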
| Backend | Characteristics | Use Case |
|---|---|---|
| Flash Attention | High performance, memory efficient | General recommendation |
| Sage Attention | Optimized attention computation | Long sequences |
| Sage SLA | Top-k sparse attention | Turbo Diffusion |
| Radial Sage Attention | Radial sparse attention | Very long sequences |
| Torch SDPA | PyTorch native implementation | Compatibility priority |
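All backends in the table compute the same mathematical function, scaled dot-product attention softmax(QK^T / sqrt(d)) V, and differ only in kernel strategy (tiling, sparsity, quantization). A pure-Python reference, using lists of row vectors in place of tensors:

```python
import math

def sdpa(q, k, v):
    """Reference scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    Every backend in the table above computes this same function; they
    differ only in how the kernel is implemented."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)                       # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output row = attention-weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# One query attending over two key/value rows.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(sdpa(q, k, v))
```

Sparse backends such as SLA and Radial Sage skip most of the score matrix for long sequences, which is why they appear in the "very long sequences" rows.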
| Strategy | Description | Use Case |
|---|---|---|
| DCache | Step-level caching with degree-based polynomial | General video generation |
| TeaCache | Temporal-aware step-level caching | Video generation optimization |
| MagCache | Adaptive step-level caching | Balanced quality and speed |
| EasyCache | Lightweight step-level caching without pre-prepared parameters | Fast inference with minimal overhead |
| DBCache | Block-level caching | Image generation |
| HybridCache | Step-level + block-level hybrid caching | Maximum acceleration |
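The step-level strategies above share one core idea: when the model's input changes little between consecutive denoising steps, reuse the previously computed residual instead of running the transformer again. The sketch below illustrates that mechanism with a scalar stand-in; the class, threshold, and change metric are illustrative, not kDiT's internals:

```python
# Conceptual sketch of step-level caching: if the input barely moved since
# the last computed step, reuse the cached residual instead of running the
# model.  Class name, threshold, and change metric are illustrative only.
class StepCache:
    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.last_input = None
        self.cached_residual = None
        self.hits = 0

    def residual(self, x, compute):
        """Return compute(x), or the cached value if x barely changed."""
        if self.last_input is not None and self.cached_residual is not None:
            change = abs(x - self.last_input) / (abs(self.last_input) + 1e-8)
            if change < self.threshold:
                self.hits += 1
                return self.cached_residual   # cache hit: skip the model
        self.last_input = x
        self.cached_residual = compute(x)     # cache miss: run and store
        return self.cached_residual

cache = StepCache(threshold=0.1)
expensive = lambda x: x * 2                   # stand-in for the DiT forward
outputs = [cache.residual(x, expensive) for x in [1.0, 1.01, 1.5, 1.52]]
print(outputs, cache.hits)                    # two of four steps are skipped
```

Block-level strategies (DBCache) apply the same test per transformer block rather than per step, and HybridCache layers the two.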
| Sampler | Description | Use Case |
|---|---|---|
| Euler | Fast sampling | 4-8 step inference |
| UniPC | High-quality sampling | 20-40 step inference |
| DPM++ | Efficient multi-step sampling | General purpose |
| Turbo Diffusion | Ultra-fast sampling | 4-step inference |
| FlowMatch Euler | Flow matching sampling | Image generation |
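Euler-style samplers in the table integrate a velocity field across the sigma schedule with one explicit update per step, x ← x + v(x, σ)·Δσ. The toy example below shows that integration loop with a made-up linear velocity field; a real sampler would query the DiT model for the velocity, and none of these names are kDiT APIs:

```python
# Conceptual sketch of Euler sampling over a sigma schedule: one explicit
# Euler update per step.  The toy velocity field below is made up; a real
# sampler queries the DiT model at each step.
def euler_sample(x, sigmas, velocity):
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        dt = s_next - s_cur          # schedule is decreasing, so dt < 0
        x = x + velocity(x, s_cur) * dt
    return x

sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]   # 4 steps
toy_velocity = lambda x, s: x           # flow whose exact solution decays to 0
x0 = 8.0
xT = euler_sample(x0, sigmas, toy_velocity)
print(xT)                               # each step multiplies x by 0.75
```

This is why the sigma list matters in few-step setups such as the Lightning LoRA example: with only 4 Euler steps, where the sigmas are placed determines how the integration error is distributed.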
```bash
# Log level: debug/info/warn/error
export KSANA_LOGGER_LEVEL=info
```
The framework supports model parameter configuration via YAML files, located in the `kdit/settings/` directory:
- `qwen/t2i_20b.yaml` - Qwen image generation model config
- `qwen/edit_20b.yaml` - Qwen image editing model config
- `wan/t2v_14b.yaml` - Wan2.2 T2V model config
- `wan/ti2v_5b.yaml` - Wan2.2 TI2V 5B model config
- `wan/i2v_14b.yaml` - Wan2.2 I2V model config
- `wan/vace_14b.yaml` - Wan2.1 Vace model config
Complete example code is available in the `examples/` directory:
- `examples/local/wan/wan2_2_t2v.py` - Text-to-Video example
- `examples/local/wan/wan2_2_i2v.py` - Image-to-Video example
- `examples/local/wan/wan2_1_vace.py` - Video controllable editing example
- `examples/local/qwen/qwen_image_t2i.py` - Text-to-Image example
- `examples/local/qwen/qwen_image_edit.py` - Image Editing example
We have comprehensive test coverage. The tests are intended for developers; they are currently time-consuming to run, and we are working to streamline them.
```bash
# Run all tests
pytest tests/

# Run specific tests
pytest tests/kdit/pipelines/wan2_2_t2v_test.py

# Run GPU tests
bash scripts/ci_tests/ci_kdit_gpus.sh
```

We welcome community contributions! Before submitting a PR, please ensure:
- Code passes all tests
- Follows project code style (using the `git commit` hook)
- Includes necessary documentation and comments
- Updates relevant README and examples
```bash
# Install development dependencies
pip install -e ".[dev]"

# Run code style checks
pre-commit run --all-files

# Run tests
pytest tests/
```

For a detailed list of changes in each version, see the CHANGELOG.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
This project benefits from the following excellent open-source projects:
- Wan-Video - Wan2.2 video generation model
- ComfyUI-WanVideoWrapper - ComfyUI integration reference
- FastVideo - Video generation optimization techniques
- Nunchaku - Quantization optimization solutions
- TurboDiffusion - Inference acceleration solutions
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Discussions
- Multi-Platform Support: GPU, NPU, XPU backend support
- Batch Inference: Support for batch size > 1, merged cond/uncond
- Video Editing: Wan2.1 Vace video controllable editing
- Advanced Samplers: DPM++, Turbo Diffusion support
- Performance Optimization: QKV Fuse + Dynamic FP8 optimization
- Memory Optimization: Pin Manager to resolve OOM issues
- Smart Caching: MagCache, TeaCache, EasyCache strategies
- Image Editing: Qwen Image Edit model support
- VAE Parallelism: Multi-GPU VAE decoding
- Monitoring: Inference metrics reporting
- Support for more generation models (Z-Image, Hunyuan, etc.)
- Memory optimization for longer video generation
- Cache strategy performance tuning
- Model quantization toolchain
- XPU full feature support optimization
If this project helps you, please give us a ⭐️ Star!
Made with ❤️ by the kDiT Team