
CFSynthesis: Controllable and Free-view 3D Human Video Synthesis


¹Zhejiang University   ²The Chinese University of Hong Kong

Overview

⚒️ Installation

Prerequisites: Python >= 3.10, CUDA >= 11.7, and ffmpeg.

  • Tested GPUs: A100; at least 40 GB of GPU memory is required.

Install dependencies:

pip install -r requirements.txt
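
As a quick sanity check (a minimal sketch, assuming PyTorch is installed via requirements.txt), you can verify that a GPU and ffmpeg are visible:

import shutil
import torch

# A CUDA-capable GPU with >= 40 GB memory is required for training.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, {props.total_memory / 1024**3:.0f} GB")

# ffmpeg must be on PATH for the video-synthesis step.
print("ffmpeg:", shutil.which("ffmpeg") or "not found")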

🚀 Training and Inference

Prepare Datasets

The data processing code is located in CFSynthesis/render_dataset.
Place your training data in the corresponding folder using the following layout (we use ASIT as an example):

render_dataset/path/to/datasets
  ├── gBR_sFM_c08_d06_mBR5
  │   ├── gBR_sFM_c08_d06_mBR5_0001.png
  │   ├── gBR_sFM_c08_d06_mBR5_0002.png
  │   ...
  ├── gLO_sFM_c01_d13_mLO1
  │   ├── gLO_sFM_c01_d13_mLO1_0001.png
  │   ├── gLO_sFM_c01_d13_mLO1_0002.png

⚠️ Note: You need to save the data at a resolution of 512×512.
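
If your frames are not already 512×512, here is a minimal resizing sketch (assuming PNG frames and Pillow; the dataset path is a placeholder):

from pathlib import Path
from PIL import Image

dataset_dir = Path("render_dataset/path/to/datasets")  # placeholder path

# Resize every frame in place to the required 512x512 resolution.
for frame in sorted(dataset_dir.rglob("*.png")):
    img = Image.open(frame).convert("RGB")
    if img.size != (512, 512):
        img.resize((512, 512), Image.LANCZOS).save(frame)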

We use the following tools (please ensure that all dependencies and pretrained checkpoints are properly set up):

Install Detectron2:

git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
# Download the DensePose checkpoint into the DensePose project directory:
wget https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_101_FPN_s1x/165712084/model_final_c6ab63.pkl \
  -P /path/to/detectron2/projects/DensePose/checkpoints/

Generate UV maps and backgrounds:

bash process.sh path/to/datasets /absolute/path/to/detectron2

Generate foregrounds:

python select_inference.py --input_img_path path/to/datasets/gt --save_path path/to/datasets/ref
python SemanticGuidedHumanMatting/seg_bg_image_folder.py \
    --images_dir path/to/datasets/ref \
    --result_dir path/to/datasets/ref_seg \
    --pretrained_weight SemanticGuidedHumanMatting/pretrained/SGHM-ResNet50.pth

You should now have the following data structure:

/path/to/datasets
  ├── gt
  │   ├── gBR_sFM_c08_d06_mBR5_0001.png
  │   ├── gBR_sFM_c08_d06_mBR5_0002.png
  │   ├── gLO_sFM_c01_d13_mLO1_0001.png
  │   ├── gLO_sFM_c01_d13_mLO1_0002.png
  │   ...
  ├── ref_control
  │   ├── gBR_sFM_c08_d06_mBR5_0001.png
  │   ├── gBR_sFM_c08_d06_mBR5_0002.png
  │   ├── gLO_sFM_c01_d13_mLO1_0001.png
  │   ├── gLO_sFM_c01_d13_mLO1_0002.png
  ├── cond
  │   ├── gBR_sFM_c08_d06_mBR5_0001.png
  │   ├── gBR_sFM_c08_d06_mBR5_0002.png
  │   ├── gLO_sFM_c01_d13_mLO1_0001.png
  │   ├── gLO_sFM_c01_d13_mLO1_0002.png
  ├── images-seg
  │   ├── gBR_sFM_c08_d06_mBR5_0001.png
  │   ├── gBR_sFM_c08_d06_mBR5_0002.png
  │   ├── gLO_sFM_c01_d13_mLO1_0001.png
  │   ├── gLO_sFM_c01_d13_mLO1_0002.png
  ├── ref
  │   ├── gLO_sFM_c01_d13_mLO1.png 
  │   ├── gLO_sFM_c08_d13_mLO1.png 

Synthesize videos:

python tools/synthesize_video.py --root /path/to/datasets --fps 30 --clean

This produces the following structure:

/path/to/datasets
  ├── gt
  │   ├── gBR_sFM_c08_d06_mBR5.mp4
  │   ├── gLO_sFM_c01_d13_mLO1.mp4
  │   ...
  ├── ref_control
  │   ├── gBR_sFM_c08_d06_mBR5.mp4
  │   ├── gLO_sFM_c01_d13_mLO1.mp4
  ├── cond
  │   ├── gBR_sFM_c08_d06_mBR5.mp4
  │   ├── gLO_sFM_c01_d13_mLO1.mp4
  ├── images-seg
  │   ├── gBR_sFM_c08_d06_mBR5.mp4
  │   ├── gLO_sFM_c01_d13_mLO1.mp4
  ├── ref
  │   ├── gLO_sFM_c01_d13_mLO1.png 
  │   ├── gLO_sFM_c08_d13_mLO1.png 
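
For reference, one clip's frames can also be encoded into an .mp4 directly with ffmpeg; a minimal sketch of that step (the clip name and path are placeholders, and the exact options used by synthesize_video.py may differ):

import subprocess
from pathlib import Path

frames_dir = Path("/path/to/datasets/gt")  # placeholder path
clip = "gBR_sFM_c08_d06_mBR5"

# Encode the numbered frames of one clip into an H.264 video at 30 fps.
subprocess.run([
    "ffmpeg", "-y", "-framerate", "30", "-start_number", "1",
    "-i", str(frames_dir / f"{clip}_%04d.png"),
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    str(frames_dir / f"{clip}.mp4"),
], check=True)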

Inference

Run inference:

python -m scripts.pipeline.pose2vid \
  --config ./configs/animation/animation.yaml -W 512 -H 512 -L 96

🏋️‍♂️ Training

Data Preparation

Extract the meta info of your dataset:

python tools/extract_meta_info.py --root_path /path/to/your/video_dir/gt --dataset_name asit 

Update the training config to point to the generated meta file:

data:
  meta_paths:
    - "./data/asit_meta.json"

Stage 1

Download base models from Hugging Face.
We recommend using git lfs to download large files.
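
A minimal download sketch using huggingface_hub (the repo IDs below are assumptions inferred from the folder names; verify them against the checkpoints this project actually expects):

from huggingface_hub import snapshot_download

# Assumed repo IDs; adjust if the project documents different sources.
for repo_id, folder in [
    ("stabilityai/sd-vae-ft-mse", "sd-vae-ft-mse"),
    ("lllyasviel/control_v11p_sd15_openpose", "control_v11p_sd15_openpose"),
    ("stable-diffusion-v1-5/stable-diffusion-v1-5", "stable-diffusion-v1-5"),
]:
    snapshot_download(repo_id=repo_id, local_dir=f"pretrained_weights/{folder}")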

Place the models as follows:

pretrained_weights
|-- ckpts  
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   |-- reference_unet.pth
|-- control_v11p_sd15_openpose
|   |-- diffusion_pytorch_model.bin
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml

Run Stage 1 training:

accelerate launch train_stage_1.py --config configs/train/stage1.yaml

Stage 2

Download the pretrained motion module weights (mm_sd_v15_v2.ckpt) and place the file under ./pretrained_weights.
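
A minimal download sketch (the repo ID is an assumption; this checkpoint is the AnimateDiff motion module, commonly hosted on the Hugging Face Hub):

from huggingface_hub import hf_hub_download

# Assumed source repo for the AnimateDiff motion module checkpoint.
hf_hub_download(repo_id="guoyww/animatediff",
                filename="mm_sd_v15_v2.ckpt",
                local_dir="./pretrained_weights")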

Specify Stage 1 weights in the config file stage2.yaml:

stage1_ckpt_dir: './exp_output/stage1'
stage1_ckpt_step: 30000 

Run Stage 2 training:

accelerate launch train_stage_2.py --config configs/train/stage2.yaml

🙏 Acknowledgements

This project builds upon excellent prior work; we thank the authors for releasing their code and models.


🎓 Citation

If you find this codebase useful, please cite:

@inproceedings{cui2025cfsynthesis,
  title={CFSynthesis: Controllable and Free-view 3D Human Video Synthesis},
  author={Cui, Liyuan and Xu, Xiaogang and Dong, Wenqi and Yang, Zesong and Bao, Hujun and Cui, Zhaopeng},
  booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
  pages={135--144},
  year={2025}
}
