Prerequisites: `python>=3.10`, `CUDA>=11.7`, and `ffmpeg`.
- Tested GPUs: A100. At least 40 GB of GPU memory is required.

Install dependencies:

```bash
pip install -r requirements.txt
```
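After installation, a quick environment sanity check can save debugging time later. A minimal sketch, assuming the requirements pull in PyTorch:

```python
# Environment sanity check (assumes torch was installed via requirements.txt).
import shutil
import sys

import torch

assert sys.version_info >= (3, 10), "python>=3.10 required"
assert torch.cuda.is_available(), "no CUDA device visible"
print("CUDA runtime:", torch.version.cuda)   # expect >= 11.7
print("GPU:", torch.cuda.get_device_name(0))
print("ffmpeg:", shutil.which("ffmpeg") or "NOT FOUND")
```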
The data processing code is located in `CFSynthesis/render_dataset`.
Prepare your training data in the following format (we use ASIT as an example) in the corresponding folder:

```
render_dataset/path/to/datasets
├── gBR_sFM_c08_d06_mBR5
│   ├── gBR_sFM_c08_d06_mBR5_0001.png
│   ├── gBR_sFM_c08_d06_mBR5_0002.png
│   ...
├── gLO_sFM_c01_d13_mLO1
│   ├── gLO_sFM_c01_d13_mLO1_0001.png
│   ├── gLO_sFM_c01_d13_mLO1_0002.png
│   ...
```
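Frame numbering must be gap-free for later video assembly. A hedged helper (not part of the released tooling) that checks each sequence folder follows the `<sequence>_<index:04d>.png` pattern:

```python
# Illustrative check: every sequence folder holds consecutively numbered frames.
from pathlib import Path

def check_sequences(root: str) -> None:
    for seq_dir in sorted(Path(root).iterdir()):
        if not seq_dir.is_dir():
            continue
        frames = sorted(p.name for p in seq_dir.glob(f"{seq_dir.name}_*.png"))
        expected = [f"{seq_dir.name}_{i:04d}.png" for i in range(1, len(frames) + 1)]
        assert frames == expected, f"gap or misnamed frame in {seq_dir.name}"
        print(f"{seq_dir.name}: {len(frames)} frames OK")

check_sequences("render_dataset/path/to/datasets")
```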
We use the following tools (please ensure that all dependencies and pretrained checkpoints are properly set up):
- UV map generation: SMPLitex
- Segmentation: SemanticGuidedHumanMatting
- 3D pose estimation: 4D-Humans
Install Detectron2 and download the DensePose checkpoint:

```bash
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
wget https://dl.fbaipublicfiles.com/densepose/densepose_rcnn_R_101_FPN_s1x/165712084/model_final_c6ab63.pkl \
    -P /path/to/detectron2/projects/DensePose/checkpoints/
```
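To confirm the install and the checkpoint location before running the pipeline, a small sketch (adjust the path to your clone):

```python
# Verify detectron2 imports and the DensePose checkpoint is where process.sh expects it.
from pathlib import Path

import detectron2

ckpt = Path("/path/to/detectron2/projects/DensePose/checkpoints/model_final_c6ab63.pkl")
print("detectron2", detectron2.__version__, "| checkpoint present:", ckpt.is_file())
```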
Generate UV maps and backgrounds:

```bash
bash process.sh path/to/datasets /absolute/path/to/detectron2
```
Generate foregrounds:

```bash
python select_inference.py --input_img_path path/to/datasets/gt --save_path path/to/datasets/ref
python SemanticGuidedHumanMatting/seg_bg_image_folder.py \
    --images_dir path/to/datasets/ref \
    --result_dir path/to/datasets/ref_seg \
    --pretrained_weight SemanticGuidedHumanMatting/pretrained/SGHM-ResNet50.pth
```
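The matting step should produce one result per reference frame. A hedged check, assuming `seg_bg_image_folder.py` mirrors the input filenames in `ref_seg`:

```python
# Illustrative check: every frame in ref/ should have a counterpart in ref_seg/.
from pathlib import Path

ref = {p.name for p in Path("path/to/datasets/ref").glob("*.png")}
seg = {p.name for p in Path("path/to/datasets/ref_seg").glob("*.png")}
print("unmatted frames:", sorted(ref - seg) or "none")
```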
After these steps, you will have the following data structure:
```
/path/to/datasets
├── gt
│   ├── gBR_sFM_c08_d06_mBR5_0001.png
│   ├── gBR_sFM_c08_d06_mBR5_0002.png
│   ├── gLO_sFM_c01_d13_mLO1_0001.png
│   ├── gLO_sFM_c01_d13_mLO1_0002.png
│   ...
├── ref_control
│   ├── gBR_sFM_c08_d06_mBR5_0001.png
│   ├── gBR_sFM_c08_d06_mBR5_0002.png
│   ├── gLO_sFM_c01_d13_mLO1_0001.png
│   ├── gLO_sFM_c01_d13_mLO1_0002.png
├── cond
│   ├── gBR_sFM_c08_d06_mBR5_0001.png
│   ├── gBR_sFM_c08_d06_mBR5_0002.png
│   ├── gLO_sFM_c01_d13_mLO1_0001.png
│   ├── gLO_sFM_c01_d13_mLO1_0002.png
├── images-seg
│   ├── gBR_sFM_c08_d06_mBR5_0001.png
│   ├── gBR_sFM_c08_d06_mBR5_0002.png
│   ├── gLO_sFM_c01_d13_mLO1_0001.png
│   ├── gLO_sFM_c01_d13_mLO1_0002.png
└── ref
    ├── gLO_sFM_c01_d13_mLO1.png
    └── gLO_sFM_c08_d13_mLO1.png
```
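Training assumes the per-frame folders stay in sync. A small sketch (folder names taken from the tree above) that flags any frame missing from one of them:

```python
# Illustrative consistency check across the per-frame folders listed above.
from pathlib import Path

root = Path("/path/to/datasets")
names = {d: {p.name for p in (root / d).glob("*.png")}
         for d in ("gt", "ref_control", "cond", "images-seg")}
for folder, files in names.items():
    missing = names["gt"] - files
    assert not missing, f"{folder} is missing {sorted(missing)[:3]} ..."
print(f"{len(names['gt'])} frames consistent across all folders")
```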
Synthesize videos:

```bash
python tools/synthesize_video.py --root 4d_2/ --fps 30 --clean
```
This produces:

```
/path/to/datasets
├── gt
│   ├── gBR_sFM_c08_d06_mBR5.mp4
│   ├── gLO_sFM_c01_d13_mLO1.mp4
│   ...
├── ref_control
│   ├── gBR_sFM_c08_d06_mBR5.mp4
│   ├── gLO_sFM_c01_d13_mLO1.mp4
├── cond
│   ├── gBR_sFM_c08_d06_mBR5.mp4
│   ├── gLO_sFM_c01_d13_mLO1.mp4
├── images-seg
│   ├── gBR_sFM_c08_d06_mBR5.mp4
│   ├── gLO_sFM_c01_d13_mLO1.mp4
└── ref
    ├── gLO_sFM_c01_d13_mLO1.png
    └── gLO_sFM_c08_d13_mLO1.png
```
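Conceptually, `synthesize_video.py` packs each per-frame folder into the mp4 clips shown above. A rough hand-rolled equivalent for one sequence (a sketch, not the script's exact behavior; the 4-digit frame pattern is assumed from the trees above):

```python
# Illustrative only: assemble one frame folder into an mp4 with ffmpeg.
import subprocess

seq, fps = "gBR_sFM_c08_d06_mBR5", 30
subprocess.run([
    "ffmpeg",
    "-framerate", str(fps),
    "-start_number", "1",         # frames are numbered from 0001
    "-i", f"gt/{seq}_%04d.png",   # assumed 4-digit frame indices
    "-c:v", "libx264", "-pix_fmt", "yuv420p",
    f"gt/{seq}.mp4",
], check=True)
```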
Run inference:

```bash
python -m scripts.pipeline.pose2vid \
    --config ./configs/animation/animation.yaml -W 512 -H 512 -L 96
```

Here `-W`/`-H` set the output resolution and `-L` the number of generated frames.
Extract the meta info of your dataset:

```bash
python tools/extract_meta_info.py --root_path /path/to/your/video_dir/gt --dataset_name asit
```
Update the training config:

```yaml
data:
  meta_paths:
    - "./data/asit_meta.json"
```
Download the base models from Hugging Face; we recommend using `git lfs` to download large files. Place the models as follows:
```
pretrained_weights
|-- ckpts
|   |-- denoising_unet.pth
|   |-- guidance_encoder_depth.pth
|   |-- guidance_encoder_dwpose.pth
|   |-- guidance_encoder_normal.pth
|   |-- guidance_encoder_semantic_map.pth
|   `-- reference_unet.pth
|-- control_v11p_sd15_openpose
|   `-- diffusion_pytorch_model.bin
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
`-- stable-diffusion-v1-5
    |-- feature_extractor
    |   `-- preprocessor_config.json
    |-- model_index.json
    |-- unet
    |   |-- config.json
    |   `-- diffusion_pytorch_model.bin
    `-- v1-inference.yaml
```
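A quick way to catch a misplaced download before training starts; the expected paths below are taken directly from the tree above:

```python
# Verify the key checkpoint files from the layout above are in place.
from pathlib import Path

root = Path("pretrained_weights")
expected = [
    "ckpts/denoising_unet.pth",
    "ckpts/reference_unet.pth",
    "control_v11p_sd15_openpose/diffusion_pytorch_model.bin",
    "image_encoder/pytorch_model.bin",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "stable-diffusion-v1-5/unet/diffusion_pytorch_model.bin",
]
missing = [p for p in expected if not (root / p).is_file()]
print("all key checkpoints present" if not missing else f"missing: {missing}")
```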
Run Stage 1 training:

```bash
accelerate launch train_stage_1.py --config configs/train/stage1.yaml
```
Download the pretrained motion module weights `mm_sd_v15_v2.ckpt` and place the file under `./pretrained_weights`.
Specify the Stage 1 weights in the config file `stage2.yaml`:

```yaml
stage1_ckpt_dir: './exp_output/stage1'
stage1_ckpt_step: 30000
```
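Before launching Stage 2, it is worth confirming that Stage 1 actually saved weights at the configured step. A hedged sketch (the checkpoint filename pattern is an assumption; adjust it to what `train_stage_1.py` writes):

```python
# Hypothetical check: look for step-30000 checkpoints under the configured directory.
from pathlib import Path

ckpt_dir, step = Path("./exp_output/stage1"), 30000
hits = sorted(ckpt_dir.glob(f"*{step}*"))
print(hits if hits else f"no step-{step} checkpoints found under {ckpt_dir}")
```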
Run Stage 2 training:

```bash
accelerate launch train_stage_2.py --config configs/train/stage2.yaml
```
This project builds upon the excellent work of others; we thank the authors for releasing their code and models.
If you find this codebase useful, please cite:

```bibtex
@inproceedings{cui2025cfsynthesis,
  title={CFSynthesis: Controllable and Free-view 3D Human Video Synthesis},
  author={Cui, Liyuan and Xu, Xiaogang and Dong, Wenqi and Yang, Zesong and Bao, Hujun and Cui, Zhaopeng},
  booktitle={Proceedings of the 2025 International Conference on Multimedia Retrieval},
  pages={135--144},
  year={2025}
}
```