a tool for leveraging text2image tools such as StableDiffusion for generating videos.
Mainly based on PyTTI and deforum, but adding and focusing around:
- designed around a CLI interface with well-structured and very versatile YAML configuration
- audio-reactivity and other multi-modality to come
- PyTTI style functions for most generative parameters
- extensible design for adding arbitrary mechanisms for generating animations
- prerequisite: Install python3.10+
- create venv:
python3 -m venv venv - source venv:
source venv/Scripts/activate(orsource venv/bin/activateon linux/macOS) - install requirements:
pip install -r requirements.txt - get the remaining dependencies that don't come with wheels/proper build systems:
./init.sh(or just copy the fewgit clonecommands from the file on windows)
- use https://github.com/AUTOMATIC1111/stable-diffusion-webui
- add the
--apistartup flag (in webui.sh or webui.bat or however you launch it) to expose the REST API (you can verify this by going to http://localhost:7860/docs and ensure there is a/sdapi/...endpoint there) - Set your mechanism to
apiand configure thehostparameter (if running locally it's always just http://localhost:7860) - ensure web-ui is running
- check out the examples or the doc and build your config
- make sure the venv is sourced (see above)
- Run your scenario with
python3 -cp config -cn=yourconfig main.pywhere-cpspecifies the path to your config directory and-cnspecifies which config to run.
Once you have generated your frames, pyttv's job is done. Encoding to a video file is done by other tools.
A useful tool to use is Flowframes or similar tools that use RIFE interpolation or other decent interpolation mechanisms.
rife-ncnn-vulkan seems to work on macOS (M1).
If you just want to encode your frames directly to a video file, you can of course use ffmpeg. cd into your output frame directory and run
cat *.png | ffmpeg -framerate 18 -f image2pipe -i - -c:v libx264 -pix_fmt yuv420p out.mp4where 18 is to be replaced by your fps of course and out.mp4 is the output filename.
or to encode in a lossless format:
ffmpeg -framerate 18 -i directory\%05d.png -c:v copy output.mkvffmpeg -i out.mp4 -i audio.flac out_audio.mp4Example: Append audio, but the audio gets offset by 55s (-ss) and the audio duration is 5 seconds:
ffmpeg -i out.mp4 -ss 00:00:55.0 -t 00:00:05.0 -i audio.flac out_audio.mp4If you created your video at a lower resolution, decent tools are RealESRGAN and UIs for it, e.g. cupscale.
Of course you can also use the WebUI for this, either by running a text conditioned upscale with batch img2img mode + SD upscale script or effectively only running ESRGAN by setting the denoising strength to 0.
If you use an M1 mac, use torch_device: cpu in your configs. unfortunately the depth model currently does not work directly on the mps device.