@zhengkw18 @qsh-zh Excellent work! Could you clarify why the scaling factors reported in the paper are 1.0 and 0.01, whereas in the code the scales for SCM and DMD are set to 100.0 and 1.0, respectively? Additionally, what are the typical magnitudes of the individual loss terms during training for text-to-image (T2I) and text-to-video (T2V) models?