Thank you for your great work! I have a few questions about the training code in `rcm/models/t2v_model_distill_rcm.py`:
- The rCM paper defines the consistency function as $f_\theta(x, t) = c_\text{skip}(t)\,x + c_\text{out}(t)\,F_\theta(c_\text{in}(t)\,x,\ c_\text{noise}(t))$. I assume dx/dt can be replaced by the teacher output F_teacher only when $F_\theta(c_\text{in}(t)\,x,\ c_\text{noise}(t))$ represents the velocity. However, `self.denoise` returns `F_pred_B_C_T_H_W = (torch.cos(time_B_1_T_1_1) * xt_B_C_T_H_W - x0_pred_B_C_T_H_W) / torch.sin(time_B_1_T_1_1)`, which looks to me like a noise prediction derived from the TrigFlow formulation. Why not return `net_output_B_C_T_H_W` directly?
- Have you tried using the global velocity instead of the teacher output to compute `g_B_C_T_H_W`?
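
To make the first question concrete, here is a small scalar check of the expression quoted from `self.denoise`. It is only a sketch under my own assumptions, not code from the repo: I assume the TrigFlow interpolation x_t = cos(t)·x0 + sin(t)·z with sigma_d = 1, and an ideal network whose x0 prediction equals the true x0. The helper names (`trigflow_sample`, `f_from_x0`, `trigflow_velocity`) are hypothetical. Under these assumptions the expression seems to match dx_t/dt = -sin(t)·x0 + cos(t)·z rather than z, so I may be misreading what `F_pred_B_C_T_H_W` represents.

```python
import math


def trigflow_sample(x0, z, t):
    """Assumed TrigFlow interpolation with sigma_d = 1: x_t = cos(t)*x0 + sin(t)*z."""
    return math.cos(t) * x0 + math.sin(t) * z


def f_from_x0(xt, x0_pred, t):
    """Scalar version of the expression quoted from self.denoise."""
    return (math.cos(t) * xt - x0_pred) / math.sin(t)


def trigflow_velocity(x0, z, t):
    """Time derivative of the assumed interpolation: dx_t/dt = -sin(t)*x0 + cos(t)*z."""
    return -math.sin(t) * x0 + math.cos(t) * z


# Arbitrary illustrative values (not taken from the repo).
x0, z, t = 0.5, -1.2, 0.7

xt = trigflow_sample(x0, z, t)
f_pred = f_from_x0(xt, x0, t)          # uses the true x0 as an ideal prediction
velocity = trigflow_velocity(x0, z, t)

print("F matches velocity:", abs(f_pred - velocity) < 1e-9)
print("F matches noise z: ", abs(f_pred - z) < 1e-9)
```

If that reading is right, the question reduces to whether `net_output_B_C_T_H_W` is parameterized differently from this quantity, which is why the conversion through `x0_pred_B_C_T_H_W` is needed.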
Maybe I misunderstood something. Looking forward to your reply, many thanks!