DiffusionOPD introduces an online policy distillation framework for multi-task diffusion alignment. Instead of jointly optimizing several rewards from scratch or cascading RL stages, it first learns ...