DiffusionOPD introduces an online policy distillation framework for multi-task diffusion alignment. Instead of jointly optimizing several rewards from scratch or cascading RL stages, it first learns ...