1 article with this tag
The TIDE framework enables cross-architecture distillation for diffusion large language models, yielding significant performance gains for smaller student models.