#Model Optimization
3 articles with this tag

AI Research
Nvidia's Ziv Ilan on Faster Diffusion Models
Nvidia's Ziv Ilan explains how to reduce diffusion model latency using quantization, caching, and distillation, plus the new FastGen library.
4 days ago
AI Research
DoRA Efficiency Breakthrough
New factored norm and fused kernels unlock DoRA's potential, delivering 1.5-2x speedups and significant VRAM reduction.
3 months ago
AI Research
Pretraining's Hidden Experts: A New Post-Training Paradigm
Large pretrained models are dense with task experts, so simple random sampling and ensembling can rival complex post-training optimization methods.
3 months ago