Diffusion Transformers (DiTs) now dominate high-fidelity image synthesis, but their heavy computational cost conflicts with the practical need for on-device deployment. Their massive compute and memory demands have historically confined them to high-end GPUs, leaving resource-constrained edge devices behind. This paper introduces EdgeDiT, a novel family of hardware-efficient generative transformers engineered specifically for mobile NPUs such as the Qualcomm Hexagon and the Apple Neural Engine.
Hardware-Aware Pruning for Mobile Efficiency
EdgeDiT systematically identifies and prunes structural redundancies in the DiT architecture that are especially costly for mobile data flows. This hardware-aware optimization reduces the parameter count by 20-30% and FLOPs by 36-46%. Crucially, these savings come without sacrificing the scaling advantages or expressive capacity of the original transformer architecture, which makes EdgeDiT a practical foundation for mobile AI applications.
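Why can FLOPs fall further than parameters? One possible mechanism is the following. In a transformer block, the attention products QK^T and softmax(QK^T)V cost FLOPs that grow with N² for N tokens, but they own no weights. Shrinking the attention width therefore cuts those FLOPs without any matching parameter savings. The minimal sketch below rests on several assumptions. It models a simplified DiT block with Q/K/V/output projections and a two-layer MLP, ignoring biases, normalization, and adaLN modulation. It uses a DiT-XL-like width of 1152 and 1024 latent tokens. The `pruned` configuration is hypothetical and is not EdgeDiT's actual architecture.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BlockConfig:
    """Simplified DiT block (assumption: biases, norms, and adaLN modulation ignored)."""
    hidden: int       # model width d
    attn_dim: int     # total Q/K/V width (num_heads * head_dim)
    mlp_hidden: int   # width of the MLP's intermediate layer
    tokens: int       # sequence length N (number of latent patches)


def block_params(c: BlockConfig) -> int:
    # Q, K, V projections (d -> attn_dim) plus output projection (attn_dim -> d)
    attn = 4 * c.hidden * c.attn_dim
    # Two-layer MLP: d -> mlp_hidden -> d
    mlp = 2 * c.hidden * c.mlp_hidden
    return attn + mlp


def block_flops(c: BlockConfig) -> int:
    # Counts one multiply-accumulate (MAC) as 2 FLOPs.
    n = c.tokens
    # Every weight participates in one MAC per token.
    linear = 2 * n * block_params(c)
    # QK^T and softmax(QK^T) @ V: each is N * N * attn_dim MACs and has no weights.
    attention_products = 2 * (2 * n * n * c.attn_dim)
    return linear + attention_products


# DiT-XL-like width with 1024 tokens (e.g. a 64x64 latent split into 2x2 patches).
baseline = BlockConfig(hidden=1152, attn_dim=1152, mlp_hidden=4 * 1152, tokens=1024)
# Hypothetical pruned variant: half the attention width, MLP left untouched.
pruned = replace(baseline, attn_dim=576)

param_red = 1 - block_params(pruned) / block_params(baseline)
flop_red = 1 - block_flops(pruned) / block_flops(baseline)
print(f"parameters: -{param_red:.1%}, FLOPs: -{flop_red:.1%}")
# prints "parameters: -16.7%, FLOPs: -21.0%"
```

Halving the attention width in this toy block removes about 16.7% of its parameters but about 21.0% of its FLOPs, because the N²-scaling attention products shrink as well. The gap widens as the token count grows.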
Superior Pareto Frontier for Mobile Generative AI
On-device benchmarks show that EdgeDiT achieves a 1.65-fold reduction in latency. This translates into a better Pareto trade-off between Fréchet Inception Distance (FID) and inference latency than optimized mobile U-Nets and vanilla DiT variants achieve. For mobile AI, this makes responsive, private, and offline image generation practical directly on user devices.
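A model sits on the FID-latency Pareto frontier when no other model is at least as fast and at least as good in FID while also being strictly better on one of the two. The minimal Python sketch below computes that frontier for a list of candidates, treating both metrics as lower-is-better. It also converts an N-fold speedup into a percentage. Reading "1.65-fold reduction" as latency divided by 1.65, that figure corresponds to roughly 39% lower latency. The candidate names and numbers are made-up placeholders, not results from the paper. The all-pairs dominance check favors clarity over speed.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    latency_ms: float  # on-device latency, lower is better
    fid: float         # Fréchet Inception Distance, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both axes and strictly better on at least one."""
    no_worse = a.latency_ms <= b.latency_ms and a.fid <= b.fid
    strictly_better = a.latency_ms < b.latency_ms or a.fid < b.fid
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Non-dominated candidates, ordered from fastest to slowest."""
    front = [c for c in cands if not any(dominates(o, c) for o in cands)]
    return sorted(front, key=lambda c: c.latency_ms)


def latency_reduction(speedup: float) -> float:
    """Fractional latency drop implied by an N-fold speedup (new = old / N)."""
    return 1 - 1 / speedup


# Made-up placeholder numbers for illustration only; these are not measurements.
candidates = [
    Candidate("model_a", latency_ms=900.0, fid=9.0),
    Candidate("model_b", latency_ms=600.0, fid=11.0),
    Candidate("model_c", latency_ms=700.0, fid=12.0),  # dominated by model_b
    Candidate("model_d", latency_ms=400.0, fid=15.0),
]
print([c.name for c in pareto_front(candidates)])  # ['model_d', 'model_b', 'model_a']
print(f"{latency_reduction(1.65):.1%}")            # 39.4%
```

Here `model_c` drops off the frontier because `model_b` is both faster and has a lower FID. A new model "improves the Pareto frontier" when it removes existing points this way, or adds a point that nothing else dominates.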