The reign of computationally intensive Diffusion Transformers (DiT) for high-fidelity image synthesis is being challenged by the practical need for on-device deployment. The massive resource demands of these models have historically confined them to high-end GPUs, leaving resource-constrained edge devices behind. This paper introduces EdgeDiT, a novel family of hardware-efficient generative transformers specifically engineered for mobile NPUs like Qualcomm Hexagon and Apple Neural Engine.
Hardware-Aware Pruning for Mobile Efficiency
EdgeDiT systematically identifies and prunes structural redundancies within the DiT architecture that map poorly onto mobile NPU dataflows. This hardware-aware optimization framework yields a 20-30% reduction in parameters and a 36-46% decrease in FLOPs. Crucially, this efficiency is achieved without sacrificing the core scaling advantages or expressive capacity of the original transformer architecture, paving the way for on-device generative AI applications.
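To make the idea concrete, the following is a minimal sketch of one common form of structured pruning applied to a transformer MLP block: scoring whole hidden channels by weight magnitude and dropping the lowest-scoring ones, so the remaining dense matmuls shrink in shape (which mobile NPUs can exploit directly). The scoring rule, the `keep_ratio`, and the function name `prune_mlp` are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def prune_mlp(w_in, w_out, keep_ratio=0.7):
    """Structurally prune hidden channels of an MLP.

    w_in:  [d, h] input projection, w_out: [h, d] output projection.
    Each hidden unit is scored by the L2 norm of its fan-in plus fan-out
    weights (an illustrative magnitude criterion); the top `keep_ratio`
    fraction of units is kept. Removing whole channels, rather than
    individual weights, reduces parameters and FLOPs proportionally
    without introducing sparse kernels.
    """
    h = w_in.shape[1]
    keep = max(1, int(h * keep_ratio))
    # Channel importance: combined magnitude of incoming and outgoing weights.
    scores = np.linalg.norm(w_in, axis=0) + np.linalg.norm(w_out, axis=1)
    idx = np.sort(np.argsort(scores)[-keep:])  # keep top-k, preserve order
    return w_in[:, idx], w_out[idx, :]

rng = np.random.default_rng(0)
w_in = rng.normal(size=(64, 256))
w_out = rng.normal(size=(256, 64))
p_in, p_out = prune_mlp(w_in, w_out, keep_ratio=0.7)
print(p_in.shape, p_out.shape)  # hidden width shrinks from 256 to 179
```

A keep ratio of 0.7 on the MLP hidden width removes roughly 30% of that block's parameters and FLOPs, in the same spirit as the reductions reported above, though the paper's actual redundancy criterion is hardware-aware rather than purely magnitude-based.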