As large language models balloon in size and complexity, the efficiency of GPU kernels has become a critical bottleneck. While custom kernels can bridge performance gaps left by standard libraries like PyTorch, their creation demands scarce, specialized expertise. LinkedIn's open-source Liger Kernel project aims to democratize these optimizations.
Liger Kernel delivers substantial gains, with a reported 20% improvement in training throughput and a 60% reduction in memory use across nearly 40 model architectures. It integrates with popular tools like HuggingFace Transformers and works alongside Flash Attention, PyTorch FSDP, and DeepSpeed. The project has seen strong adoption, with over 7 million downloads and contributions from 100+ companies.
However, maintaining such an extensive project presents its own hurdles. Developing new kernels, optimizing existing ones, and integrating support for new models each require significant expert time, and that manual effort struggles to keep pace with rapid model innovation.
To address this, LinkedIn is deploying AI agents to automate the heavy lifting of kernel engineering. This initiative, detailed in a recent post, applies the philosophy of "AI helping build better AI" to GPU kernel development.
Agentic Workflows for Kernel Engineering
The development of Liger Kernel follows well-defined patterns: analysis, implementation, testing, and benchmarking. These repeatable steps are ideal for agentic automation, but the complexity of arbitrary shapes, multiple precision modes, and diverse model architectures necessitates sophisticated workflows.
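To make the testing and benchmarking steps concrete, here is a minimal sketch of the kind of check matrix such a workflow has to cover: every combination of shape and precision mode, each compared against a reference implementation. It is an illustration only, not Liger Kernel's actual test harness. The names `SHAPES`, `PRECISIONS`, `rmsnorm_candidate`, and `run_matrix` are hypothetical, the tolerances are assumed values, and reduced precision is simulated on the CPU with Python's `struct` module rather than measured on a GPU.

```python
import itertools
import math
import random
import struct
import time

# Hypothetical test matrix. The shapes include a non-power-of-two hidden size,
# since arbitrary shapes are one of the hard cases named above.
SHAPES = [(2, 8), (3, 17), (4, 64)]  # (rows, hidden_size)
# Precision name -> (struct format code, tolerance). Tolerances are assumptions.
PRECISIONS = {"fp32": ("f", 1e-5), "fp16": ("e", 1e-2)}


def quantize(x, fmt):
    """Round a Python float to the given precision by packing and unpacking it."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


def rmsnorm_reference(row, eps=1e-6):
    """Straightforward RMSNorm (no learned weight), used as ground truth."""
    mean_sq = sum(v * v for v in row) / len(row)
    return [v / math.sqrt(mean_sq + eps) for v in row]


def rmsnorm_candidate(row, eps=1e-6):
    """Stand-in for a generated kernel: computes the reciprocal RMS once."""
    inv_rms = 1.0 / math.sqrt(math.fsum(v * v for v in row) / len(row) + eps)
    return [v * inv_rms for v in row]


def run_matrix(seed=0):
    """Check the candidate against the reference for every shape and precision."""
    rng = random.Random(seed)
    results = []
    for (rows, hidden), (name, (fmt, tol)) in itertools.product(
        SHAPES, PRECISIONS.items()
    ):
        # Inputs are rounded to the target precision before use.
        data = [
            [quantize(rng.uniform(-1.0, 1.0), fmt) for _ in range(hidden)]
            for _ in range(rows)
        ]
        start = time.perf_counter()
        out = [rmsnorm_candidate(r) for r in data]
        elapsed = time.perf_counter() - start  # crude CPU timing, not a GPU benchmark
        ref = [rmsnorm_reference(r) for r in data]
        # Compare outputs after rounding both to the target precision.
        max_err = max(
            abs(quantize(a, fmt) - quantize(b, fmt))
            for out_row, ref_row in zip(out, ref)
            for a, b in zip(out_row, ref_row)
        )
        results.append(
            {
                "shape": (rows, hidden),
                "precision": name,
                "max_err": max_err,
                "seconds": elapsed,
                "passed": max_err <= tol,
            }
        )
    return results


if __name__ == "__main__":
    for r in run_matrix():
        print(r["shape"], r["precision"], "ok" if r["passed"] else "FAIL")
```

Even in this toy form, the number of cases grows as the product of shapes and precision modes, which is why the sweep is a natural candidate for automation while deciding what counts as an acceptable tolerance still calls for judgment.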