AI Engineer: Small Models, Big Impact

Maxime Labonne of Liquid AI discusses the unique challenges and advantages of small AI models, detailing their architecture, training, and techniques to overcome issues like doom looping.

Maxime Labonne presenting at AI Engineer Europe (image credit: AI Engineer)

Maxime Labonne, head of pre-training at Liquid AI, recently shared insights into the development and deployment of small AI models, emphasizing their unique characteristics and the challenges they present. In his presentation, titled "Everything I Learned Training Frontier Small Models," Labonne detailed how these models, ranging from 350 million to 24 billion parameters, are optimized for edge AI applications, focusing on text, vision, and audio processing.


Understanding Edge Model Characteristics

Labonne highlighted three key characteristics that define edge models. First, they are memory-bound, typically operating with fewer than 3 billion parameters, making them suitable for resource-constrained environments like smartphones and cars. Second, they are task-specific, designed to excel at a particular function, such as summarization or reasoning, rather than the general-purpose breadth of large language models like ChatGPT; this specialization lets them perform very effectively within their defined scope. Third, edge models are latency-sensitive, requiring fast inference throughput to deliver real-time responses.

A crucial point Labonne emphasized is that edge models are not merely scaled-down versions of larger models; they pose distinct challenges and require tailored approaches. For instance, models like Google's Gemma 3 (270M LLM) and Qwen 3.5 (0.8B VLM) adopt a hybrid approach with features such as sliding window attention and gated attention mechanisms. However, the large share of parameters devoted to embedding layers in Gemma 3 (63%) and Qwen 3.5 (29%) points to an inefficiency: embedding parameters contribute less directly to reasoning or knowledge capacity than the rest of the network.


Architectural Innovations for Efficiency

Labonne discussed the LFM2.5 architecture, which incorporates short convolutions and gated convolutions, offering a more efficient parameter distribution. This architecture demonstrated better performance and reduced memory usage compared to other models. The team at Liquid AI conducted on-device profiling to validate these architectural choices, testing models on hardware like the AMD Ryzen AI Max 395 and the Samsung Galaxy S25 Ultra. The results showed that the short convolution approach significantly boosted speed and reduced memory footprint, particularly crucial for latency-sensitive applications.
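The short-convolution idea can be illustrated with a toy depthwise causal convolution followed by a sigmoid gate. This is a minimal sketch of the operator class, with kernel size, gating, and shapes assumed for illustration; it is not the actual LFM2.5 block.

```python
import numpy as np

# Minimal sketch of a gated short (causal) convolution, the general kind of
# operator hybrid architectures use in place of some attention layers.
# Kernel size, gating, and shapes are illustrative assumptions.

def gated_short_conv(x: np.ndarray, w: np.ndarray, v_gate: np.ndarray) -> np.ndarray:
    """x: (seq, dim) activations; w: (kernel, dim) depthwise conv weights;
    v_gate: (dim, dim) projection feeding a sigmoid gate."""
    seq, dim = x.shape
    k = w.shape[0]
    padded = np.vstack([np.zeros((k - 1, dim)), x])     # left-pad => causal
    conv = np.stack([(padded[t:t + k] * w).sum(axis=0)  # depthwise convolution
                     for t in range(seq)])
    gate = 1.0 / (1.0 + np.exp(-(x @ v_gate)))          # sigmoid gate
    return conv * gate
```

The appeal for edge inference is that cost per token is O(kernel × dim), independent of sequence length, whereas attention pays O(seq × dim) per token and must keep a growing KV cache in memory.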

The Importance of Pre-training and Fine-tuning

The presentation also touched upon the training methodology, which involves pre-training on vast datasets (28 trillion tokens for LFM2.5) followed by supervised fine-tuning, preference alignment, and reinforcement learning. Labonne referenced research suggesting that pre-training small models on more tokens, even beyond current optimal scaling laws, can yield better performance. This is particularly advantageous given the lower cost and computational resources required for training smaller models compared to their larger counterparts.
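To see how far past "compute-optimal" such a run goes, a rough rule of thumb from the Chinchilla scaling-law work is about 20 training tokens per parameter. The 1B-parameter size below is an assumption for illustration; the 28T-token figure is the LFM2.5 budget mentioned above.

```python
# Back-of-envelope sketch of training beyond compute-optimal scaling.
# The ~20 tokens/parameter rule of thumb comes from the Chinchilla
# scaling-law results; the 1B model size is an illustrative assumption.

CHINCHILLA_TOKENS_PER_PARAM = 20

def overtraining_factor(n_params: float, n_tokens: float) -> float:
    """How many times the compute-optimal token budget the run uses."""
    return n_tokens / (CHINCHILLA_TOKENS_PER_PARAM * n_params)

factor = overtraining_factor(n_params=1e9, n_tokens=28e12)
print(factor)  # 1400.0: far past compute-optimal, trading extra compute for quality
```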

The fine-tuning process, specifically supervised fine-tuning (SFT), is most effective when it is highly task-specific. Preference alignment, using techniques like direct preference optimization (DPO), further enhances model performance by aligning outputs with human preferences, leading to general improvements beyond benchmark metrics. Reinforcement learning, while also effective, is best applied when models have a narrow focus and can be trained with verifiable rewards and repetition penalties to avoid issues like 'doom looping'.
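The DPO objective mentioned above fits in a few lines: the loss rewards the policy for widening its preference margin between a chosen and a rejected answer relative to a reference model. The log-probabilities in the example are placeholder numbers, not outputs of a real model.

```python
import math

# Minimal sketch of the DPO loss on a single (chosen, rejected) preference
# pair. Log-probabilities below are placeholders, not real model outputs.

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO: -log(sigmoid(beta * (policy margin - reference margin)))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy favors the chosen answer more than the reference does,
# the margin is positive and the loss dips below log(2) ≈ 0.693:
loss = dpo_loss(-5.0, -9.0, -6.0, -8.0)
```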

Addressing 'Doom Looping' in Small Models

A significant challenge highlighted was 'doom looping,' where models repeatedly generate the same sequence of words. Labonne explained that this issue is more pronounced in small, reasoning-focused models tasked with complex problems. The team at Liquid AI tackled this through two primary methods:

  • Preference Alignment Data Generation: By generating multiple rollouts with varying temperature sampling, the team could identify and select diverse responses, including those less prone to doom looping. They then trained the model to prefer these non-looping outputs.
  • Reinforcement Learning with Repetition Penalties: Applying reinforcement learning with verifiable rewards and n-gram repetition penalties proved highly effective in drastically reducing doom loop occurrences. This method rewards accurate, non-repetitive outputs, ensuring more reliable model behavior.
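Both mitigations can be sketched in a toy form, with assumed details: a simple n-gram check stands in for loop detection, `generate` is a placeholder for a real model call, and the reward formula is illustrative rather than Liquid AI's exact recipe.

```python
# Toy sketch of the two doom-loop mitigations described above.
# Assumptions: the n-gram check stands in for loop detection, generate()
# is a placeholder for a model call, and the reward shaping is illustrative.

def ngram_repetition(tokens, n=4):
    """Fraction of n-grams that are repeats; 0.0 means no repetition."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# (1) Preference-data generation: sample rollouts at several temperatures,
# then pair non-looping responses (chosen) with looping ones (rejected).
def build_preference_pairs(prompt, generate, temperatures=(0.4, 0.7, 1.0), k=4):
    rollouts = [generate(prompt, t) for t in temperatures for _ in range(k)]
    clean = [r for r in rollouts if ngram_repetition(r.split()) < 0.1]
    loopy = [r for r in rollouts if ngram_repetition(r.split()) >= 0.1]
    return [(prompt, chosen, rejected) for chosen in clean for rejected in loopy]

# (2) RL reward shaping: a verifiable correctness reward minus an
# n-gram repetition penalty, so looping outputs score lower.
def shaped_reward(is_correct, tokens, alpha=1.0):
    return float(is_correct) - alpha * ngram_repetition(tokens)
```

The same repetition statistic serves double duty: offline, it labels rollouts as chosen or rejected for preference alignment; online, it penalizes looping completions during reinforcement learning.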

The data presented showed a dramatic drop in doom loop ratios after applying these reinforcement learning techniques, reducing a once-common failure mode to a near-non-existent one.

Future Directions and Collaboration

Labonne concluded by emphasizing that while small models may have limitations in long-context capabilities or general knowledge, these can be overcome through creative solutions like incorporating web search tools. This approach allows small models to access external information, significantly boosting their performance on complex tasks. He encouraged the audience to consider the unique strengths of small models and explore their potential, inviting collaboration on future projects like LFM3.

© 2026 StartupHub.ai. All rights reserved.