"Evaluation/evals" stands as the single most painful aspect of AI Engineering today, a stark revelation from Amplify Partners' recent 2025 AI Engineering Report. Barr Yaron, an investment partner at Amplify Partners, unveiled early findings from the "2025 State of AI Engineering" survey at the AI Engineer World's Fair in San Francisco. Her presentation offered a data-driven snapshot of the rapidly evolving AI engineering landscape, touching upon workforce demographics, model deployment, customization techniques, and the pervasive challenges faced by practitioners.
The survey, drawing responses from 500 individuals, revealed a notably diverse respondent pool. While the conference itself is dedicated to AI engineering, many attendees and respondents hold titles beyond "AI Engineer," including software engineer, founder, and product manager. This fluidity underscores how new the AI engineering role still is, a point Yaron echoed: "The largest group called themselves engineers, whether software engineers or AI engineers." Many seasoned developers are also newcomers to AI/ML: nearly half of those with 10+ years of software experience have less than three years of AI/ML experience.
The report highlights the widespread adoption of Large Language Models (LLMs) in production. "More than half of the respondents are using LLMs for both internal and external use cases," signaling a rapid integration into enterprise operations. Notably, OpenAI models dominate the external, customer-facing product landscape, with three out of the top five models and half of the top ten originating from the company. The primary use cases for LLMs today are unsurprising: code intelligence/generation and writing assistance, reflecting immediate productivity gains.
Customizing AI systems is a critical frontier. Beyond few-shot learning, Retrieval-Augmented Generation (RAG) emerged as the most popular technique, adopted by over 70% of respondents. Fine-tuning, including parameter-efficient methods like LoRA/QLoRA, is also surprisingly prevalent, especially among researchers and research engineers, indicating deeper engagement with model adaptation than is commonly assumed.
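The core idea behind RAG is simple: retrieve the documents most relevant to a query, then prepend them to the prompt so the model answers from that context. A minimal sketch, using a toy bag-of-words cosine similarity in place of the vector databases and embedding models production systems use (all function names and the corpus here are illustrative, not from the report):

```python
from collections import Counter
import math

def _vec(text: str) -> Counter:
    """Bag-of-words vector over lowercase tokens."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    qv = _vec(query)
    ranked = sorted(corpus, key=lambda d: _cosine(qv, _vec(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Augment the query with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In a real deployment the similarity function would be replaced by dense embeddings and the final prompt sent to a model API, but the retrieve-then-augment shape is the same.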
Teams are updating their prompts with astonishing frequency. For instance, "70% of respondents are updating prompts at least monthly; 10% are doing it daily." Despite this relentless iteration, prompt management remains a significant pain point, as a full 31% of respondents reported having no formal way of managing their prompts. This informal approach suggests a notable gap in tooling and best practices for a core component of AI development.
While text generation and LLMs are firmly entrenched in production, other modalities like image, audio, and video generation lag significantly. This "multimodal production gap" points to challenges in bringing these advanced capabilities to market, though intent to adopt, particularly for audio, is high. Looking ahead, AI agents, defined as systems where an LLM controls core decision-making, are still early in their adoption curve compared to general LLMs. However, the future looks agent-driven, with only a small minority of respondents indicating no plans to utilize them whatsoever. The majority of agents already in production possess "write access" to systems, often with human-in-the-loop oversight, but some even operate with high autonomy.
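The pattern the survey describes for write-capable agents, an LLM proposing actions while a human gates the risky ones, can be sketched in a few lines. This is an illustrative shape, not a specific framework from the report; the function names and the approval callback are assumptions:

```python
from typing import Callable

def run_action(action: str, is_write: bool,
               approve: Callable[[str], bool]) -> str:
    """Execute an agent-proposed action.

    Read-only actions run directly; write actions are gated behind a
    human-in-the-loop approval callback and blocked if it declines.
    """
    if is_write and not approve(action):
        return f"blocked: {action}"
    return f"executed: {action}"
```

In production the callback would surface the proposed action to a reviewer (or to a policy engine for high-autonomy agents), but the gate sits in the same place: between the model's decision and the system's write path.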
The survey clearly delineates the most pressing challenges in AI engineering. Topping the list, by a significant margin, is "Evaluation/evals," which Yaron stated is the "number one most painful thing about AI Engineering today." This points to a critical need for more robust, standardized, and efficient methods for assessing AI model performance and quality. Other significant pain points include keeping up with rapid changes, fragmented tool ecosystems, and the high cost of GPU compute.
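At its simplest, an eval is a set of input/check pairs run against the model, with a pass rate as the headline metric. A minimal sketch of that loop, with hypothetical names standing in for a real model call and real graders:

```python
from typing import Callable, Iterable

def evaluate(model: Callable[[str], str],
             cases: Iterable[tuple[str, Callable[[str], bool]]]) -> float:
    """Run each (input, check) case through the model; return the pass rate."""
    cases = list(cases)
    passed = sum(1 for inp, check in cases if check(model(inp)))
    return passed / len(cases)
```

Real eval harnesses layer on LLM-as-judge graders, statistical significance, and regression tracking across prompt and model versions, which is much of why the survey's respondents find this so painful.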
The Amplify report paints a picture of a dynamic, fast-moving field, characterized by rapid adoption of foundational models, a diverse workforce quickly adapting to new paradigms, and persistent, yet addressable, technical and operational hurdles. The emphasis on practical application, continuous iteration, and the critical role of human oversight in complex AI systems defines the current state of AI engineering.

