"DSPy allows you to decompose logic into a program that treats LLMs as a first class citizen... without having to tweak prompts (unless you want to)." This emphatic statement from Kevin Madura of AlixPartners at the AI Engineer Code Summit encapsulates the core value proposition of Declarative Self-improving Language Programs (DSPy). For founders, VCs, and AI professionals building enterprise-grade applications, the shift from brittle prompt engineering to structured programming with LLMs is not just a preference—it’s a necessity for robustness and scalability.
Madura addressed an audience of technical leaders on why DSPy represents a critical evolution in building reliable AI software. He argued that as LLMs become foundational to enterprise workflows, from PDF processing to complex research agents, developers must move beyond iterative string manipulation. The real challenge, as Madura framed it, is graduating from prompt tweaking to true programming with LLMs.
DSPy addresses this by introducing a set of opinionated primitives designed to structure AI applications rigorously. Madura introduced the six core concepts: Signatures, Modules, Tools, Adapters, Optimizers, and Metrics. Signatures, for instance, are declarative specifications of what the program should achieve, not how the LLM achieves it. This separation of concerns is crucial for maintainability.
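To make that concrete, here is a minimal sketch of a class-based signature; the task, field names, and descriptions are illustrative rather than taken from the talk:

```python
import dspy

class TriageEmail(dspy.Signature):
    """Route an inbound support email to the right team."""

    # Inputs and outputs declare *what* the program needs and produces.
    email_body: str = dspy.InputField(desc="Raw text of the email")
    team: str = dspy.OutputField(desc="One of: billing, technical, sales")
    urgency: str = dspy.OutputField(desc="low, medium, or high")
```

Note that the signature says nothing about prompt wording. How this gets rendered for a particular model is left to DSPy's adapters and optimizers, which is exactly the separation of concerns Madura emphasized.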
The framework’s strength lies in its ability to compile these declarative programs into effective prompts and even fine-tune model weights, moving the engineering effort away from manual prompt crafting. Madura stressed the importance of this structure, noting, "Your program design likely moves slower than AI advancements (at least so far)." By encoding intent and structure in a transferable way, DSPy applications are insulated from the rapid, unpredictable advancements in foundation models.
A key insight Madura provided was the distinction between programming and mere prompting. He emphasized that DSPy allows developers to "create computer programs that use LLMs as inline function calls." This treats the LLM less like an oracle and more like a library component, which can be reasoned about, tested, and optimized with traditional software engineering discipline.
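A quick sketch of what those "inline function calls" look like in practice, assuming a configured model (the model string here is a placeholder):

```python
import dspy

# Configure a default model once; any LiteLLM-style model string works.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# The LLM call reads like an ordinary function invocation.
classify = dspy.Predict("ticket_text -> priority")
result = classify(ticket_text="Production database is down for all customers.")
print(result.priority)  # e.g. "high"
```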
The concept of Modules further solidifies this engineering mindset, acting as the base abstraction layer for DSPy programs, enabling logical separation and composability. The built-in standard library (`stdlib`) offers ready-made modules such as `dspy.Predict` and `dspy.ChainOfThought`, along with agentic modules like `dspy.ReAct` that handle complex reasoning and external tool interaction directly within the structured program flow.
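As a sketch of that composability, the custom module below chains a `dspy.ReAct` agent with `dspy.ChainOfThought`; the `web_search` tool is a hypothetical stand-in for whatever retrieval function an application provides:

```python
import dspy

def web_search(query: str) -> str:
    """Hypothetical tool: return search results for a query."""
    return "...search results..."  # replace with a real search backend

class ResearchAgent(dspy.Module):
    def __init__(self):
        super().__init__()
        # ReAct runs the tool-calling loop; ChainOfThought writes the summary.
        self.research = dspy.ReAct("question -> findings", tools=[web_search])
        self.summarize = dspy.ChainOfThought("findings -> summary")

    def forward(self, question: str):
        findings = self.research(question=question).findings
        return self.summarize(findings=findings)

agent = ResearchAgent()
print(agent(question="What changed in the latest DSPy release?").summary)
```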
Madura also touched on the importance of Metrics, which define what success looks like, often involving multiple, potentially conflicting objectives. These metrics are then fed into Optimizers—algorithms that tune the program's parameters (prompts or weights) to maximize the desired outcomes, such as accuracy or cost efficiency. This ML-style optimization is a powerful differentiator from manual prompt iteration.
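Here is a sketch of that loop using `BootstrapFewShot`, one of DSPy's built-in optimizers; the metric function and the tiny trainset are illustrative:

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# A metric is just a function scoring a prediction against a labeled example.
def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

trainset = [
    dspy.Example(question="2 + 2?", answer="4").with_inputs("question"),
    dspy.Example(question="Capital of France?", answer="Paris").with_inputs("question"),
]

program = dspy.ChainOfThought("question -> answer")

# The optimizer tunes the program (here, by bootstrapping few-shot
# demonstrations) to maximize the metric over the training set.
optimizer = BootstrapFewShot(metric=exact_match)
compiled_program = optimizer.compile(program, trainset=trainset)
```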
He shared a cautionary quote from Andrej Karpathy: "So you can't do this for too long. You do maybe 10 steps or 20 steps, and maybe it will work, but you can’t do 100 or 1,000. I understand it’s not obvious, but basically the model will find little cracks. It will find all these spurious things in the nooks and crannies of the giant model and find a way to cheat it." This highlights the inherent fragility of prompt engineering when dealing with complex, large-scale models.
The presentation concluded with live demonstrations, including a multimodal example where DSPy correctly interpreted parking signs by feeding an image attachment directly into the signature, showcasing its capability to handle various data types beyond simple text. The core message for attendees was clear: DSPy provides the necessary structure to build rigorous, testable, and robust AI applications that scale effectively in the fast-moving LLM landscape.
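The multimodal demo maps naturally onto the same signature machinery. A minimal sketch, assuming a recent DSPy version with `dspy.Image` support (the URL and field names are placeholders, not from the demo):

```python
import dspy

class ParkingSign(dspy.Signature):
    """Interpret a photographed parking sign."""

    sign: dspy.Image = dspy.InputField()
    current_time: str = dspy.InputField()
    can_park: bool = dspy.OutputField()

interpret = dspy.Predict(ParkingSign)
result = interpret(
    sign=dspy.Image.from_url("https://example.com/parking_sign.jpg"),  # placeholder
    current_time="Tuesday 2:30 PM",
)
print(result.can_park)
```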

