Benjamin Cowen on Fine-Tuning AI Models with Modal

Benjamin Cowen from Modal discusses the shift towards custom, fine-tuned AI models and how serverless platforms simplify this process.

Jun 2 at 7:02 PM, 7 min read

Benjamin Cowen presenting "What Lies Beneath the API" at an AI Engineer Europe event. Image: AI Engineer

Visual TL;DR (diagram): Frontier APIs and Scratch Servers both lead to the Model Spectrum, which shows Domain-Specific Models and highlights the Need for Fine-Tuning. Domain-Specific Models and Key Fine-Tuning Signals feed into the Need for Fine-Tuning, which requires Serverless Infrastructure, which in turn enables Accessible Custom AI.

Frontier APIs: quick start, no infrastructure, powerful pre-trained models
Scratch Servers: full control, precise fine-tuning for specific needs
Model Spectrum: progression from general APIs to custom solutions
Domain-Specific Models: growing trend for tailored AI performance
Need for Fine-Tuning: customization unlocks better, predictable AI performance
Serverless Infrastructure: simplifies AI training and inference processes
Accessible Custom AI: making fine-tuning easier for companies
Key Fine-Tuning Signals: identifying when custom models are beneficial

Benjamin Cowen, a Forward Deployed Machine Learning Engineer at Modal, recently presented "What Lies Beneath the API," a talk on the evolving landscape of AI model development and deployment. Cowen discussed the growing trend of companies fine-tuning their own models rather than relying solely on general-purpose APIs, and how serverless platforms are making this more accessible.

The Model Spectrum: From Frontier API to Custom Solutions

Cowen introduced the concept of the "Model Spectrum," illustrating a progression from using readily available "Frontier APIs" to building and managing models on "Scratch Servers." Frontier APIs offer a quick start with no infrastructure overhead and access to powerful, pre-trained models. However, they lack customization and can sometimes yield unpredictable performance.

On the other end of the spectrum, Scratch Servers provide full control and the ability to fine-tune models precisely to specific needs. This approach offers maximum customization and allows for the definition of custom metrics. The trade-off is the significant burden of infrastructure management, including cluster management and self-maintenance of software stacks.

The Rise of Domain-Specific Models and the Need for Fine-Tuning

Cowen highlighted that as companies mature, they increasingly need to fine-tune models on proprietary data to achieve better performance, lower latency, and custom functionality. He cited examples such as Intercom's Fin Apex, which reportedly beat GPT-5.4 at one-fifth of the cost, and Pinterest CEO Bill Ready's statement about achieving "orders of magnitude reduction in cost" by fine-tuning open-source models instead of using frontier APIs.
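To make cost comparisons like these concrete, here is a minimal sketch of the per-request arithmetic. Every number in it is hypothetical and chosen only to keep the math simple; the function names are illustrative, and the self-hosted figure assumes the GPU is billed only while it is busy, which is the serverless billing model rather than a reserved cluster.

```python
def api_cost_per_request(tokens_per_request: int, usd_per_million_tokens: float) -> float:
    """Cost of one request on a pay-per-token API."""
    return tokens_per_request * usd_per_million_tokens / 1_000_000


def self_hosted_cost_per_request(
    tokens_per_request: int, gpu_usd_per_hour: float, tokens_per_second: float
) -> float:
    """Cost of one request on a rented GPU, assuming billing only while busy."""
    seconds = tokens_per_request / tokens_per_second
    return seconds * gpu_usd_per_hour / 3600


# Hypothetical inputs (not real prices or measured throughput).
TOKENS = 2_000
api = api_cost_per_request(TOKENS, usd_per_million_tokens=10.0)
hosted = self_hosted_cost_per_request(
    TOKENS, gpu_usd_per_hour=3.6, tokens_per_second=1_000.0
)
ratio = api / hosted
print(f"API: ${api:.4f}  self-hosted: ${hosted:.4f}  ratio: {ratio:.1f}x")
```

With these made-up inputs the API request costs about ten times the self-hosted one. The real ratio depends heavily on GPU utilization and on how small a fine-tuned model can be while still meeting quality targets.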

This trend signifies a shift in how AI is viewed: models are becoming raw materials, and the fine-tuned, domain-specific system is the actual product. Cowen emphasized that this fine-tuning process is becoming more accessible.

Serverless Infrastructure for AI Training and Inference

The presentation showcased how serverless platforms like Modal are bridging the gap between ease of use and control. Cowen explained that Modal's infrastructure, which includes unified GPUs and sandboxed environments, makes large-scale AI training and inference feasible with significantly less code and management overhead.

He demonstrated that fine-tuning, whether for large language models (LLMs) or reinforcement learning (RL) tasks, can be achieved with surprisingly concise codebases, in as little as 300 lines of Python. This is made possible by open-source libraries and by serverless infrastructure that handles parallel hyperparameter sweeps and scaling.
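The shape of a parallel hyperparameter sweep can be sketched with the Python standard library alone. This is not Modal's API or Cowen's code: a local thread pool stands in for the containers a serverless platform would launch, `train_once` is a hypothetical placeholder with a toy loss instead of real training, and the grid values are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_once(config: dict) -> dict:
    """Stand-in for one fine-tuning run; returns a toy validation loss."""
    lr, batch_size = config["lr"], config["batch_size"]
    # Toy objective with its minimum at lr=1e-4, batch_size=16.
    # A real run would train a model and evaluate it here.
    loss = abs(lr - 1e-4) * 1e4 + abs(batch_size - 16) / 16
    return {"config": config, "val_loss": loss}


def sweep(grid: dict, max_workers: int = 4) -> list:
    """Run every combination in the grid concurrently, sorted by loss."""
    keys = list(grid)
    configs = [dict(zip(keys, values)) for values in product(*grid.values())]
    # On a serverless platform each config would run in its own container;
    # a local thread pool plays that role in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_once, configs))
    return sorted(results, key=lambda r: r["val_loss"])


grid = {"lr": [1e-5, 1e-4, 1e-3], "batch_size": [8, 16, 32]}
results = sweep(grid)
best = results[0]
print(len(results), best["config"])
```

The sweep is a plain map over independent configurations, which is why it parallelizes well: no run depends on another, so the platform can fan them out across as many workers as are available.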

Cowen provided code examples illustrating how to set up fine-tuning jobs and deploy models efficiently. He noted that the ability to scale containers on demand and the abstraction of infrastructure management are key benefits of using such platforms. This allows developers to focus on model development and data curation rather than infrastructure plumbing.

Key Signals for Fine-Tuning

Cowen also outlined several signals that indicate it might be time for a product to transition to a fine-tuned, domain-specific model:

Evaluations are plateauing despite prompt work.
There is a need for lower latency or higher throughput.
Unit economics are not scaling effectively.
Core functionality is still developing.
There's a lack of collected, relevant data for prompt engineering.
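The first signal in the list, evaluations plateauing despite prompt work, can be checked mechanically. The sketch below is one possible heuristic and not something shown in the talk: the function name, window size, and gain threshold are all assumptions, and the scores are invented.

```python
def is_plateauing(scores: list, window: int = 3, min_gain: float = 0.01) -> bool:
    """Return True if the best score in the last `window` runs beats the
    best earlier score by less than `min_gain`."""
    if len(scores) <= window:
        return False  # not enough history to judge
    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    return best_recent - best_before < min_gain


# Hypothetical eval accuracy after successive prompt revisions.
improving = [0.61, 0.66, 0.70, 0.74, 0.78, 0.82]
stalled = [0.61, 0.70, 0.78, 0.781, 0.779, 0.783]
print(is_plateauing(improving), is_plateauing(stalled))
```

A check like this only makes sense when the evaluation suite is stable across prompt revisions; if the test set changes between runs, the scores are not comparable.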

He concluded by emphasizing that if a team has already built agent harnesses and evaluation suites, hired AI engineers, and collected data, the hard part of building a domain-specific model may already be done, making fine-tuning a logical next step.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
