LLMs Slash Neural Architecture Search Costs

Delta-Code Generation uses LLMs to produce compact architecture refinements, dramatically cutting costs and improving NAS efficiency.

[Figure] The Delta-Code Generation pipeline refines baseline architectures using LLM-generated deltas.

The quest for optimal neural network architectures has long been hampered by the immense computational burden of traditional Neural Architecture Search (NAS). Existing LLM-based methods synthesize complete model implementations from scratch, which is prohibitively expensive and yields verbose code. The paper behind this work introduces Delta-Code Generation, a pipeline that instead uses fine-tuned LLMs to produce compact unified diffs (deltas) that refine existing baseline architectures.
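To make the delta workflow concrete, here is a minimal sketch of how a unified diff emitted by an LLM could be applied to a baseline architecture file. The apply_delta helper and the use of the POSIX patch utility are illustrative assumptions, not the paper's exact application machinery.

```python
# Minimal sketch: apply an LLM-emitted unified diff to baseline model code.
# The helper name and the reliance on the standard `patch` CLI are
# assumptions for illustration; the paper's pipeline may differ.
import subprocess
import tempfile
from pathlib import Path

def apply_delta(baseline_code: str, delta: str) -> str:
    """Apply a unified diff (delta) to baseline model code and return
    the refined source. Raises if the delta does not apply cleanly."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "model.py"
        src.write_text(baseline_code)
        patch_file = Path(tmp) / "model.patch"
        patch_file.write_text(delta)
        # The POSIX `patch` tool applies the hunks; a nonzero return code
        # means the delta is malformed or mismatched, so the candidate
        # architecture would simply be discarded as invalid.
        result = subprocess.run(
            ["patch", str(src), str(patch_file)],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            raise ValueError(f"delta failed to apply: {result.stderr}")
        return src.read_text()
```

Because a delta only has to express the changed hunks, a valid refinement can weigh in at a few dozen lines even when the underlying model file runs to hundreds.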

Efficiency Through Incremental Refinement

The core innovation lies in shifting from full model synthesis to incremental modification. Three 7B-class models (DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B) are fine-tuned on curated architectures from the LEMUR dataset, and MinHash-Jaccard novelty filtering discards near-duplicate outputs so that only genuinely new refinements survive. The resulting deltas are concise: output lengths drop by 75-85% compared to full generation (30-50 lines versus 200+), a token-efficient strategy that makes LLM-driven NAS far more accessible.
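As a rough illustration of the novelty filter, the sketch below estimates Jaccard similarity between candidate deltas using MinHash signatures (via the datasketch library) and keeps only sufficiently dissimilar candidates. The whitespace-token shingling and the 0.8 similarity threshold are assumptions for illustration, not the paper's reported settings.

```python
# Sketch of MinHash-Jaccard novelty filtering: a candidate delta is kept
# only if its estimated Jaccard similarity to every previously accepted
# delta stays below a threshold. Shingling scheme and threshold are
# illustrative assumptions.
from datasketch import MinHash

def minhash_of(code: str, num_perm: int = 128) -> MinHash:
    """Build a MinHash signature over the whitespace tokens of a code string."""
    m = MinHash(num_perm=num_perm)
    for token in code.split():
        m.update(token.encode("utf8"))
    return m

def is_novel(candidate: str, accepted: list[MinHash], threshold: float = 0.8) -> bool:
    """True if the candidate's estimated Jaccard similarity to every
    accepted delta is below the threshold."""
    sig = minhash_of(candidate)
    return all(sig.jaccard(prev) < threshold for prev in accepted)

# Usage: grow a pool of accepted signatures as candidates stream in.
candidate_deltas = [
    "conv 3x3 64 relu",
    "conv 3x3 64 relu",          # near-duplicate, should be filtered out
    "depthwise 5x5 128 gelu",
]
pool: list[MinHash] = []
for delta in candidate_deltas:
    if is_novel(delta, pool):
        pool.append(minhash_of(delta))
print(f"{len(pool)} novel deltas kept of {len(candidate_deltas)}")
```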


Surpassing Full-Generation Baselines

The empirical results are compelling. Across six diverse datasets (CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, CelebA), Delta-Code Generation significantly outperforms the full-generation baseline. DeepSeek-Coder achieved a 75.3% valid rate and 65.8% mean first-epoch accuracy, against the baseline's 50.6% valid rate and 42.3% mean accuracy. On CIFAR-10, the best first-epoch accuracy reached 85.5% (Mistral), notably exceeding both the full-generation baseline (63.98%) and a concurrent approach. A 50-epoch study further validates that the cheap 1-epoch proxy preserves performance rankings, confirming both the efficiency and the reliability of the method.
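One way to see what the 50-epoch study checks: if candidates are ranked by their cheap 1-epoch accuracy and again by their full 50-epoch accuracy, the two orderings should agree. The sketch below measures that agreement with Spearman rank correlation; all the accuracy values are hypothetical placeholders, not the paper's measurements.

```python
# Sketch of validating a 1-epoch proxy: high Spearman rank correlation
# between proxy scores and full-training scores means the cheap proxy
# preserves the candidate ranking. Numbers are made-up placeholders.
from scipy.stats import spearmanr

acc_1_epoch  = [0.62, 0.71, 0.55, 0.68, 0.49]   # proxy scores (hypothetical)
acc_50_epoch = [0.88, 0.93, 0.81, 0.91, 0.76]   # full-training scores (hypothetical)

rho, p_value = spearmanr(acc_1_epoch, acc_50_epoch)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")
# rho near 1 means the 1-epoch ranking matches the 50-epoch ranking,
# so full training only needs to be spent on the top-ranked candidates.
```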

A Versatile and LLM-Agnostic NAS Solution

The demonstrated success across multiple datasets and LLMs highlights the versatility of Delta-Code Generation. It proves to be a multi-domain, LLM-agnostic alternative to computationally intensive full-model synthesis. This makes the methodology broadly applicable for researchers and investors seeking to accelerate the development of efficient and high-performing neural architectures without the prohibitive costs associated with current LLM-based NAS techniques.
