The quest for optimal neural network architectures has long been hampered by the immense computational burden of traditional Neural Architecture Search (NAS). Existing LLM-based approaches, which prompt Large Language Models (LLMs) to synthesize complete model implementations from scratch, are expensive in tokens and compute and tend to produce verbose code. This paper introduces Delta-Code Generation, a pipeline that instead fine-tunes LLMs to produce compact unified diffs (deltas) that refine existing baseline architectures.
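To make the idea concrete, the sketch below shows what applying one such delta could look like. Both the baseline file name (baseline_resnet.py) and the particular diff are hypothetical illustrations, not taken from the paper; the point is that a refinement touching a handful of lines can stand in for a full re-implementation.

import subprocess

# Hypothetical LLM-generated delta: swaps the activation and adds dropout
# in one block of an assumed baseline file, baseline_resnet.py.
delta = """\
--- a/baseline_resnet.py
+++ b/baseline_resnet.py
@@ -12,3 +12,4 @@ class BasicBlock(nn.Module):
         self.bn1 = nn.BatchNorm2d(planes)
-        self.act = nn.ReLU(inplace=True)
+        self.act = nn.GELU()
+        self.drop = nn.Dropout2d(p=0.1)
         self.conv2 = nn.Conv2d(planes, planes, 3, padding=1, bias=False)
"""

# Apply the compact delta to the checked-out baseline with the standard patch tool;
# -p1 strips the leading a/ and b/ path components, as in git-style diffs.
subprocess.run(["patch", "-p1"], input=delta, text=True, check=True)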
Efficiency Through Incremental Refinement
The core innovation is the shift from full model synthesis to incremental modification. Three 7B-class models (DeepSeek-Coder-7B, Qwen2.5-Coder-7B, and Mistral-7B) are fine-tuned on curated architectures from the LEMUR dataset, and MinHash-Jaccard novelty filtering screens out near-duplicates. The resulting deltas are concise, cutting output length by 75-85% relative to full generation (30-50 lines versus 200+ lines). This token-efficient strategy makes LLM-driven NAS far more accessible.
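The novelty filter can be approximated with standard MinHash machinery. The sketch below is an assumption-laden illustration rather than the paper's implementation: the shingle size, the number of hash permutations, and the 0.8 similarity threshold are all placeholder choices. It estimates the Jaccard similarity between a candidate's source code and previously accepted architectures and rejects candidates that are too similar.

import hashlib
import re

def shingles(code, k=5):
    """Character k-gram shingles over whitespace-normalized source code."""
    normalized = re.sub(r"\s+", " ", code.strip())
    return {normalized[i:i + k] for i in range(max(1, len(normalized) - k + 1))}

def minhash_signature(items, num_perm=128):
    """For each of num_perm seeded hash functions, keep the minimum hash over all shingles."""
    sig = []
    for seed in range(num_perm):
        sig.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in items))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def is_novel(candidate_src, accepted_srcs, threshold=0.8):
    """Accept a candidate only if it is sufficiently dissimilar from every accepted architecture."""
    cand = minhash_signature(shingles(candidate_src))
    return all(estimated_jaccard(cand, minhash_signature(shingles(a))) < threshold
               for a in accepted_srcs)

In a real pipeline the signatures of accepted architectures would be cached rather than recomputed for every candidate.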