Scientific workflow systems have long automated execution, but translating a research question into an executable specification has remained a manual bottleneck. That step demands both domain expertise and infrastructure knowledge. An agentic architecture described in a recent arXiv paper aims to close this gap.
From Natural Language to Reproducible DAGs
The proposed system splits the semantic translation problem across three layers. In the semantic layer, a Large Language Model (LLM) interprets natural language queries and turns them into structured intents. In the deterministic layer, validated generators take those intents and produce reproducible Directed Acyclic Graphs (DAGs) for the workflow. The knowledge layer consists of "Skills": markdown documents authored by domain experts that encode vocabulary mappings, parameter constraints, and optimization strategies. This decomposition confines LLM non-determinism to the intent extraction phase, so identical intents always yield identical workflows.
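The deterministic layer can be pictured with a minimal Python sketch. This is not the paper's implementation: the `Intent` fields, the task naming scheme, and the `generate_dag` and `dag_fingerprint` helpers are all hypothetical, chosen only to show how a pure function from intent to DAG makes the output reproducible.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    # Hypothetical structured intent; the field names are illustrative only.
    populations: tuple
    chromosomes: tuple
    chunks_per_chromosome: int


def generate_dag(intent: Intent) -> dict:
    """Expand an intent into tasks and edges using no randomness and no LLM calls."""
    tasks, edges = [], []
    # Sorting the inputs makes the output independent of the order in which
    # the semantic layer happened to list them.
    for chrom in sorted(intent.chromosomes):
        merge = f"merge_chr{chrom}"
        for i in range(intent.chunks_per_chromosome):
            chunk = f"process_chr{chrom}_part{i}"
            tasks.append(chunk)
            edges.append((chunk, merge))
        tasks.append(merge)
        for pop in sorted(intent.populations):
            analysis = f"analyze_chr{chrom}_{pop}"
            tasks.append(analysis)
            edges.append((merge, analysis))
    return {"tasks": tasks, "edges": edges}


def dag_fingerprint(dag: dict) -> str:
    """Hash a canonical JSON encoding so that equal DAGs share one fingerprint."""
    canonical = json.dumps(dag, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = Intent(("EUR", "AFR"), ("1", "2"), 2)
    second = Intent(("EUR", "AFR"), ("1", "2"), 2)
    # Identical intents produce byte-identical DAG encodings.
    print(dag_fingerprint(generate_dag(first)) == dag_fingerprint(generate_dag(second)))
```

Because `generate_dag` depends only on its argument, any variation between runs has to originate upstream, in how the LLM extracted the intent. That is the property the three-layer split is designed to provide.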
Quantifiable Gains in Accuracy and Efficiency
The authors evaluated the architecture on the 1000 Genomes population genetics workflow, executed by the HyperFlow WMS on Kubernetes. In an ablation study over 150 queries, adding Skills raised full-match intent accuracy from 44% to 83%, a gain of 39 percentage points. Skill-driven deferred workflow generation reduced data transfer by 92%. For the end-to-end pipeline on Kubernetes, LLM overhead stayed under 15 seconds and cost stayed below $0.001 per query, which suggests that agentic workflow systems of this kind are practical for large-scale scientific work.
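To make the headline metric concrete, here is a minimal sketch of how a full-match accuracy score could be computed. It assumes that "full match" means every field of the extracted intent equals the reference intent; the paper's exact scoring rule is not given here, and the `full_match_accuracy` helper and the toy intents are hypothetical.

```python
def full_match_accuracy(predicted, expected):
    """Return the fraction of queries whose predicted intent equals the reference exactly.

    Assumes a strict definition: one wrong or missing field fails the whole query.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / len(expected)


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    expected = [
        {"population": "EUR", "chromosome": "1"},
        {"population": "AFR", "chromosome": "2"},
    ]
    predicted = [
        {"population": "EUR", "chromosome": "1"},
        {"population": "AFR", "chromosome": "22"},  # one wrong field fails the query
    ]
    print(full_match_accuracy(predicted, expected))
```

Under this strict definition, partial credit is not awarded, so a score such as 83% would count only queries where the entire intent was extracted correctly.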