Automating Visual Workflows with LLMs

A new benchmark, Chat2Workflow, shows that LLMs still struggle to generate executable visual workflows despite progress in capturing user intent, leaving a significant gap for industrial automation.

The Chat2Workflow benchmark aims to automate the creation of executable visual workflows.

The industrial adoption of executable visual workflows, prized for their reliability, is currently hobbled by manual engineering. Developers spend considerable time designing, prompting, and iterating on these complex systems, a process ripe for automation.

Bridging the Gap to Agentic Workflow Generation

To address this, researchers introduce Chat2Workflow, a benchmark built from real-world business workflows that can be deployed directly on platforms such as Dify and Coze. It is designed to measure how well large language models (LLMs) can automate the multi-round interaction that workflow creation requires, with the goal of moving from manual construction toward more autonomous systems.


The LLM Frontier: Intent vs. Execution

Experimental results reveal a persistent challenge: state-of-the-art LLMs can grasp high-level user intent, but they fall short of generating workflows that are consistently correct, stable, and executable. The gap is most pronounced when requirements are complex or change over multiple rounds. The proposed agentic framework shows promise, improving the resolve rate on recurrent execution errors by up to 5.34%. Even so, a significant gap to real-world deployment remains, underscoring the need for further progress toward industrial-grade automation.
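The article does not describe how the agentic framework handles recurrent execution errors. As a rough illustration of the generate, check, and repair cycle that such a framework implies, here is a minimal Python sketch. Everything in it is hypothetical: the toy `Workflow` type, the `validate` checks (edges pointing at unknown nodes, and cycles), and `drop_dangling_edges`, which stands in for what would in practice be an LLM repair call. It is not the Chat2Workflow framework, and it does not model the Dify or Coze workflow formats.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class Workflow:
    """Hypothetical toy workflow: named nodes plus directed edges between them."""
    nodes: set[str]
    edges: list[tuple[str, str]]


def validate(wf: Workflow) -> list[str]:
    """Return structural error messages; an empty list means the graph looks runnable."""
    errors: list[str] = []
    for src, dst in wf.edges:
        for name in (src, dst):
            if name not in wf.nodes:
                errors.append(f"edge references unknown node '{name}'")
    # Cycle check with Kahn's algorithm, using only edges between known nodes.
    indegree = {n: 0 for n in wf.nodes}
    successors: dict[str, list[str]] = {n: [] for n in wf.nodes}
    for src, dst in wf.edges:
        if src in wf.nodes and dst in wf.nodes:
            successors[src].append(dst)
            indegree[dst] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if visited < len(wf.nodes):
        errors.append("workflow contains a cycle")
    return errors


def drop_dangling_edges(wf: Workflow, errors: list[str]) -> Workflow:
    """Hypothetical stand-in for an LLM repair step: drop edges to unknown nodes."""
    kept = [(s, d) for s, d in wf.edges if s in wf.nodes and d in wf.nodes]
    return Workflow(nodes=set(wf.nodes), edges=kept)


def generate_with_repair(
    generate: Callable[[], Workflow],
    repair: Callable[[Workflow, list[str]], Workflow],
    max_rounds: int = 3,
) -> tuple[Workflow, bool]:
    """Generate a draft, then validate and repair it for up to max_rounds rounds."""
    wf = generate()
    for _ in range(max_rounds):
        errors = validate(wf)
        if not errors:
            return wf, True
        wf = repair(wf, errors)  # feed the error messages back to the repairer
    return wf, not validate(wf)


# Usage: a draft whose last edge points at a node that was never defined.
draft = Workflow(
    nodes={"start", "llm", "end"},
    edges=[("start", "llm"), ("llm", "end"), ("llm", "tool")],
)
fixed, ok = generate_with_repair(lambda: draft, drop_dangling_edges)
print(ok, fixed.edges)
```

A real repair step would pass the error messages back to the model rather than applying a fixed rule; the deterministic `drop_dangling_edges` is used here only so the loop can run without an LLM.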

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.