“Have you ever had your agent working for almost an hour only to understand that he went in the wrong direction? Or in the middle of something very important, he ran out of context window?” This pointed question, posed by Alex Gavrilescu, a backend and web engineering lead at Funstage GmbH, at the AI Engineer Code Summit, cuts to the core of a prevalent challenge in AI-driven development. His solution, Backlog.md, isn't just another project management tool; it’s a meticulously designed workflow and terminal-native Kanban board that fundamentally rethinks how humans and AI agents collaborate on software projects.
Gavrilescu's presentation introduced Backlog.md as a pragmatic response to the inherent limitations of large language models, particularly their propensity to lose context or misinterpret complex instructions over extended tasks. He articulated a vision where AI agents, rather than operating in a black box, are integrated into a transparent, iterative development cycle that mirrors human-centric agile methodologies. The essence of Backlog.md lies in its ability to break down large features into smaller, manageable Markdown tasks, providing a structured environment where AI can thrive without veering off course.
Central to Backlog.md’s effectiveness is its approach to structured knowledge for AI agents. By storing all tasks as Markdown files within a Git repository, the system ensures that every piece of work, from high-level descriptions to granular acceptance criteria, is explicit and version-controlled. This design choice addresses the critical need for robust context engineering, giving AI agents a consistent, up-to-date source of truth. As Gavrilescu put it, "Markdown tasks allow better context engineering with Backlog structure; AI Agents won't run out of context window." This organization is key to preventing the dreaded context-window exhaustion and to ensuring the AI grasps the nuances of a given task.
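Concretely, a repository managed this way might look like the sketch below. This is an illustrative layout based on the talk's description; the subdirectory names are assumptions rather than a guarantee of the tool's exact defaults:

```text
my-project/
├── src/                        # application code
└── backlog/
    ├── tasks/                  # one Markdown file per task
    │   ├── task-41 - Add login endpoint.md
    │   └── task-42 - Rate-limit API requests.md
    ├── docs/                   # project documentation agents can consult
    └── decisions/              # architecture decision records
```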
The workflow begins with task creation. A human developer or even another AI agent defines a task, complete with metadata (ID, title, status, assignee, priority), a clear description of the ‘why,’ and detailed acceptance criteria outlining the ‘what.’ This initial phase culminates in a crucial human review. As Gavrilescu emphasized, this is "the moment where you can actually understand if the AI agent has understood your intent and will do a good task." This checkpoint is vital, allowing for immediate correction of any misunderstandings before significant resources are committed.
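What such a task file contains follows directly from that list of ingredients. The sketch below is illustrative, with field names taken from the metadata Gavrilescu enumerated rather than from the tool's exact schema:

```markdown
---
id: task-42
title: Rate-limit API requests
status: To Do
assignee: ai-agent
priority: high
---

## Description

Unauthenticated clients can currently flood the public API and degrade
service for everyone; add per-IP rate limiting. (This is the "why.")

## Acceptance Criteria

- [ ] Requests beyond 100 per minute per IP receive HTTP 429
- [ ] The limit is configurable via an environment variable
- [ ] Existing integration tests still pass
```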
Once a task is understood and approved, the AI agent proceeds to generate an implementation plan. This involves the AI sifting through project documentation, the existing codebase, and even external resources to formulate a strategic approach. This phase underscores a second core insight: the necessity of iterative human-AI collaboration. A senior software engineer then reviews the AI-generated implementation plan, a critical step to ensure the proposed solution aligns with architectural principles and project standards. This layered review process prevents costly errors and guides the AI toward optimal solutions, ensuring that human expertise remains at the helm of strategic decision-making.
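A natural way to keep that review lightweight is for the agent to append its plan to the task file itself, so the reviewer sees requirements and proposed approach side by side. Continuing the hypothetical task above:

```markdown
## Implementation Plan (AI-generated, pending senior review)

1. Add token-bucket middleware in src/middleware/rateLimit.ts
2. Read RATE_LIMIT_PER_MIN from the environment, defaulting to 100
3. Return HTTP 429 with a Retry-After header when the bucket is empty
4. Unit-test the middleware; extend the integration suite for the 429 path
```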
The technical backbone of this seamless interaction is Backlog.md’s use of MCP (Model Context Protocol) resources. Backlog.md exposes four key resources to AI agents: a workflow overview, a task creation guide, a task execution guide, and a task completion guide. These guides act as a standardized instruction set, enabling AI agents to understand how to interact with the system, from searching and viewing task details to creating and updating tasks, using native MCP connections or traditional CLI commands. This common interface empowers agents to operate effectively within the prescribed workflow.
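Wiring an agent up to those resources follows the usual MCP client pattern. The configuration below is a hypothetical example, assuming Backlog.md exposes its server through an `mcp` subcommand over stdio; the project's README documents the exact invocation:

```json
{
  "mcpServers": {
    "backlog": {
      "command": "backlog",
      "args": ["mcp"]
    }
  }
}
```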
The actual task execution sees the AI agent writing code to fulfill the acceptance criteria. Backlog.md’s design allows for flexibility, enabling tasks to be moved back from "Done" to "In Progress" if issues are discovered during testing or review. This iterative nature, combined with the atomic structure of tasks, provides a safety net. If something goes awry, developers can easily roll back a single task, refine the specifications, and prompt the AI to try again without disrupting the entire project.
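That round trip maps onto simple CLI calls. The commands below are illustrative, assuming a `task edit` subcommand with a `--status` flag along the lines of the tool's documented interface:

```bash
# Agent picks up the task
backlog task edit 42 --status "In Progress"

# Acceptance criteria pass; mark it done
backlog task edit 42 --status "Done"

# Review uncovers a regression: move the task back and refine the spec
backlog task edit 42 --status "In Progress"
```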
This modularity and clear scope definition form the third pillar of Backlog.md’s success. "Scope is well defined by acceptance criteria," Gavrilescu stated, ensuring that "agents will not do less or more features than requested." This precision minimizes wasted effort and aligns AI output directly with project needs. Furthermore, because Backlog.md lives entirely in Git, teams can develop multiple tasks in parallel using Git worktrees, provided the tasks don't depend on one another, which boosts team throughput.
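The worktree side of this needs nothing beyond standard Git: each independent task gets its own working directory and branch, so two agents (or an agent and a human) never trample each other's checkout. For example:

```bash
# One worktree and branch per independent task
git worktree add ../task-41-login -b task-41-login
git worktree add ../task-42-rate-limit -b task-42-rate-limit

# Inspect active worktrees; merge the branches back as tasks complete
git worktree list
```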
Perhaps the most compelling testament to Backlog.md's efficacy is the revelation that the tool's own codebase was "written 99% by AI Agents." This bold statement highlights the transformative potential of such a structured, human-AI collaborative environment. Backlog.md is an open-source, MIT-licensed CLI tool, offering both a Terminal User Interface (TUI) and a localhost web interface. It runs cross-platform on Windows, macOS, and Linux, requiring no external APIs, accounts, databases, or complex configurations. All data lives as plain text Markdown files directly in the Git repository, ensuring transparency, version control, and effortless synchronization across branches and teams.
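Getting started is correspondingly lightweight. A typical first session might look like the following; the package name and subcommands are assumptions to be checked against the project's README:

```bash
# Install the CLI (package name assumed; see the project's README)
npm install -g backlog.md

# Initialize Backlog.md inside an existing Git repository
backlog init

# Create a task, then open the terminal Kanban board or the web UI
backlog task create "Rate-limit API requests"
backlog board
backlog browser
```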
Backlog.md represents a significant step forward in bridging the gap between human developers and AI agents. By providing a clear, structured, and iterative workflow within the familiar confines of a Git repository and a terminal, it empowers AI to contribute meaningfully to software development while maintaining essential human oversight. It transforms the abstract potential of AI into tangible, manageable progress, fostering a new paradigm of intelligent, collaborative coding.

