Amol Kapoor, CEO of Nori Agentic, presents a compelling case for HTML as the universal language for AI agents tasked with creating visual content. In his talk, "HTML is All You Need (for Agents to Make Graphics)," Kapoor challenges the perception that AI agents are solely code-writing entities. He argues that these agents can produce a wide array of visual artifacts, from slides and documents to entire videos, by leveraging the right tools and formats.
The Limitations of Pixel-Based Creation for AI
Kapoor begins by highlighting the inefficiencies and limitations of current visual creation tools when used by AI. He points out that tools like PowerPoint, Google Slides, Figma, and Canva are built with human interaction in mind, relying on direct manipulation through clicks, drags, and resizes. This graphical, pixel-based approach is fundamentally at odds with how AI models process information, which is inherently language- and structure-based.
He illustrates the point by referencing Simon Willison's test, which asks AI models to generate an SVG of a pelican riding a bicycle. While models can often produce the SVG code, the visual output is frequently flawed, demonstrating a lack of spatial reasoning. Kapoor asserts that this isn't a failure of the AI models themselves, but rather a mismatch in the medium. Asking an AI to create graphics using pixel manipulation is akin to asking a human to draw complex vector graphics purely by hand: it's inefficient and prone to error.
