If you’re using Large Language Models (LLMs) today, such as ChatGPT or Claude, you’ve likely stumbled upon their quirks: responses that are totally irrelevant, or that simply aren’t quite what you prompted for. And even if you manage to make it work, it’s hard to make changes – to swap prompts or models. Performance for your customers can degrade quickly, without notice. Solving this problem is Israeli Generative AI startup Traceloop, a recent Y Combinator graduate that’s setting up guardrails to ensure the Generative AI product you’re building doesn’t veer off course.
In an interview with StartupHub.ai, Traceloop’s CEO Nir Gazit explains it all. Founded in 2022 by Gazit and Gal Kleinman (CTO), graduates of the Israeli defense intelligence corps, Traceloop (formerly Enrolla) originally set out to solve test automation at scale. During his time as Tech Lead of Google’s Growth Quality team, Gazit was responsible for optimizing and measuring growth campaigns using machine learning techniques. “We had been accustomed to top-tier software development with so many guardrails preventing you from releasing something errant into production,” he explained. After his transition to Fiverr as Chief Architect, his first “aha” moment came: the team’s testing setup was so weak that bad code was pushed into production regularly, prompting him and Kleinman to devise a solution offering total coverage of the software testing life cycle.
The two set off, applied and got accepted to Y Combinator’s Winter 2022 batch, moved to San Francisco, and secured a Seed funding round. Their startup ambition was set into motion, with substantial momentum. “With a couple of design partners, we started working on AI-powered test automation,” explained Gazit. “We built autonomous agents that figured out your system and created a test. It’s a fairly complex system to test the system itself, and we went down that rabbit hole.”
Amid the Generative AI revolution that dawned over the last year, Gazit and Kleinman experienced their second realization, prompting them to pivot and serve the surging demand. And for good reason, too. Bloomberg Intelligence forecasts the Generative AI market to reach $1.3 trillion by 2032, and every company, from tech to non-tech verticals, is clamoring to get in on the action. The power of LLMs is just too profound to sit this one out. Whether it’s generating website copy, research papers, complex code, or a customer support chatbot, LLMs are categorically value-add, especially to the enterprise.
By using off-the-shelf foundation models like GPT-3.5 and GPT-4, fine-tuning LLMs, or building their own LLM agents, enterprises can tap into the Generative AI revolution upon us. The right approach depends on the level of accuracy, the nature of the dataset, and the security adherence required. The caveat common to them all: there are still inconsistencies – hallucinations, as the AI community calls them – that surface in LLM responses. And when building a scalable product in production, the desired tolerance for error with LLMs is zero. Yet the problem still prevails.
“While talking to our peers, we realized everyone is doing the same: testing their own LLMs with custom testing systems, which led us to our pivot: testing and validating the usage of LLMs,” said Gazit. “It’s complicated because of the difficulty of generating a validated output. You need a way to verify your product is working properly on top of the given model. Think of Notion AI: it’s built on LLMs and they need a way to constantly upgrade and improve their prompts while not breaking existing behavior. Today, everyone needs to have a Generative AI feature in their product. For example, Fiverr launched Neo, a way for you to buy something on Fiverr with a ChatGPT-like interface.”
Traceloop is still in its beta testing phase, but the startup is keen on servicing the potential widespread, complex implementation of Generative AI. “So many companies are only scratching the surface of building their own LLMs, and you can create much more complex outputs than the thin layers that are currently being used.”
The main challenge with integrating LLMs into product development is the paradigm shift it presents to engineers. Traditionally, engineers have been accustomed to deterministic code, where a given input always produces the same output. Generative AI, with its inherent unpredictability, throws a wrench into this deterministic framework. Gazit illustrates this with a simple example: “Imagine creating a prompt instructing the model to produce an output in a specific format, and then, for some inexplicable reason, the output does the opposite. It’s frustrating.” The deterministic approach simply doesn’t gel with Generative AI, leading to a steep learning curve for engineers. Traceloop’s mission now is to bridge this gap, introducing a semblance of predictability into the unpredictable realm of LLMs.
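The failure mode Gazit describes – asking for a specific output format and getting something else – is exactly what a programmatic guardrail can catch before a bad response reaches users. The following is a minimal illustrative sketch (not Traceloop’s actual implementation) of validating that a model’s response is JSON with the keys the prompt asked for; the function name and keys are hypothetical:

```python
import json

def validate_json_output(raw: str, required_keys: set) -> bool:
    """Guardrail check: is the LLM response valid JSON containing
    all of the keys the prompt requested?

    Catches the case where the model ignores format instructions
    and returns free text instead of structured output.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

# A well-formed response passes; a free-text response is flagged.
good = '{"summary": "ok", "sentiment": "positive"}'
bad = "Sure! Here is your summary: everything looks positive."
print(validate_json_output(good, {"summary", "sentiment"}))  # True
print(validate_json_output(bad, {"summary", "sentiment"}))   # False
```

A check like this can gate deployment: if the failure rate rises after a prompt or model change, the change is rolled back before customers ever see degraded output.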
As for their clientele, Gazit remains tight-lipped, “We’re in collaboration with a few clients, but we’re not ready to announce anything just yet.” Their primary focus is on products that automate tasks, such as code generation and content creation, akin to platforms like Notion AI. The overarching challenge is ensuring that changes made in the utilization of LLMs don’t disrupt existing behaviors. “Testing the output of LLMs is a complex issue,” Gazit admits. “Especially when you have a vast user base, the trial-and-error approach isn’t feasible. Rigor is essential.”
Reflecting on their Y Combinator experience, Gazit’s enthusiasm is palpable. “Being part of the post-COVID YC batch was transformative. We relocated to San Francisco and had the privilege of networking with industry stalwarts, including the founders of Stripe and Airbnb.” The YC network proved invaluable, connecting Traceloop with potential investors and clients. Gazit credits YC for instilling a customer-centric approach in them, emphasizing the importance of truly understanding user needs.
And Gazit’s advice for budding entrepreneurs, shaped by his YC experience, is succinct: “Don’t get lost in perfection. Launch, iterate, engage. Otherwise, you risk building something that nobody wants.”
In the coming days, Traceloop is set to debut its open-source version, OpenLLMetry.