Tensormesh exits stealth with $4.5M to slash AI inference caching costs

The generative AI gold rush has an expensive secret: running the models costs a fortune. While training gets the headlines, the day-to-day cost of inference—the process of getting an answer from a model—is a massive and growing GPU drain for enterprises. A new startup, Tensormesh, is emerging from stealth today with $4.5 million in seed funding to tackle this problem head-on with a technique known as AI inference caching.

Tensormesh, founded by researchers from the University of Chicago, UC Berkeley, and Carnegie Mellon, claims its platform can cut AI inference costs and latency by up to 10x. The company is commercializing the work behind LMCache, a popular open-source project for KV-caching that’s already integrated into frameworks like vLLM and used by companies including Redis, Red Hat, and WEKA.

The Caching Layer for LLMs

At its core, Tensormesh works by eliminating redundant computation. When multiple users ask similar questions, or a prompt builds on previous turns in a conversation, large language models often recalculate the same intermediate data. Tensormesh’s AI inference caching system captures this data—specifically the key-value (KV) cache produced by the model’s attention layers—and reuses it for subsequent requests, drastically reducing the load on expensive GPUs. This can slash time-to-first-token and make repeated queries nearly instantaneous.
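The idea can be illustrated with a toy prefix cache. This is a hedged sketch, not Tensormesh’s or LMCache’s actual API: real systems store per-token key/value tensors on GPU, CPU, or disk, while here a plain dictionary keyed by token prefixes stands in for that state.

```python
# Illustrative only: a toy prefix cache mimicking KV-cache reuse.
# Real KV caches hold attention key/value tensors; a dict stands in here.
from typing import Dict, List, Optional, Tuple

class ToyPrefixCache:
    def __init__(self) -> None:
        self._store: Dict[Tuple[str, ...], object] = {}

    def longest_prefix(self, tokens: List[str]) -> Tuple[Optional[object], int]:
        """Return (cached_state, tokens_covered) for the longest cached prefix."""
        for end in range(len(tokens), 0, -1):
            key = tuple(tokens[:end])
            if key in self._store:
                return self._store[key], end
        return None, 0

    def put(self, tokens: List[str], state: object) -> None:
        self._store[tuple(tokens)] = state

def run_inference(cache: ToyPrefixCache, tokens: List[str]) -> Tuple[int, int]:
    """Compute 'KV state' only for tokens beyond the cached prefix."""
    state, covered = cache.longest_prefix(tokens)
    # Only the suffix needs fresh computation; the prefix state is reused.
    new_state = (state, tokens[covered:])  # stand-in for real KV tensors
    cache.put(tokens, new_state)
    return covered, len(tokens) - covered

cache = ToyPrefixCache()
prompt = ["You", "are", "a", "helpful", "assistant", ".", "Hi"]
run_inference(cache, prompt)  # cold start: all 7 tokens computed
reused, computed = run_inference(cache, prompt + ["there"])
# Follow-up turn reuses the 7-token prefix and computes only the 1 new token.
```

In a conversation, each turn extends the previous prompt, so the cached prefix keeps growing and only the new tokens need fresh GPU work; that is the source of the time-to-first-token savings described above.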

“Enterprises everywhere are wrestling with the huge costs of AI inference,” said Ion Stoica, an advisor to the company and co-founder of Databricks, in a statement. “Tensormesh’s approach delivers a fundamental breakthrough in efficiency and is poised to become essential infrastructure for any company betting on AI.”

The platform is designed as a middle ground for companies that don’t want to send sensitive data to third-party APIs but also can’t afford to build a hyper-optimized inference stack from scratch. It’s cloud-agnostic and can be deployed as a SaaS product or as standalone software, giving teams control over their own infrastructure.

As inference workloads continue to explode, tools that add a layer of efficiency to the AI stack are becoming critical. Caching is a classic computer science solution for performance bottlenecks, and Tensormesh is betting it can become a standard, must-have component for making enterprise AI economically viable at scale. The company’s beta is available now.