The generative AI gold rush has an expensive secret: running the models costs a fortune. While training gets the headlines, the day-to-day cost of inference—the process of getting an answer from a model—is a massive and growing GPU drain for enterprises. A new startup, Tensormesh, is emerging from stealth today with $4.5 million in seed funding to tackle this problem head-on with a technique known as AI inference caching.
Tensormesh, founded by researchers from the University of Chicago, UC Berkeley, and Carnegie Mellon, claims its platform can cut AI inference cost and latency by as much as a factor of ten. The company is commercializing the work behind LMCache, a popular open-source KV-caching project; KV caching saves the key-value attention states a model computes while processing a prompt so that overlapping requests can reuse them instead of recomputing them. LMCache is already integrated into frameworks like vLLM and used by companies including Redis, Red Hat, and WEKA.
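The intuition behind the approach is straightforward: many requests share a prompt prefix, such as a chatbot's system prompt, so the expensive prefill work for that prefix can be done once and reused. The toy Python sketch below illustrates that prefix-reuse logic in miniature; it is an illustration under stated assumptions, not LMCache's or Tensormesh's actual code, and every name in it (KV_CACHE, prefill, expensive_kv) is hypothetical.

```python
# Toy sketch of prefix KV caching, the general idea behind projects like
# LMCache. All names here are hypothetical. Real systems cache per-layer
# key/value attention tensors and tier them across GPU memory, CPU RAM,
# and disk; this sketch fakes the "model" with strings to show the reuse.

KV_CACHE: dict[tuple[str, ...], list[str]] = {}

def expensive_kv(token: str) -> str:
    """Stand-in for the per-token prefill work a GPU would otherwise do."""
    return f"kv({token})"

def prefill(tokens: list[str]) -> tuple[list[str], int]:
    """Build KV state for `tokens`, reusing the longest cached prefix.

    Returns the KV state and the number of tokens actually recomputed,
    which is the cost that inference caching tries to drive toward zero.
    """
    # Walk back from the full prompt to find the longest cached prefix.
    cut = next((n for n in range(len(tokens), 0, -1)
                if tuple(tokens[:n]) in KV_CACHE), 0)
    kv = list(KV_CACHE.get(tuple(tokens[:cut]), []))
    for token in tokens[cut:]:  # only the uncached suffix is computed
        kv.append(expensive_kv(token))
    KV_CACHE[tuple(tokens)] = kv
    return kv, len(tokens) - cut

# A chatbot re-sending its system prompt pays the prefill cost only once:
_, cost1 = prefill(["sys", "hello"])         # cost1 == 2 (cold start)
_, cost2 = prefill(["sys", "hello", "bye"])  # cost2 == 1 (prefix reused)
```

In this sketch the second request recomputes only one token because the first request's prefix is reused; production systems apply the same idea to the far larger key/value tensors inside the model, which is where the cost and latency savings come from.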
