TurboQuant

Name: TurboQuant - StartupHub.ai Profile Assessment
Brand: TurboQuant
Rating: 3.3 (6 reviews)


Google Research's AI compression algorithm that drastically reduces LLM memory requirements by compressing the KV cache.

“Got MTP + TurboQuant running — Qwen3.6-27B -- 80+ t/s at 262K context on a single RTX 4090. So I've been messi…” (r/LocalLLaMA)


DR 97 · Active


About

TurboQuant is an AI compression algorithm developed by Google Research. It reduces the memory requirements of large language models (LLMs) by compressing the key-value (KV) cache with quantization techniques such as PolarQuant and Quantized Johnson-Lindenstrauss (QJL), while aiming to preserve model performance.
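
To make the memory claim concrete, here is a minimal back-of-the-envelope sketch of how KV cache size scales with bits per value. The model shape below is hypothetical and chosen only for illustration. The 4.25 bits-per-value figure is the one quoted in a user comment on this page, not a verified specification. The function name `kv_cache_bytes` is an assumption of this sketch and is not part of any TurboQuant code.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bits_per_value):
    """Approximate KV cache size in bytes for a single sequence."""
    # One key vector and one value vector per layer, per KV head, per token.
    n_values = 2 * n_layers * n_kv_heads * head_dim * context_len
    return n_values * bits_per_value / 8


# Hypothetical model shape (assumption, not taken from any real model card).
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=65536)

fp16 = kv_cache_bytes(**shape, bits_per_value=16)
low_bit = kv_cache_bytes(**shape, bits_per_value=4.25)

GIB = 2**30
print(f"16-bit cache:   {fp16 / GIB:.3f} GiB")
print(f"4.25-bit cache: {low_bit / GIB:.3f} GiB")
print(f"reduction:      {fp16 / low_bit:.2f}x")
```

Under these assumptions the cache shrinks from 8 GiB to about 2.1 GiB, roughly a 3.8x reduction. Real savings depend on the model architecture and on any overhead a specific implementation adds.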

Frequently asked

What does TurboQuant do?

What industry does TurboQuant operate in?

TurboQuant operates in AI Foundation & Compute, Large Language Model, Generative AI, Transformer Architecture, AI Infrastructure, Vector Search.

Does TurboQuant have an affiliate program?

Yes, TurboQuant has an affiliate program. The program is self-hosted. Register here: https://github.com/google-research/google-research/tree/master/turbo_quant/affiliates

Comments (6)

2 positive, 2 mixed, 2 negative

r/LocalLLaMA · u/Haunting-Stretch8069 · May 19, 2026 · Negative

“TurboQuant on 16 GB VRAM. I've got Qwen3.6-27B IQ4_XS (14.7 GB, cHunter789's build) on an RX 7800 XT with ROCm 7.1. Display on iGPU, full 16 GB available for compute. Currently running 64K context with q8_0/q4_0 KV cache and ~915 MiB to spa…”


r/LocalLLaMA · u/indrasmirror · May 8, 2026 · Positive

“Got MTP + TurboQuant running — Qwen3.6-27B -- 80+ t/s at 262K context on a single RTX 4090. So I've been messing around trying to get MTP working alongside TBQ4_0 (TurboQuant's lossless 4.25 bpv KV cache) on Qwen3.6-27B for my own use. So a…”


r/LocalLLaMA · u/StupidScaredSquirrel · Apr 24, 2026 · Mixed

“Turboquant on llama.cpp?. Now that the financebro hype has faded, is there an implementation of turboquant for llama.cpp somewhere? Saving even 50% of kv cache memory would be nice.”


r/LocalLLaMA · u/Zarzou · Apr 22, 2026 · Negative

“Qwen3.6 does not like Turboquant. https://preview.redd.it/67aud1op3nwg1.png?width=1678&format=png&auto=webp&s=9e584afb7c5aae71c2daed934823c85087dd7009 I've tried a prompt with llamma.cpp, ik_llama.cpp and TheTom/turboquant - I have 2 GPU (3…”


r/LocalLLaMA · u/Interesting-Print366 · Apr 4, 2026 · Mixed

“Is Turboquant really a game changer?. I am currently utilizing qwen3.5 and Gemma 4 model. Realized Gemma 4 requires 2x ram for same context length. As far as I understand, what turbo quant gives is quantizing kv cache into about 4 bit and m…”


r/LocalLLaMA · u/gladkos · Mar 27, 2026 · Positive

“Google TurboQuant running Qwen Locally on MacAir. Hi everyone, we just ran an experiment. We patched llama.cpp with Google’s new TurboQuant compression method and then ran Qwen 3.5–9B on a regular MacBook Air (M4, 16 GB) with 20000 tokens c…”


Some comments are pulled from public discussions around the web. Quotes are excerpts; see the original threads for the full text.