IndexCache
Active

Optimizing LLM inference for faster and cheaper long-context processing.

About
IndexCache is a novel technique developed by researchers at Tsinghua University and Z.ai that significantly reduces computational redundancy in sparse attention models, leading to faster inference times and lower costs for large language models, especially those with long context windows. It achieves this by caching and reusing attention indices across transformer layers, addressing a key bottleneck in models like DeepSeek Sparse Attention (DSA).
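The idea of caching and reusing attention indices across layers can be illustrated with a minimal sketch. The names here (`IndexCache`, `indices_for_layer`, `refresh_every`) and the refresh policy are hypothetical illustrations, not the authors' actual implementation: a layer computes the top-k key indices for each query once, and subsequent layers reuse them instead of re-running index selection.

```python
import numpy as np

def topk_indices(scores, k):
    # Indices of the k largest attention scores per query (illustrative helper).
    return np.argpartition(scores, -k, axis=-1)[..., -k:]

class IndexCache:
    """Hypothetical sketch of index caching for sparse attention: recompute
    the sparse index set only every `refresh_every` layers and reuse it in
    between, skipping redundant index selection."""

    def __init__(self, refresh_every=4):
        self.refresh_every = refresh_every
        self.cached = None

    def indices_for_layer(self, layer, scores, k):
        if self.cached is None or layer % self.refresh_every == 0:
            # Recompute and cache the index set at refresh layers.
            self.cached = topk_indices(scores, k)
        # All other layers reuse the cached indices.
        return self.cached

rng = np.random.default_rng(0)
cache = IndexCache(refresh_every=2)
scores = rng.standard_normal((4, 16))  # 4 queries, 16 keys
idx0 = cache.indices_for_layer(0, scores, k=4)  # computed here
idx1 = cache.indices_for_layer(1, scores, k=4)  # reused, no recompute
```

Under this sketch, the cost of index selection is paid only at refresh layers; intermediate layers attend over the cached sparse index set directly.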

Tags

Performance


Score Breakdown
Overall: 10
Traction: 0
Team: 0
Visibility: 10
Profile: 0
Community: 0

Frequently Asked Questions
What does IndexCache do?
IndexCache reduces computational redundancy in sparse attention models by caching and reusing attention indices across transformer layers, addressing a key bottleneck in models like DeepSeek Sparse Attention (DSA). Developed by researchers at Tsinghua University and Z.ai, it speeds up inference and lowers costs for large language models with long context windows.
What industry does IndexCache operate in?
IndexCache operates in the Artificial Intelligence, Machine Learning, Software, and AI Infrastructure industries.