IndexCache
Active

Optimizing LLM inference for faster and cheaper long-context processing.

About
IndexCache is a novel technique developed by researchers at Tsinghua University and Z.ai that significantly reduces computational redundancy in sparse attention models, leading to faster inference times and lower costs for large language models, especially those with long context windows. It achieves this by caching and reusing attention indices across transformer layers, addressing a key bottleneck in models like DeepSeek Sparse Attention (DSA).
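The idea of caching and reusing attention indices across layers can be illustrated with a minimal sketch. The names here (`IndexCache`, `indices_for_layer`, `refresh_every`) and the refresh policy are hypothetical illustrations, not the authors' actual implementation: a layer computes the top-k key indices for each query once, and subsequent layers reuse them instead of re-running index selection.

```python
import numpy as np

def topk_indices(scores, k):
    # Indices of the k largest attention scores per query (illustrative helper).
    return np.argpartition(scores, -k, axis=-1)[..., -k:]

class IndexCache:
    """Hypothetical sketch of index caching for sparse attention: recompute
    the sparse index set only every `refresh_every` layers and reuse it in
    between, skipping redundant index selection."""

    def __init__(self, refresh_every=4):
        self.refresh_every = refresh_every
        self.cached = None

    def indices_for_layer(self, layer, scores, k):
        if self.cached is None or layer % self.refresh_every == 0:
            # Recompute and cache the index set at refresh layers.
            self.cached = topk_indices(scores, k)
        # All other layers reuse the cached indices.
        return self.cached

rng = np.random.default_rng(0)
cache = IndexCache(refresh_every=2)
scores = rng.standard_normal((4, 16))  # 4 queries, 16 keys
idx0 = cache.indices_for_layer(0, scores, k=4)  # computed here
idx1 = cache.indices_for_layer(1, scores, k=4)  # reused, no recompute
```

Under this sketch, the cost of index selection is paid only at refresh layers; intermediate layers attend over the cached sparse index set directly.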

Tags

Performance


Score Breakdown
Overall: 10
Traction: 0
Team: 0
Visibility: 10
Profile: 0
Community: 0

Frequently Asked Questions
What does IndexCache do?
IndexCache reduces computational redundancy in sparse attention models by caching and reusing attention indices across transformer layers, addressing a key bottleneck in models like DeepSeek Sparse Attention (DSA). Developed by researchers at Tsinghua University and Z.ai, it speeds up inference and lowers costs for large language models with long context windows.
What industry does IndexCache operate in?
IndexCache operates in the Artificial Intelligence, Machine Learning, Software, and AI Infrastructure industries.