#LLM Optimization

8 articles with this tag

TokenPilot: Reining in LLM Context Costs
AI Research

TokenPilot offers a dual-granularity context management framework, slashing LLM inference costs by up to 87% while preserving performance.

17 days ago
Compute Once: Unlocking AI Agent Efficiency
AI Research

A radical proposal to precompute LLM KV caches, slashing inference costs by up to 50x and enabling a new compute-efficient AI agent paradigm.

20 days ago
Unlocking Ultra-Long Context for LLMs
AI Research

MiniMax Sparse Attention breaks the context window barrier for LLMs, enabling millions of tokens with significant compute reduction and practical speedups.

21 days ago
MobileMoE LLMs Redefine On-Device AI
AI Research

MobileMoE LLMs set new performance and efficiency benchmarks for sub-billion-parameter models on smartphones.

about 1 month ago
Faster LLMs by Reshaping Sparsity
Technology

Sakana AI and NVIDIA unveil a method that reshapes sparsity in LLMs to boost GPU efficiency, achieving speedups of over 20%.

about 2 months ago
LLM Reasoning Fix: LPSR
AI Research

Latent Phase-Shift Rollback (LPSR) corrects LLM reasoning errors at inference time without fine-tuning, boosting accuracy and efficiency.

2 months ago
Prism: Symbolic Superoptimization for Tensors
AI Research

Prism, a novel symbolic superoptimizer, uses sGraphs to represent tensor program families, achieving significant speedups and reduced optimization time for LLM workloads.

3 months ago
Beyond Token Count: Semantic Compression for LLMs
AI Research

Researchers recast LLM reasoning as lossy compression using the Conditional Information Bottleneck (CIB), employing semantic surprisal for efficient token pruning.

4 months ago