#Inference Optimization
3 articles with this tag

Technology
Cloudflare Unweights LLMs by 22%
Cloudflare's 'Unweight' system slashes LLM sizes by up to 22% using lossless compression, improving inference speed and efficiency.
about 2 months ago

AI Research
LLM Adaptation Without Retraining
In-Place Test-Time Training enables LLMs to adapt to new data at inference without retraining, enhancing performance and paving the way for continual learning.
about 2 months ago

AI Research
GPT-OSS-Puzzle-88B: Faster AI, Same Brains
GPT-OSS-Puzzle-88B offers substantial inference speedups for large language models without sacrificing accuracy, utilizing techniques like MoE pruning and window attention.
4 months ago