1 article with this tag
GPT-OSS-Puzzle-88B is a large language model that offers substantial inference speedups without sacrificing accuracy, using techniques such as mixture-of-experts (MoE) pruning and window attention.