Together AI Masters MiniMax M3 Inference

Together AI details engineering feats enabling efficient MiniMax M3 inference, unlocking 1M-token context and multimodality.

Together AI is positioning itself as the go-to platform for demanding large language models, announcing its role as the preferred cloud partner for MiniMax's latest M3 model. The company has detailed the engineering work behind efficient MiniMax M3 inference, which supports the model's 1-million-token context window and native multimodal capabilities.

Visual TL;DR: MiniMax M3 Demands lead to Extreme Context and Native Multimodality. Both are addressed by MiniMax Sparse Attention, which is supported by the Together AI Platform. MiniMax Sparse Attention and the Together AI Platform both enable Efficient Inference, which leads to Advanced AI Unlocked.

  1. MiniMax M3 Demands: advanced coding, agentic workflows, multimodal reasoning needs
  2. Extreme Context: unlocking 1 million token context window
  3. Native Multimodality: rich input processing requirements for diverse data
  4. MiniMax Sparse Attention: novel mechanism reducing computational burden of long contexts
  5. Together AI Platform: preferred cloud partner for MiniMax M3
  6. Efficient Inference: enabling complex systems challenges for cutting-edge AI
  7. Advanced AI Unlocked: powering demanding large language models

This collaboration highlights Together AI's commitment to tackling complex systems challenges for cutting-edge AI. MiniMax M3, designed for advanced coding, agentic workflows, and multimodal reasoning, presents unique serving demands, particularly with its extended context length and rich input processing requirements.

Engineering for Extreme Context and Multimodality

Serving MiniMax M3 efficiently centers on its novel MiniMax Sparse Attention (MSA) mechanism. MSA reduces the computational burden of long contexts by limiting the tokens each query attends to, a departure from the quadratic scaling of dense attention. To run it efficiently, Together AI's team developed a KV-Block-Major sparse attention kernel, which reorganizes data flow to improve arithmetic intensity (the ratio of computation to memory traffic).
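
The article does not describe the kernel's internals, so the following is only a rough illustration of why loop order can affect arithmetic intensity. It assumes each query has already selected a few KV blocks (the selection, the block ids, and both function names are hypothetical) and counts how many KV block fetches a query-major loop needs compared with a KV-block-major loop.

```python
from collections import defaultdict

def query_major_loads(selected):
    """Each query fetches every KV block it selected, one query at a time."""
    return sum(len(blocks) for blocks in selected.values())

def kv_block_major_schedule(selected):
    """Invert the selection: for each KV block, list the queries that need it."""
    schedule = defaultdict(list)
    for query, blocks in selected.items():
        for block in blocks:
            schedule[block].append(query)
    return dict(schedule)

# Hypothetical selection: query id -> KV block ids that query attends to.
selected = {0: [0, 3], 1: [0, 3], 2: [3, 5], 3: [0, 5]}

schedule = kv_block_major_schedule(selected)
block_major_loads = len(schedule)       # each KV block is fetched once

print(query_major_loads(selected))      # 8
print(block_major_loads)                # 3
print(schedule[3])                      # [0, 1, 2]
```

Both orderings score the same eight query-block pairs, but the block-major schedule fetches three blocks instead of eight, so more computation is done per byte moved. A real GPU kernel also has to account for on-chip caching, tiling, and load balance, none of which this toy count models.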

Further enhancing long-context handling, Together AI integrated MSA with paged attention. This allows for dynamic KV cache management, crucial for variable request lengths, and reportedly yielded a 5% boost in decode throughput.
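
To make the idea of paged KV cache management concrete, here is a minimal sketch of a page allocator. It is not Together AI's implementation: the class name, method names, and page sizes are assumptions chosen for illustration. The point is that each request takes fixed-size pages only as it grows, and returns them when it finishes.

```python
class PagedKVCache:
    """Toy page allocator: each sequence owns a list of fixed-size pages."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.free_pages = list(range(num_pages))
        self.page_table = {}   # sequence id -> list of physical page ids
        self.lengths = {}      # sequence id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a slot for one new token; return (page id, offset in page)."""
        length = self.lengths.get(seq_id, 0)
        pages = self.page_table.setdefault(seq_id, [])
        if length % self.page_size == 0:   # no page yet, or last page is full
            if not self.free_pages:
                raise MemoryError("KV cache is out of pages")
            pages.append(self.free_pages.pop())
        self.lengths[seq_id] = length + 1
        return pages[-1], length % self.page_size

    def release(self, seq_id):
        """Return a finished sequence's pages to the free pool."""
        self.free_pages.extend(self.page_table.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_pages=4, page_size=2)
for _ in range(3):
    cache.append_token("a")   # 3 tokens occupy 2 pages
cache.append_token("b")       # 1 token occupies 1 page
print(len(cache.page_table["a"]), len(cache.free_pages))  # 2 1
cache.release("a")
print(len(cache.free_pages))  # 3
```

Because pages are allocated on demand, a short request never reserves memory sized for the longest possible context, which is why this scheme suits variable request lengths.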

The model's multimodal capabilities necessitated a dedicated preprocessing pipeline. A new Rust-based Serving Model Gateway (SMG) now handles image and video decoding, resizing, and patching on the CPU. This moves preprocessing work off the GPU, so the inference engine can focus on generation.
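
As an illustration of the patching step, the sketch below splits a small image into non-overlapping square tiles. The function name, the patch size, and the nested-list image format are assumptions, and Python is used only for brevity; the article describes a Rust implementation whose details are not given.

```python
def patchify(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch tiles.

    Assumes height and width are multiples of `patch`; a real pipeline would
    resize or pad the image first.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be multiples of the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(tile)
    return patches


# A 4x4 grayscale "image" with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, patch=2)
print(len(patches))   # 4
print(patches[0])     # [[0, 1], [4, 5]]
print(patches[3])     # [[10, 11], [14, 15]]
```

Work of this kind is plain array slicing with no model weights involved, which is why it can run on CPU workers ahead of the GPU inference engine.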

These optimizations collectively resulted in performance improvements of 81% to 125% across various concurrency levels for agentic-style workloads, according to Together AI's internal benchmarks.

Together AI will host the open-weights MiniMax M3 model as a developer endpoint upon its public release.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.