Xiaomi launches MiMo-V2.5-Pro-UltraSpeed model achieving 1,000+ tokens/s throughput on general-purpose GPUs
Technology

The AMW Read

Novelty 2: Xiaomi is already a mapped player with MiMo models, but this speed claim at 1T parameters on general-purpose GPUs meaningfully updates the inference-efficiency baseline. Significance 3: Resets enterprise-production latency expectations and pressures the entire Chinese foundation-model fie
Foundation Models · Case Studies | Foundation Models · Recurring Patterns | Compute Economics

Chinese consumer electronics and AI company Xiaomi has released MiMo-V2.5-Pro-UltraSpeed, a high-speed variant of its flagship MiMo-V2.5-Pro model. The 1-trillion-parameter model supports a 1M-token context window and delivers more than 1,000 tokens per second (TPS) of single-API inference throughput on standard GPUs, without relying on custom silicon. According to third-party testing by QbitAI, the model generated a complete 500-line web app in seven seconds, thinking time included, with output speed peaking above 3,300 TPS. Xiaomi attributes the performance to full-stack co-design spanning four layers: model architecture (hybrid sliding-window attention that reduces compute to about 1/7 of full attention), FP4 quantization of the expert modules, speculative decoding with parallel drafting (its DFlash scheme), and GPU-level optimizations including persistent kernel execution and warp specialization.
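
To give a rough sense of why speculative decoding can multiply decoding throughput, here is a minimal back-of-envelope sketch in Python. It is not Xiaomi's DFlash scheme, whose internals are not described here. It uses the common textbook simplification that each drafted token is accepted independently with a fixed probability. The function names, the `draft_overhead` parameter, and every number in the usage lines are hypothetical and chosen only for illustration.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the large model.

    Assumption (not from the article): each drafted token is accepted
    independently with probability accept_rate, and every pass yields at
    least one token (the large model's own correction or bonus token).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    # Geometric series: 1 + a + a^2 + ... + a^draft_len
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def effective_tps(base_tps: float, accept_rate: float,
                  draft_len: int, draft_overhead: float) -> float:
    """Estimated throughput with speculative decoding.

    base_tps:       tokens/s of plain one-token-per-pass decoding.
    draft_overhead: drafting time as a fraction of one verification pass
                    (hypothetical knob; parallel drafting aims to keep it small).
    """
    if base_tps <= 0 or draft_overhead < 0:
        raise ValueError("base_tps must be positive, draft_overhead non-negative")
    tokens = expected_tokens_per_pass(accept_rate, draft_len)
    return base_tps * tokens / (1.0 + draft_overhead)


if __name__ == "__main__":
    # Hypothetical inputs, not measurements of any MiMo model.
    estimate = effective_tps(base_tps=250.0, accept_rate=0.8,
                             draft_len=8, draft_overhead=0.1)
    print(f"estimated throughput: {estimate:.0f} tokens/s")
```

Under this simplified model the gain depends mostly on the acceptance rate: a longer draft helps only while drafted tokens keep being accepted, which is one reason drafting quality and drafting cost are tuned together.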

Why it matters: Xiaomi's achievement directly attacks the longstanding tradeoff between model quality, inference speed, and hardware generality, a structural tension that has constrained enterprise deployment of frontier models in latency-sensitive domains like high-frequency trading, real-time fraud detection, and ad-tech bidding. By demonstrating that a 1T-parameter model can run at 1,000+ TPS on merchant GPUs without resorting to custom ASICs (as Groq does), Xiaomi positions itself as a credible contender in the foundation-model inference-efficiency race. The narrative arc, from leading open-source model (MiMo topping global rankings), to aggressive price cuts on MiMo-V2.5, to this speed breakthrough, signals a deliberate, system-level assault on the commercialization barriers that have kept large models out of production workloads. The pattern echoes the hyperscaler-distribution moat logic: proprietary inference optimization that compounds with each new model generation and each increase in deployment scale.

For the AI market, Xiaomi's move sharpens a critical open debate about whether inference speed or raw capability will differentiate frontier models over the next 18 months. If the 1,000+ TPS threshold proves stable and reproducible across diverse workloads, it could reset enterprise expectations for what 'production-ready' means, pushing rivals like ByteDance, Alibaba, and Tencent to accelerate their own investment in inference optimization or risk losing low-latency use cases to Xiaomi's stack. Because the optimization is model- and hardware-agnostic (transferable to future GPU generations), it suggests a widening moat for Xiaomi's AI platform, analogous to how OpenAI's API latency improvements created lock-in for developer workflows.

#Xiaomi #MiMo-V2.5-Pro-UltraSpeed #inference speed #foundation models #China AI competition #model optimization