DeepSeek releases V4-Pro and V4-Flash models with multi-chip support
The AMW Read
DeepSeek is a canonical case-study (01.§4). The multi-chip adaptation advances the cross-substrate compute economics thread (cross.§A) and the structural forces around chip dependency (01.§3.5). Novelty=2 because it is incremental to the V4 series but impactful for the ecosystem; significance=2 because it affects segment-wide compute economics.
DeepSeek today unveiled DeepSeek-V4-Pro, a 1.86-trillion-parameter flagship model, and DeepSeek-V4-Flash, a 284-billion-parameter efficient MoE model. V4-Flash combines hybrid attention mechanisms (CSA+HCA), manifold-constrained hyperconnections, and the Muon optimizer, and was pre-trained on over 32 trillion tokens. Crucially, the Beijing Academy of Artificial Intelligence (BAAI)'s FlagOS system has completed Day-0 adaptation of V4-Flash across eight AI chips—including Haiguang, Muxi, Huawei Ascend, Moore Threads, Kunlun Core, Pingtouge Zhenwu, Tianshu, and NVIDIA (via FP8)—and is working on the V4-Pro migration.
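DeepSeek's actual router and attention internals are not detailed here, but the core MoE idea, activating only a few experts per token, can be sketched minimally. This is a hypothetical top-k softmax gate for illustration, not the V4-Flash router:

```python
import numpy as np

def topk_route(logits, k=2):
    """Pick the top-k experts per token and renormalize their gate weights.

    logits: (num_tokens, num_experts) router scores.
    Returns (expert_ids, gate_weights), each of shape (num_tokens, k).
    """
    # Indices of the k highest-scoring experts for each token.
    idx = np.argsort(logits, axis=-1)[:, -k:]
    gate = np.take_along_axis(logits, idx, axis=-1)
    # Softmax over only the selected experts, so their weights sum to 1.
    gate = np.exp(gate - gate.max(axis=-1, keepdims=True))
    gate /= gate.sum(axis=-1, keepdims=True)
    return idx, gate
```

Each token's hidden state would then be sent only to its selected experts and recombined with these weights, which is what keeps a 284B-parameter MoE cheap per token relative to a dense model of the same size.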
This event signals the maturation of China's multi-chip inference ecosystem. FlagOS's three technical breakthroughs—FlagGems full-operator replacement eliminating CUDA dependency, independent tensor parallelism for grouped output projections, and FP4-to-BF16 precision conversion—enable V4-Flash to run on domestic chips lacking FP4 support (only NVIDIA Blackwell+ supports FP4 natively). This directly addresses the capital-compression arc where Chinese AI labs face GPU supply constraints and must optimize for heterogeneous hardware. The pattern mirrors the 'context-engineering moat' seen in other segments, now applied at the silicon abstraction layer, reducing vendor lock-in for enterprise deployments.
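FP4 (E2M1) stores each weight in 4 bits: one sign bit plus eight magnitude levels. A conversion layer of the kind FlagOS describes would expand those codes into a wider dtype before compute on chips lacking native FP4. A minimal sketch, using NumPy with float32 standing in for BF16 and assuming the standard E2M1 code points (FlagOS's actual kernels are not public here):

```python
import numpy as np

# Magnitudes of the eight non-negative E2M1 code points; bit 3 is the sign.
FP4_MAGNITUDES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fp4_to_float(codes, scale=1.0):
    """Dequantize 4-bit E2M1 codes (ints 0..15) to floats.

    On real hardware the target dtype would be bfloat16; NumPy has no
    native bfloat16, so float32 stands in. `scale` is the per-block
    scale factor that FP4 schemes typically store alongside the codes.
    """
    codes = np.asarray(codes)
    magnitude = FP4_MAGNITUDES[codes & 0b0111]   # low 3 bits index magnitude
    sign = np.where(codes & 0b1000, -1.0, 1.0)   # high bit flips the sign
    return (sign * magnitude * scale).astype(np.float32)
```

The lookup is cheap, which is why precision up-conversion is a viable bridge for chips without Blackwell-class FP4 datapaths: the cost is memory bandwidth and a wider accumulator, not a new instruction set.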
The expert take: By decoupling frontier model performance from specific chip requirements, FlagOS exemplifies the 'infrastructure as moat' strategy. This could accelerate enterprise AI adoption in China, where heterogeneous GPU fleets are common. However, the true test will be inference cost-per-token parity with NVIDIA-native deployments. If FlagOS achieves near-lossless performance, it validates a playbook other state-backed AI initiatives may replicate, potentially reshaping global AI infrastructure competition.
