Alibaba unveils Qwen3.7-Max agent model with 35-hour autonomous task execution

Alibaba has released Qwen3.7-Max, a foundation model optimized for agentic tasks including code generation, debugging, multi-file software engineering, office automation, and long-horizon autonomous execution. The model achieved 60.6 on SWE-Pro and 80.4 on SWE-Verified, matching Anthropic's Claude Opus 4.6 Max on the latter. In an extended demonstration, Qwen3.7-Max ran for 35 hours, making 1,158 tool calls and 432 kernel evaluations to autonomously optimize a GPU kernel, delivering a 10x performance improvement over baseline implementations. Alibaba attributes the advance to an 'environment scaling' strategy that trains across diverse real-world agentic setups rather than optimizing for narrow benchmarks.

Why it matters: This release positions Alibaba as a serious contender in the AI agent foundation-model race, directly challenging Western frontier labs on autonomous coding and long-duration task execution. The 'environment scaling' approach represents a deliberate departure from single-benchmark optimization, mirroring the pattern of context-engineering moat-building (Recurring Patterns) where model providers differentiate on robustness across varied agent frameworks like Claude Code, OpenClo, and Qwen Code. However, Qwen3.7-Max still trails top US frontier models on LM Arena, underscoring the persistent gap in the foundation-layer leaderboard even as Chinese labs close the gap on specific agentic benchmarks.

Expert take: Alibaba is threading a narrow needle — asserting technical parity on agentic tasks while the broader foundation-model gap with Anthropic and OpenAI remains. The 35-hour autonomous kernel optimization is impressive as a demonstration of reliable long-horizon execution, a capability that has been the 'holy grail' for AI agents. But the real market signal is Alibaba's ability to package this within the existing cloud API ecosystem, supporting OpenAI and Anthropic API compatibility plus the preserve_thinking feature. This lowers switching costs for enterprise developers, a classic hyperscaler-distribution play. The fact that the agent outperformed prior-gen models by 5.9x in a simulated startup revenue benchmark (YC-Bench) suggests Alibaba is targeting practical business automation, not just coding benchmarks. The open-weight strategy embedded in the broader Qwen lineage also pressures closed competitors on pricing, even if Qwen3.7-Max itself is initially API-only.

#Alibaba #Qwen3.7-Max #AIagents #foundation-models #autonomous-coding #China-AI #hyperscaler-distribution

Alibaba unveils Qwen3.7-Max agent model with 35-hour autonomous task execution

The AMW Read

How This Connects

Related News

Saltware unveils AI-driven Kubernetes operations automation strategy for application deployment and fault analysis

SpaceXAI, the rebranded AI division of Elon Musk's SpaceX, has launched Grok 4.5, described as an ag...

Anthropic introduces usage-based pricing for Claude Fable 5, ending flat-rate AI subscription era

OpenAI rolls out GPT-5.6 and announces ChatGPT Work agent after government clearance. OpenAI has rec...

Corvic AI launches V5 to turn one-off prompts into repeatable workflows

Discover AI Startups